Hello. I configured the Trainer with the following parameters:

    trainer = Trainer(
        driver="torch",
        train_dataloader=dl["train"],
        evaluate_dataloaders=dl["dev"],
        device=[4, 7],
        callbacks=callback,
        optimizers=optimizer,
        n_epochs=args.epoch,
        accumulation_steps=args.accumulation_steps,
        torch_kwargs={'ddp_kwargs': {'find_unused_parameters': True}},
    )
    trainer.run()

Training does start on the two GPUs, but the loss printed during training is NaN, and every metric printed at each epoch is the same value. What could be causing this?