Description & Motivation
In my problem, I need to evaluate my trained model twice, on two different sets at the end of my training:
However, both scores are logged under the same key (I'm using the wandb logger), so they are merged into a single metric. I can always retrieve the two values separately through the wandb API, but in the UI it is not easy (if possible at all) to view and compare them.
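To make the collision concrete, here is a stdlib-only sketch (hypothetical names, not the wandb or Lightning API) of a metrics store keyed by name: two evaluations logging under the same key end up merged into a single series.

```python
# Minimal stand-in for a run-metrics store keyed by metric name.
history = {}

def log_metric(key, value):
    """Append a value under the given metric key, as a logger would."""
    history.setdefault(key, []).append(value)

# Evaluating on two different test sets, both logging "test/acc":
log_metric("test/acc", 0.91)  # test set 1
log_metric("test/acc", 0.87)  # test set 2

# Both values now live under one key, so the UI shows one merged metric.
print(history)  # → {'test/acc': [0.91, 0.87]}
```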
This is also a problem when trying to evaluate two different checkpoints:
Pitch
Ideally, it would be handy to allow Trainer.test (and maybe also fit, validate and predict) to accept kwargs that would be passed directly to LightningModule.test_step and LightningModule.on_test_epoch_end. This would let the user manage the logging process based on their own arguments:
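For example (the name keyword is the argument being proposed here; it does not exist in the current Trainer.test signature, so this snippet will not run against today's Lightning API):

```python
# Training script
trainer.test(model, dataloaders=test_dataloader1, name="test1")
trainer.test(model, dataloaders=test_dataloader2, name="test2")
```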
This would result in the scores being logged to test1/acc and test2/acc, making it easy to differentiate them in the wandb UI and the logs.

Alternatives
For the case of multiple test sets, one could first merge them and pass them as a single dataloader. However, this prevents comparing the performance on each individual dataset.
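Why the merge workaround loses information can be shown with a stdlib-only sketch (hypothetical data): once the sets are concatenated, only one blended score can be computed.

```python
# Hypothetical per-example correctness flags for two test sets.
set1 = [1, 1, 1, 0]  # 0.75 accuracy on test set 1
set2 = [1, 0, 0, 0]  # 0.25 accuracy on test set 2

def accuracy(flags):
    return sum(flags) / len(flags)

# Evaluating the merged set yields a single blended score...
merged = set1 + set2
print(accuracy(merged))  # → 0.5

# ...from which the per-dataset accuracies (0.75 and 0.25)
# cannot be recovered.
```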
Additional context
No response
cc @Borda