
Logs are logged twice with on_step=True and on_epoch=True #4100

Closed
rohitgr7 opened this issue Oct 12, 2020 · 7 comments

@rohitgr7
Contributor

rohitgr7 commented Oct 12, 2020

🐛 Bug

Logs are logged twice with on_step=True and on_epoch=True

[Screenshot from 2020-10-13 01-43-26]

To Reproduce

Code:
https://gist.github.com/rohitgr7/9322f92daee7d1991b836abd9f7b63f1

Expected behavior

Should log only train_loss_step and train_loss_epoch.
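For reference, a minimal sketch (a hypothetical module, not the gist above) of the kind of training_step that triggers this:

```python
import torch
import pytorch_lightning as pl


class BoringModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        loss = self.layer(batch).sum()
        # Logged with both on_step and on_epoch: the expectation is that only
        # `train_loss_step` and `train_loss_epoch` show up in the logger,
        # but an extra `train_loss` entry is written as well.
        self.log("train_loss", loss, on_step=True, on_epoch=True)
        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)
```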

Environment

Please copy and paste the output from our environment collection script:

  • CUDA:
    • GPU: 😔
    • available: False
    • version: 10.2
  • Packages:
    • numpy: 1.19.2
    • pyTorch_debug: False
    • pyTorch_version: 1.6.0
    • pytorch-lightning: 1.0.0rc5
    • tqdm: 4.50.2
  • System:
    • OS: Linux
    • architecture:
      • 64bit
      • ELF
    • processor: x86_64
    • python: 3.8.2
      • version: #52-Ubuntu SMP Thu Sep 10 10:58:49 UTC 2020

Additional context

rohitgr7 added the bug and help wanted labels on Oct 12, 2020
@xcliang2

The strategy in the code is to always log the original metric. Even though it looks a bit weird, I don't think this is a bug.

@rohitgr7
Contributor Author

_step and _epoch are added only when both on_step=True and on_epoch=True. I don't think it makes sense to also log the same data under the original metric_name here, since it's already logged with _step.

@xcliang2

In my opinion, if you want to monitor the metric you defined in a callback (like ModelCheckpoint), no suffix or prefix should be added to your metric_name. I don't know if this is the reason.

@tchaton
Contributor

tchaton commented Oct 14, 2020

Hey @rohitgr7,

_step and _epoch are added only when both on_step=True and on_epoch=True. I don't think it make sense to log the metric with same data with original metric_name here since it's already logged with _step.

I used logger=pl.loggers.CSVLogger() to visualise logs. I can see the duplicated value as reported.

[Screenshot 2020-10-14 at 12 07 52]
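For completeness, a sketch of how the logs can be inspected with the CSV logger (paths are illustrative; BoringModel is the module sketched earlier in the issue):

```python
import torch
import pandas as pd
import pytorch_lightning as pl
from torch.utils.data import DataLoader

# A tiny random dataloader so the sketch is self-contained.
train_loader = DataLoader(torch.randn(64, 32), batch_size=8)

logger = pl.loggers.CSVLogger("logs", name="dup_check")
trainer = pl.Trainer(max_epochs=1, logger=logger)
trainer.fit(BoringModel(), train_loader)

# Per the report above, the metrics file contains `train_loss`,
# `train_loss_step` and `train_loss_epoch` columns, with `train_loss`
# duplicating the step value.
print(pd.read_csv(f"{logger.log_dir}/metrics.csv").columns.tolist())
```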

@rohitgr7 Please correct me if I am wrong, but you are suggesting train_loss should not be logged since it is redundant?

IMO it is a design choice from @williamFalcon to save the duplicated value, so the user can still refer to it by its original logged key.

Best regards,
Thomas Chaton.

@williamFalcon
Contributor

yeah, that's right.

When you log something at both step and epoch, how do we title it correctly? That's where we add the _step and _epoch suffixes.

But when the user wants the metric directly, they rarely look up metric_step and instead use the metric name directly.

Open to suggestions about avoiding double logging.

Maybe one option is to not add a _step metric but instead use the original name for it? I think there were issues with that too, though.

@rohitgr7
Contributor Author

Maybe one option is to not add a _step

ModelCheckpoint might not work if you do this, since it would monitor the _step metric rather than the _epoch metric.
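To illustrate the concern, a sketch (the monitor keys are just the ones from this issue, not a prescribed fix):

```python
from pytorch_lightning.callbacks import ModelCheckpoint

# With the current naming, the epoch-level aggregate must be monitored
# via the explicit `_epoch` suffix.
ckpt_epoch = ModelCheckpoint(monitor="train_loss_epoch", mode="min")

# If the `_step` suffix were dropped and the step value kept the original
# name, this would silently track the per-step value instead of the
# per-epoch one.
ckpt_plain = ModelCheckpoint(monitor="train_loss", mode="min")
```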

IMO a warning or info message should be raised when both on_step and on_epoch are set to True, about the change in the metric names. Something like:

`metric_name` has been changed to `metric_name_step` and `metric_name_epoch` since `on_step=True` and
`on_epoch=True`. Please refer to these names when accessing or monitoring the metric.
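A rough sketch of where such a check could live (a hypothetical helper, not actual Lightning code):

```python
import warnings


def warn_about_metric_suffixes(name: str, on_step: bool, on_epoch: bool) -> None:
    """Warn that `name` will only be available under suffixed keys."""
    if on_step and on_epoch:
        warnings.warn(
            f"`{name}` has been changed to `{name}_step` and `{name}_epoch` "
            "since `on_step=True` and `on_epoch=True`. Please refer to these "
            "names when accessing or monitoring the metric.",
            UserWarning,
        )
```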

@rohitgr7
Contributor Author

I believe it's fixed by #4169.
