on_train_epoch_end and on_epoch_end are out of order #4001
Comments
Hi! Thanks for your contribution, great first issue!
@wyessen it seems that the flow is incorrect.
True, but it depends on how we look at it: should validation be considered part of the training epoch scope? If so, then the current flow is fine; otherwise, you're right, it's incorrect. The original bug report complains about incorrect closing of the scope given the order in which the scopes were opened. You raise a valid point, and in the broader scheme of things the current flow should be reconsidered.
yes, validation is part of the flow. As mentioned many times, big research requires checking val multiple times within an epoch:

Train: --------------------------- (1 epoch = 2 days)

Then we have (e1 = on_epoch_start):

Train: e1 t1 ------------------------------------------------------------- t2 e2
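The timeline above can be sketched as a plain-Python hook trace (the loop below is illustrative, not PyTorch Lightning's actual training loop; only the hook names come from the discussion): validation runs inside the training-epoch scope, possibly several times per epoch.

```python
# Toy trace of one training epoch in which validation is checked several
# times mid-epoch. Hook names mirror LightningModule hooks; the loop
# itself is a sketch, not PyTorch Lightning's real implementation.
def run_epoch(num_batches=4, val_every=2):
    calls = ["on_epoch_start",        # outer epoch scope opens first (e1)
             "on_train_epoch_start"]  # inner train scope opens second (t1)
    for batch in range(num_batches):
        calls.append("training_step")
        if (batch + 1) % val_every == 0:  # val checked inside the train epoch
            calls += ["on_validation_epoch_start",
                      "validation_step",
                      "on_validation_epoch_end"]
    calls.append("on_train_epoch_end")    # inner scope closes first (t2)
    calls.append("on_epoch_end")          # outer scope closes last (e2)
    return calls

for hook in run_epoch():
    print(hook)
```

Note the closing order at the end of the sketch: the train-epoch scope closes before the epoch scope, which is the nesting the issue argues for.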
here is an added test to check the actual flow: #4010
@williamFalcon @Borda So as to not hijack the original bug report, I want to clarify: the flow executed by PyTorch Lightning is incorrect in the sense that the opening of a scope (with
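To make the "matching scopes" point concrete, here is a minimal stack-based well-nestedness check (a sketch; the two hook sequences below are taken from this discussion, not from instrumenting Lightning itself). Matching start/end hooks must close in last-opened, first-closed order, exactly like brackets.

```python
# Check that start/end hook pairs are properly nested (LIFO order),
# the way matching brackets must be. A sketch, not Lightning code.
PAIRS = {"on_train_epoch_end": "on_train_epoch_start",
         "on_epoch_end": "on_epoch_start"}
STARTS = set(PAIRS.values())

def well_nested(calls):
    stack = []
    for call in calls:
        if call in STARTS:
            stack.append(call)              # scope opened
        elif call in PAIRS:
            # scope closed: it must match the most recently opened one
            if not stack or stack.pop() != PAIRS[call]:
                return False
    return not stack                        # every scope must be closed

# Order reported in this issue: the epoch scope closes before the
# train-epoch scope nested inside it, so the scopes are mis-nested.
reported = ["on_epoch_start", "on_train_epoch_start",
            "on_epoch_end", "on_train_epoch_end"]
expected = ["on_epoch_start", "on_train_epoch_start",
            "on_train_epoch_end", "on_epoch_end"]
print(well_nested(reported))  # False
print(well_nested(expected))  # True
```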
@SeanNaren Why did you close this issue? Your PR does not fix this. |
Apologies, I think the PR associated with this issue was incorrect! EDIT: after looking at the associated PR and the discussion here, I do think this PR addresses the issue of ensuring the order is correct. Was there anything in particular that wasn't addressed @wyessen? |
@wyessen this was closed per @williamFalcon's explanation that the behavior is as expected.
Will close again for now... |
@Borda the behavior is not expected, please read my explanation (@williamFalcon was responding to @Borda's message, which was different from my original issue).
🐛 Bug
Consider the following order in which the LightningModule hooks are called, from #2816 (I have confirmed that this is still an issue in PyTorch Lightning version 0.10). Naturally, one would expect the opening and closing scope hooks to match. However, on_train_epoch_end is called after on_epoch_end, which seems incorrect. It is natural to open the epoch scope before the train epoch scope (as is currently done), in which case the epoch scope should be closed after closing the train epoch scope (which is not currently done).

How you installed PyTorch Lightning (conda, pip, source): pip