Save the loop progress state by default #10784

Merged: carmocca merged 17 commits into master from feat/save-loop-progress on Dec 17, 2021

@carmocca (Member) commented Nov 27, 2021

What does this PR do?

Save the Loop's progress tracking state in the checkpoint.

We don't load it yet as that will be a breaking change. To be done in a follow-up.
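As an illustration only, saving progress into the checkpoint without restoring it can be sketched as follows. This is a minimal sketch, not Lightning's implementation: the Progress counter fields, the FitLoopSketch class, and dump_checkpoint are hypothetical, and the checkpoint layout (a "loops" entry next to the model "state_dict") is an assumption made for the example.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class Progress:
    """Hypothetical progress counters for a single loop."""
    ready: int = 0
    started: int = 0
    processed: int = 0
    completed: int = 0


@dataclass
class FitLoopSketch:
    """Hypothetical stand-in for a loop that tracks its own progress."""
    epoch_progress: Progress = field(default_factory=Progress)

    def state_dict(self) -> Dict[str, Any]:
        # Progress counters are always serialized ("saved by default").
        return {"epoch_progress": asdict(self.epoch_progress)}


def dump_checkpoint(loop: FitLoopSketch, model_state: Dict[str, Any]) -> Dict[str, Any]:
    # Loop state goes under its own "loops" entry, next to the model weights.
    # Nothing in this sketch reads "loops" back; restoring it is the deferred follow-up.
    return {"state_dict": model_state, "loops": {"fit_loop": loop.state_dict()}}


loop = FitLoopSketch()
loop.epoch_progress.completed = 3
checkpoint = dump_checkpoint(loop, model_state={})
print(checkpoint["loops"])
# {'fit_loop': {'epoch_progress': {'ready': 0, 'started': 0, 'processed': 0, 'completed': 3}}}
```

Because no code path in the sketch reads checkpoint["loops"] back, resuming from such a checkpoint behaves as it did before, which is why saving alone introduces no breaking change.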

Does your PR introduce any breaking changes? If yes, please list them.

None

Before submitting

  • [n/a] Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • [n/a] Did you make sure to update the documentation with your changes? (if necessary)
  • [n/a] Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

cc @Borda @awaelchli @ananthsub @ninginthecloud @carmocca @justusschock

@carmocca self-assigned this Nov 27, 2021
@carmocca added this to the 1.6 milestone Nov 27, 2021
@carmocca added the checkpointing and fault tolerance labels Nov 27, 2021
Base automatically changed from refactor/minor-changes to master November 30, 2021 14:07
@github-actions (Contributor)

Build Error! No Linked Issue found. Please link an issue or mention it in the body using #<issue_id>

@tchaton (Contributor) left a comment:

LGTM!

@mergify bot added the ready label Nov 30, 2021
@carmocca marked this pull request as draft November 30, 2021 20:50
@carmocca marked this pull request as ready for review December 17, 2021 02:51
@carmocca enabled auto-merge (squash) December 17, 2021 02:51
@carmocca merged commit 7e10f6d into master Dec 17, 2021
@carmocca deleted the feat/save-loop-progress branch December 17, 2021 16:00
@jjenniferdai (Contributor) commented:

n00b question: where does this happen?

We don't load it yet as that will be a breaking change. To be done in a follow-up.

restore_loops seems to just proceed as long as the loops entry is present in the ckpt dict?

@carmocca (Member, Author) commented Jan 7, 2022

The ft_enabled check here avoids saving the progress tracking state:

https://github.com/PyTorchLightning/pytorch-lightning/blob/59a7ba760548baadf6dbb30864b54cb01c7225a3/pytorch_lightning/loops/base.py#L281-L282

So load_state_dict (right below it) will not load it.
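To illustrate the explanation above, here is a minimal sketch assuming a hypothetical LoopSketch class whose state_dict receives the ft_enabled flag as an explicit argument (this is neither how Lightning obtains the flag nor its actual code). Progress is only written when the flag is set, and load_state_dict skips keys that are missing, so a checkpoint saved with the flag off restores nothing.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class Progress:
    # Hypothetical counter; a real loop would track more fields.
    completed: int = 0


@dataclass
class LoopSketch:
    """Hypothetical loop illustrating flag-gated progress saving."""
    progress: Progress = field(default_factory=Progress)

    def state_dict(self, ft_enabled: bool) -> Dict[str, Any]:
        state: Dict[str, Any] = {}
        # The gate: progress is only written out when the flag is set.
        if ft_enabled:
            state["progress"] = asdict(self.progress)
        return state

    def load_state_dict(self, state: Dict[str, Any]) -> None:
        # Missing keys are skipped instead of raising, so a checkpoint saved
        # without progress leaves the counters at their initial values.
        if "progress" in state:
            self.progress = Progress(**state["progress"])


def resume(ft_enabled: bool) -> int:
    # Save from a loop that has completed 5 steps, then load into a fresh loop.
    old = LoopSketch(Progress(completed=5))
    new = LoopSketch()
    new.load_state_dict(old.state_dict(ft_enabled))
    return new.progress.completed


print(resume(ft_enabled=False))  # 0
print(resume(ft_enabled=True))   # 5
```

Skipping absent keys on load is what makes the gate sufficient on its own: the loader needs no flag of its own to avoid restoring state that was never written.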

facebook-github-bot pushed a commit to facebookresearch/mmf that referenced this pull request Feb 2, 2022
Summary:
see https://fb.workplace.com/groups/pytorchLightning/posts/1604422676560931

partially avoid changes in sync diff D33193522 (0072207) = PR Lightning-AI/pytorch-lightning#10784

(the changes in `logger_connector/result.py` are not patches, rather just correcting what the original sync diff D33193522 (0072207) missed)

Reviewed By: sinannasir, HarounH, wat3rBro

Differential Revision: D33463938

fbshipit-source-id: 9e4cdc087e670fedbb6699aed4e33b73085f8fce
Labels
checkpointing (Related to checkpointing), fault tolerance, ready (PRs ready to be merged)

7 participants