New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convert generator in Sampler back to lazy construction #63646
Conversation
[ghstack-poisoned]
🔗 Helpful links
💊 CI failures summary and remediationsAs of commit d506308 (more details on the Dr. CI page):
1 failure not recognized by patterns:
This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.Please report bugs/suggestions to the (internal) Dr. CI Users group. |
Fixes #63609 [ghstack-poisoned]
@ejguan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ejguan looks good to me. Thanks. I left a nit comment.
Fixes #63609 Differential Revision: [D30451774](https://our.internmc.facebook.com/intern/diff/D30451774) [ghstack-poisoned]
@ejguan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
torch/utils/data/sampler.py
Outdated
else: | ||
yield from torch.randperm(n, generator=self.generator).tolist() | ||
yield from torch.randperm(n, generator=self._gen).tolist() | ||
self._gen = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will badly impact all currently running iterators, consider creating two individual iterators in test and yielding them in random order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Emmm, you are right. So, we have to create iter-local generator. But, then it goes back to the problem that we can not serialize the state of this iter-local generator anymore. Then, when we resume from snapshotting, we can not really fast forward the state of generator, have to iterate over it to the certain amount of iterations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not going to work as intended for two or more iterators created from same object.
a = Sampler()
i1 = iter(a)
i2 = iter(a)
Return new generator from
|
Yeah, that's also something in my mind. Will update PR. |
Fixes #63609 Differential Revision: [D30451774](https://our.internmc.facebook.com/intern/diff/D30451774) [ghstack-poisoned]
ghstack-source-id: 81dde49f141f887850c1f2c87e68d35337e585a3 Pull Request resolved: #63646
Fixes #63609 Differential Revision: [D30451774](https://our.internmc.facebook.com/intern/diff/D30451774) [ghstack-poisoned]
ghstack-source-id: 17692bbbb9a7b9922e17426a543b2d0f6b59e1bf Pull Request resolved: #63646
@ejguan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
Fixes #63609 - Revert #63026 - Sampler is expected to be re-seeded if user specify seed before each epoch - Can not attach generator to self with `__iter__` because multiple iterators will ruin the use case - Add tests to prevent the same case for different Samplers Differential Revision: [D30451774](https://our.internmc.facebook.com/intern/diff/D30451774) [ghstack-poisoned]
Fixes #63609 - Revert #63026 - Sampler is expected to be re-seeded if user specify seed before each epoch - Can not attach generator to self with `__iter__` because multiple iterators will ruin the use case - Add tests to prevent the same case for different Samplers - Keep same functionality introduced in #63026 that user can serialize the Sampler. Differential Revision: [D30451774](https://our.internmc.facebook.com/intern/diff/D30451774) [ghstack-poisoned]
@ejguan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
Fixes #63609 - Revert #63026 - Sampler is expected to be re-seeded if user specify seed before each epoch - Can not attach generator to self with `__iter__` because multiple iterators will ruin the use case - Add tests to prevent the same case for different Samplers Differential Revision: [D30451774](https://our.internmc.facebook.com/intern/diff/D30451774) [ghstack-poisoned]
I split the original PR into two PRs. The another PR provide a new feature for potential users to serialize the iter-local generator. |
@ejguan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Summary: Pull Request resolved: pytorch#63646 Fixes pytorch#63609 Test Plan: Imported from OSS Reviewed By: NivekT Differential Revision: D30451774 Pulled By: ejguan fbshipit-source-id: 550d77494326446d1a42b5da0559e0d384c47413
ghstack-source-id: bf4b2badb725d51e23b570f8c43e903fc6d9bb71 Pull Request resolved: pytorch/pytorch#63646
Fixes #63609
Stack from ghstack:
__iter__
because multiple iterators will ruin the use caseDifferential Revision: D30451774