
[DataLoader] Select available CUDA or third-party devices automatically to pin memory #125016

Open · wants to merge 1 commit into main
Conversation

@wizzniu commented Apr 26, 2024

This PR improves the behavior of DataLoader when pin_memory is True and pin_memory_device is not set. Instead of only checking whether CUDA is available, a more reasonable approach is to first check for an available CUDA device and, if none is found, fall back to the privateuse1 backend automatically for pinning memory. Otherwise, a warning is raised that pinning memory will not take effect.

Fixes #124908
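For illustration, here is a minimal sketch of the selection logic described above. It is not the actual diff; the helper name and the exact warning text are assumptions.

```python
import warnings
import torch

def _resolve_pin_memory_device(pin_memory: bool, pin_memory_device: str = "") -> str:
    """Illustrative helper: pick where to pin host memory when no device is given."""
    if not pin_memory:
        return ""
    if pin_memory_device:                 # user passed a device explicitly: keep it
        return pin_memory_device
    if torch.cuda.is_available():         # previous behavior: CUDA first
        return "cuda"
    # Fall back to a registered privateuse1 (third-party) backend, if any.
    backend = torch._C._get_privateuse1_backend_name()
    if backend != "privateuseone" and hasattr(torch, backend):
        return backend
    warnings.warn(
        "'pin_memory' is set but no accelerator was found; pinned memory will have no effect."
    )
    return ""
```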

cc @ssnl @VitalyFedyunin @ejguan @dzhulgakov @ezyang

@wizzniu wizzniu requested a review from ejguan as a code owner April 26, 2024 10:18

pytorch-bot bot commented Apr 26, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125016

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 144a6e0 with merge base 769b1e6:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.


linux-foundation-easycla bot commented Apr 26, 2024

CLA Signed


The committers listed above are authorized under a signed CLA.

@pytorch-bot bot added the release notes: dataloader label Apr 26, 2024
@vadimkantorov (Contributor)

Would it be better to explicitly accept a pin_memory_device in the DataLoader constructor, and fix things like PyTorch Lightning / HF Trainer to pass it in (instead of mixing device detection into the data loader)?

@mikaylagawarecki added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Apr 26, 2024
@wizzniu (Author) commented Apr 28, 2024

Would it be better to explicitly accept a pin_memory_device in the DataLoader constructor, and fix things like PyTorch Lightning / HF Trainer to pass it in (instead of mixing device detection into the data loader)?

This PR maintains backward compatibility for the pin_memory and pin_memory_device parameters of the API; you can still explicitly pass a pin_memory_device.
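For example, explicitly passing the device keeps working as before; the new fallback only applies when pin_memory_device is left empty:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.randn(8, 4))
# pin_memory_device can still be set explicitly, unchanged by this PR.
loader = DataLoader(ds, batch_size=2, pin_memory=True, pin_memory_device="cuda")
```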

@wizzniu (Author) commented Apr 30, 2024

@ezyang @albanD @ejguan Could you help review this PR?

@ezyang (Contributor) commented Apr 30, 2024

I'm deferring to @albanD for custom backend stuff

@wizzniu (Author) commented May 7, 2024

I'm deferring to @albanD for custom backend stuff

Thanks for your reply. Could you give some suggestions, @albanD?

@albanD (Collaborator) commented May 7, 2024

Yes sorry, I wanted to give a longer answer on this but here is the short version:

  • We do not want every API that takes a pin_memory: bool flag to grow a pin_memory_device argument.
  • We want to use the newly introduced concept of accelerator at
    TORCH_API std::optional<c10::DeviceType> getAccelerator(bool checked = false);
    to redefine pin memory as: "Pin memory is always pinned for the current accelerator device".
  • This will allow us to have all the other factory function APIs work without the need for new arguments.

What it means for this PR is a bit more subtle:

  • Long term, we most likely just want to drop the pin_memory_device argument altogether
  • This will require updating the implementation of the pin memory concept for low level APIs
  • If you have time, it would be best to fix the low-level constructs directly. If you do not, I'm happy to have a quick fix here that consists of binding the getAccelerator() API to Python and using it to set the default value of pin_memory_device (a rough sketch of that idea follows below). This is not perfect, but it is a step in the direction we want to go.

WDYT?
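For illustration, a minimal sketch of the quick fix described above, assuming a Python binding named torch._C._get_accelerator() for the C++ getAccelerator() API (neither this binding name nor the helper below comes from this PR):

```python
import torch

def _default_pin_memory_device() -> str:
    # Assumed binding mirroring C++ getAccelerator(checked=False): returns a
    # torch.device for the current accelerator, or None if there is none.
    acc = torch._C._get_accelerator()
    return acc.type if acc is not None else ""

# Illustrative use inside DataLoader.__init__:
# if pin_memory and not pin_memory_device:
#     pin_memory_device = _default_pin_memory_device()
```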

@mengpenghui

@albanD
I am working on IPC-related interfaces to support third-party devices. The suggestions you gave in this PR are very helpful for modifying IPC. Now I have a new plan:

  1. Change the share_cuda_ interface to share_device_.
  2. In the backend, use getAccelerator to determine the device and dispatch to the appropriate device.
  3. Add the share_device implementation for third-party devices to PrivateUse1HooksInterface.

I plan to implement this new solution in PR #125122.
Do you think this is feasible? Looking forward to your reply.
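For illustration, a rough sketch of the dispatch idea in that plan. Apart from the existing _share_cuda_() path, the entry-point name, the assumed _get_accelerator() binding, and the routing are assumptions, not an existing API:

```python
import torch

def _share_device_(storage: torch.UntypedStorage):
    # Assumed Python binding for the C++ getAccelerator() API (see above).
    acc = torch._C._get_accelerator()
    if acc is None:
        raise RuntimeError("No accelerator available for IPC storage sharing")
    if acc.type == "cuda":
        return storage._share_cuda_()  # existing CUDA sharing path
    # Third-party backends would plug in their implementation through
    # PrivateUse1HooksInterface and be dispatched here.
    raise NotImplementedError(f"share_device_ is not implemented for {acc.type}")
```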

@wizzniu (Author) commented May 10, 2024

Yes sorry, I wanted to give a longer answer on this but here is the short version: […]

I agree with you that an additional device argument is unnecessary for the pin_memory related APIs, and that this requires updating the implementation of the pin memory concept in the low-level APIs. I am happy to fix the low-level constructs directly.
So I want to confirm: is our goal to get rid of the device argument for all pin_memory related APIs (e.g. Tensor.pin_memory())? If so, I will take the time to work on this.

@albanD (Collaborator) commented May 10, 2024

So I want to confirm: is our goal to get rid of the device argument for all pin_memory related APIs (e.g. Tensor.pin_memory())?

Yes! But also note that most of these APIs do NOT have a device argument today, so there is not much to remove for most of them haha
The main thing is that we want to enable pinned memory for any accelerator without introducing this device argument everywhere. So changing the low-level constructs to leverage the Accelerator concept is the way to go.
Let me know if you think this needs more clarification!
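For illustration, under that definition callers simply request pinned memory and the current accelerator is picked up implicitly; this already works for CUDA today, with no device argument needed:

```python
import torch

# Assumes an accelerator (e.g. CUDA) is available; otherwise pinning fails.
t = torch.empty(1024, pin_memory=True)   # pinned for the current accelerator
x = torch.randn(1024).pin_memory()       # no device argument needed
assert t.is_pinned() and x.is_pinned()
```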

@wizzniu (Author) commented May 16, 2024

Hi @albanD, I have updated the implementation of pin memory in a new PR, please see #126376.

Labels
open source · release notes: dataloader · triaged
Development

Successfully merging this pull request may close these issues.

DataLoader's pin_memory is default to CUDA if parameter pin_memory_device is not set
7 participants