
[DataLoader] Select available CUDA or third-party devices automatically to pin memory #125016

Open · wants to merge 1 commit into main
Conversation

@wizzniu commented Apr 26, 2024

This PR improves the behavior of DataLoader when pin_memory is True and pin_memory_device is not set. Instead of only checking whether CUDA is available, a more reasonable approach is to first check for an available CUDA device and, if none is found, fall back to the privateuse1 backend automatically for pinning memory. Otherwise, a warning is raised that pinning memory will not take effect.

Fixes #124908
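For illustration, here is a minimal sketch of the selection logic described above. It is not the actual diff; the helper name and the exact warning text are assumptions.

```python
import warnings
import torch

def _resolve_pin_memory_device(pin_memory: bool, pin_memory_device: str = "") -> str:
    """Illustrative helper: pick where to pin host memory when no device is given."""
    if not pin_memory:
        return ""
    if pin_memory_device:                 # user passed a device explicitly: keep it
        return pin_memory_device
    if torch.cuda.is_available():         # previous behavior: CUDA first
        return "cuda"
    # Fall back to a registered privateuse1 (third-party) backend, if any.
    backend = torch._C._get_privateuse1_backend_name()
    if backend != "privateuseone" and hasattr(torch, backend):
        return backend
    warnings.warn(
        "'pin_memory' is set but no accelerator was found; pinned memory will have no effect."
    )
    return ""
```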

cc @ssnl @VitalyFedyunin @ejguan @dzhulgakov @ezyang

@wizzniu wizzniu requested a review from ejguan as a code owner April 26, 2024 10:18

pytorch-bot bot commented Apr 26, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125016

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 144a6e0 with merge base 769b1e6:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.


linux-foundation-easycla bot commented Apr 26, 2024

CLA Signed


The committers listed above are authorized under a signed CLA.

@pytorch-bot bot added the release notes: dataloader label Apr 26, 2024
@vadimkantorov (Contributor)

Would it be better to explicitly accept a pin_memory_device in the DataLoader constructor, and fix things like PyTorch Lightning / HF Trainer to pass it in (instead of mixing device detection into the data loader)?

@mikaylagawarecki added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Apr 26, 2024
@wizzniu (Author) commented Apr 28, 2024

Would it be better to explicitly accept a pin_memory_device in the DataLoader constructor, and fix things like PyTorch Lightning / HF Trainer to pass it in (instead of mixing device detection into the data loader)?

This PR maintains backward compatibility for the pin_memory and pin_memory_device parameters of the API; you can still explicitly pass a pin_memory_device.
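For example, explicitly passing the device keeps working as before; the new fallback only applies when pin_memory_device is left empty:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.randn(8, 4))
# pin_memory_device can still be set explicitly, unchanged by this PR.
loader = DataLoader(ds, batch_size=2, pin_memory=True, pin_memory_device="cuda")
```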

@wizzniu (Author) commented Apr 30, 2024

@ezyang @albanD @ejguan Could you help review this PR?

@ezyang (Contributor) commented Apr 30, 2024

I'm deferring to @albanD for custom backend stuff

@wizzniu (Author) commented May 7, 2024

I'm deferring to @albanD for custom backend stuff

Thanks for your reply. Could you give some suggestions, @albanD?

@albanD (Collaborator) commented May 7, 2024

Yes sorry, I wanted to give a longer answer on this but here is the short version:

  • We do not want every API that takes a pin_memory: bool flag to grow a pin_memory_device argument.
  • We want to use the newly introduced concept of accelerator at
    TORCH_API std::optional<c10::DeviceType> getAccelerator(bool checked = false);
    to redefine pin memory as: "Pin memory is always pinned for the current accelerator device".
  • This will allow us to have all the other factory function APIs work without the need for new arguments.

What it means for this PR is a bit more subtle:

  • Long term, we most likely just want to drop the pin_memory_device argument altogether
  • This will require updating the implementation of the pin memory concept for low level APIs
  • If you have time, it would be best to fix the low-level constructs directly. If you do not, I'm happy to have a quick fix here that consists of binding the getAccelerator() API to Python and using it to set the default value of pin_memory_device (a rough sketch of that idea follows below). This is not perfect, but it is a step in the direction we want to go.

WDYT?
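For illustration, a minimal sketch of the quick fix described above, assuming a Python binding named torch._C._get_accelerator() for the C++ getAccelerator() API (neither this binding name nor the helper below comes from this PR):

```python
import torch

def _default_pin_memory_device() -> str:
    # Assumed binding mirroring C++ getAccelerator(checked=False): returns a
    # torch.device for the current accelerator, or None if there is none.
    acc = torch._C._get_accelerator()
    return acc.type if acc is not None else ""

# Illustrative use inside DataLoader.__init__:
# if pin_memory and not pin_memory_device:
#     pin_memory_device = _default_pin_memory_device()
```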

@mengpenghui

@albanD
I am working on IPC-related interfaces to support third-party devices. The suggestions you gave in this PR are very helpful for modifying IPC. Now I have a new plan:

  1. Change the share_cuda_ interface to share_device_.
  2. In the backend, use getAccelerator to determine the device and dispatch to the appropriate device.
  3. Add the share_device implementation for third-party devices to PrivateUse1HooksInterface.

I plan to implement this new solution in PR #125122.
Do you think this is feasible? Looking forward to your reply.
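For illustration, a rough sketch of the dispatch idea in that plan. Apart from the existing _share_cuda_() path, the entry-point name, the assumed _get_accelerator() binding, and the routing are assumptions, not an existing API:

```python
import torch

def _share_device_(storage: torch.UntypedStorage):
    # Assumed Python binding for the C++ getAccelerator() API (see above).
    acc = torch._C._get_accelerator()
    if acc is None:
        raise RuntimeError("No accelerator available for IPC storage sharing")
    if acc.type == "cuda":
        return storage._share_cuda_()  # existing CUDA sharing path
    # Third-party backends would plug in their implementation through
    # PrivateUse1HooksInterface and be dispatched here.
    raise NotImplementedError(f"share_device_ is not implemented for {acc.type}")
```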

@wizzniu (Author) commented May 10, 2024

Yes sorry, I wanted to give a longer answer on this but here is the short version: […]

I agree with you that an additional device argument is unnecessary for the pin_memory related APIs, and that this requires updating the implementation of the pin memory concept in the low-level APIs. I am happy to fix the low-level constructs directly.
So I want to confirm: is our goal to get rid of the device argument for all pin_memory related APIs (e.g. Tensor.pin_memory())? If so, I will take the time to work on this.

@albanD (Collaborator) commented May 10, 2024

So I want to confirm: is our goal to get rid of the device argument for all pin_memory related APIs (e.g. Tensor.pin_memory())?

Yes! But also note that most of these APIs do NOT have a device argument today, so there is not much to remove for most of them haha
The main thing is that we want to enable pinned memory for any accelerator without introducing this device argument everywhere. So changing the low-level constructs to leverage the Accelerator concept is the way to go.
Let me know if you think this needs more clarification!
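For illustration, under that definition callers simply request pinned memory and the current accelerator is picked up implicitly; this already works for CUDA today, with no device argument needed:

```python
import torch

# Assumes an accelerator (e.g. CUDA) is available; otherwise pinning fails.
t = torch.empty(1024, pin_memory=True)   # pinned for the current accelerator
x = torch.randn(1024).pin_memory()       # no device argument needed
assert t.is_pinned() and x.is_pinned()
```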

@wizzniu (Author) commented May 16, 2024

Hi @albanD, I have updated the implementation of pin memory in a new PR, please see #126376.

Labels
open source · release notes: dataloader · triaged
Development

Successfully merging this pull request may close these issues.

DataLoader's pin_memory is default to CUDA if parameter pin_memory_device is not set
7 participants