
make torch.amp.autocast more generic #125103

Closed · wants to merge 12 commits

Conversation

guangyey
Collaborator

@guangyey guangyey commented Apr 27, 2024

Stack from ghstack (oldest at bottom):

Motivation

As discussed in #124479, torch.amp.autocast cannot be completely equivalent to torch.cuda.amp.autocast and torch.cpu.amp.autocast, because torch.amp.autocast does not apply the backend-specific default dtype (torch.bfloat16 for CPU and torch.float16 for CUDA). We would like torch.amp.autocast to be more generic so that developers and customers can write device-agnostic code; there are not enough reasons to add a device-specific autocast torch.xxx.amp.autocast for each device backend.

Solution

When None is passed as dtype, we use torch.get_autocast_dtype to obtain the backend-specific default dtype. Meanwhile, torch.get_autocast_dtype also needs to be supported in the JIT path for backward compatibility.
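
A minimal sketch of this fallback, assuming current torch.get_autocast_dtype semantics and mirroring the two-line change quoted later in this review; the helper name resolve_autocast_dtype is illustrative and not part of the PR:

```python
import torch

def resolve_autocast_dtype(device_type: str, dtype=None):
    # If the caller does not specify a dtype, fall back to the backend's
    # registered autocast default (float16 for "cuda", bfloat16 for "cpu").
    if dtype is None:
        dtype = torch.get_autocast_dtype(device_type)
    return dtype

print(resolve_autocast_dtype("cpu"))  # torch.bfloat16 on a stock build
```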

Additional Context

With this PR, torch.amp.autocast(device_type='cuda') is equivalent to torch.cuda.amp.autocast.
Two new UTs cover this change in the eager and JIT paths respectively.
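
A device-agnostic usage sketch (assuming a build where CUDA may or may not be present); with no dtype argument, the backend default is picked automatically:

```python
import torch

device_type = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(8, 8).to(device_type)
x = torch.randn(4, 8, device=device_type)

# No dtype argument: float16 is used on CUDA and bfloat16 on CPU, matching
# torch.cuda.amp.autocast / torch.cpu.amp.autocast respectively.
with torch.amp.autocast(device_type=device_type):
    y = model(x)

print(y.dtype)  # torch.float16 on CUDA, torch.bfloat16 on CPU
```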

cc @mcarilli @ptrblck @leslie-fang-intel @jgong5 @ezyang @msaroufim @bdhirsh @anijain2305 @chauhang @voznesenskym @penguinwu @EikanWang @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng


pytorch-bot bot commented Apr 27, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125103

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit e11d24b with merge base 5007312:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

guangyey added a commit that referenced this pull request Apr 27, 2024
ghstack-source-id: 5dbe659705ae42831596b9eb4126d126b261f01d
Pull Request resolved: #125103
@guangyey guangyey changed the title make torch.amp.autocast more generic [WIP] make torch.amp.autocast more generic Apr 27, 2024
@guangyey guangyey marked this pull request as draft April 27, 2024 15:42
@guangyey guangyey added the ciflow/trunk Trigger trunk jobs on your pull request label Apr 27, 2024
guangyey added a commit that referenced this pull request Apr 27, 2024
ghstack-source-id: 1cbc46833c8bcb946e0eff23f4311e766631d9c2
Pull Request resolved: #125103
guangyey added a commit that referenced this pull request Apr 28, 2024
ghstack-source-id: 0066d18809b6652072307eb5727fa10c9b14d564
Pull Request resolved: #125103
@pytorch-bot pytorch-bot bot added the release notes: jit release notes category label Apr 28, 2024
guangyey added a commit that referenced this pull request Apr 28, 2024
ghstack-source-id: 4e580070ec0ec666d1a2929ac588f91c6ac7af70
Pull Request resolved: #125103
guangyey added a commit that referenced this pull request May 6, 2024
ghstack-source-id: cd7a600e63177d5d4a7590a8c62c51cff7e10243
Pull Request resolved: #125103
@guangyey guangyey added the topic: improvements topic category label May 6, 2024
guangyey added a commit that referenced this pull request May 6, 2024
ghstack-source-id: 31ac5e57dfcbf817bdbe1e45af5c21cb08d43050
Pull Request resolved: #125103
@guangyey guangyey changed the title [WIP] make torch.amp.autocast more generic make torch.amp.autocast more generic May 6, 2024
@guangyey guangyey marked this pull request as ready for review May 6, 2024 05:52
@@ -600,28 +600,20 @@ def save_global_state(self, out=None):
)
global_state["grad_enabled"] = (torch.set_grad_enabled, torch.is_grad_enabled())

def autocast_specific_backend(
Collaborator Author

code improvements.

Comment on lines +210 to +211
if dtype is None:
    dtype = torch.get_autocast_dtype(device_type)
Collaborator

is it covered by existing UTs?

Collaborator Author

Add two new UTs to cover this change in the eager and JIT paths respectively.

Collaborator

We should update the doc to mention the new default value for this arg?

Collaborator Author

Updated.

guangyey added a commit that referenced this pull request May 7, 2024
ghstack-source-id: 62a264b2a9033119fee7ebd49a93b6252ba2c89f
Pull Request resolved: #125103
guangyey added a commit that referenced this pull request May 7, 2024
ghstack-source-id: 62b35fb066c0ee740b0220e56bc153b861ff0c6e
Pull Request resolved: #125103
@guangyey guangyey requested a review from jgong5 May 7, 2024 07:24
guangyey added a commit that referenced this pull request May 7, 2024
ghstack-source-id: f4516ff524822c7994a009c6973ae7d118642b02
Pull Request resolved: #125103
@guangyey
Collaborator Author

guangyey commented May 7, 2024

@albanD This PR intends to make torch.amp.autocast more generic. Developers can use it to write device-agnostic code instead of using torch.cuda.amp.autocast or torch.cpu.amp.autocast. Does that sound reasonable?

) if torch.amp.is_autocast_available(device) else contextlib.nullcontext()
with device_autocast_ctx, torch.cpu.amp.autocast(**cpu_autocast_kwargs), \
recompute_context:
with device_autocast_ctx, torch.cpu.amp.autocast(**cpu_autocast_kwargs), recompute_context: # type: ignore[attr-defined]
Collaborator

So we gather and restore both the CPU context and another device's context here?
This makes this code a bit weird. But sounds fair. We definitely don't want to change the behavior here.

cc @soulitzer in case this is something you want to clean up for AC in general in a follow-up, now that we have the nice API

Collaborator Author

We don't change the behavior here; we just use torch.amp.autocast to make this code more generic and leave the logic as it is.

Collaborator

yep perfect!
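
For reference, a minimal sketch of the pattern discussed in this thread, assuming the surrounding checkpoint code provides autocast kwargs for both CPU and the active device; the helper name and arguments are illustrative placeholders, not the actual implementation:

```python
import contextlib
import torch

def make_autocast_contexts(device: str, device_autocast_kwargs: dict, cpu_autocast_kwargs: dict):
    # Build the device autocast context only if autocast is available for that
    # backend; otherwise fall back to a no-op context manager.
    device_ctx = (
        torch.amp.autocast(device_type=device, **device_autocast_kwargs)
        if torch.amp.is_autocast_available(device)
        else contextlib.nullcontext()
    )
    # The CPU autocast state is restored alongside the device state, matching
    # the behavior preserved by this PR.
    cpu_ctx = torch.amp.autocast(device_type="cpu", **cpu_autocast_kwargs)
    return device_ctx, cpu_ctx

# Usage sketch: both contexts are entered around recomputation.
dev_ctx, cpu_ctx = make_autocast_contexts(
    "cpu", {"enabled": True}, {"enabled": True, "dtype": torch.bfloat16}
)
with dev_ctx, cpu_ctx:
    pass  # recompute_fn(...) would run here
```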

guangyey added a commit that referenced this pull request May 7, 2024
ghstack-source-id: 9f6516b793ad060ba57b52b1ba3bfcbe3077cd60
Pull Request resolved: #125103
Collaborator

@albanD albanD left a comment

nit in doc, sounds good otherwise.

@@ -191,7 +191,9 @@ def forward(self, x):
Thus, you may obtain the device type of a tensor using `Tensor.device.type`.
enabled(bool, optional): Whether autocasting should be enabled in the region.
Default: ``True``
dtype(torch_dtype, optional): Whether to use torch.float16 or torch.bfloat16.
dtype(torch_dtype, optional): Data type for ops run in autocast. It uses the default value
(``torch.float16`` for CUDA and ``torch.bfloat16`` for CPU, by default), given by
Collaborator

Suggested change
(``torch.float16`` for CUDA and ``torch.bfloat16`` for CPU, by default), given by
(``torch.float16`` for CUDA and ``torch.bfloat16`` for CPU), given by

Collaborator Author

updated.

guangyey added a commit that referenced this pull request May 8, 2024
ghstack-source-id: ccfcbf2a1d43c618076ca9d883fab4d462dd2632
Pull Request resolved: #125103
guangyey added a commit that referenced this pull request May 8, 2024
ghstack-source-id: c05a9e0166e9fdfc6fc3284f876f96ba236e3939
Pull Request resolved: #125103
@guangyey
Collaborator Author

guangyey commented May 8, 2024

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

facebook-github-bot pushed a commit to pytorch/benchmark that referenced this pull request May 9, 2024

X-link: pytorch/pytorch#125103
Approved by: https://github.com/albanD, https://github.com/jgong5, https://github.com/gujinghui

Reviewed By: izaitsevfb

Differential Revision: D57138276

fbshipit-source-id: 17f883924e43f68dd6836d99b06fe8a47cfccbf6
Projects
Status: Done

6 participants