
Handle shared memory cases in MathBitFallback #66667

Merged: 2 commits merged into pytorch:release/1.10 from malfet/cp-63602 on Oct 15, 2021

Conversation

@malfet (Contributor) commented Oct 14, 2021

anjali411 and others added 2 commits October 14, 2021 16:59
Summary:
Pull Request resolved: pytorch#63602

This PR fixes the case where a read and a write are performed on memory shared between mutable and/or non-mutable arguments. Example:
```
a = torch.tensor([1 + 1j])
b = a.conj()
b.add_(a)  # should return tensor([2]) but returns tensor([2-2j])
```

The issue is that the conjugate fallback resolves the conjugation in-place for mutable arguments. As shown above, this corrupts the computation when other input arguments share memory with the mutable argument(s).
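The failure mode can be reproduced in plain Python, with no PyTorch required (a sketch under assumptions: the list stands in for the shared storage, and the in-place list rewrite stands in for resolving the conj bit):

```
# Sketch of the failure mode with a lazy conj bit over shared storage.
# `storage` plays the role of the buffer shared by a and b = a.conj().
storage = [1 + 1j]
a = storage            # a reads the shared buffer directly (conj bit unset)

# Old fallback behavior for b.add_(a): resolve b's conjugation in place.
storage[:] = [x.conjugate() for x in storage]   # storage is now [1-1j]

# a has silently changed too, so the kernel computes (1-1j) + (1-1j):
result = [x + y for x, y in zip(storage, a)]
print(result)   # [(2-2j)] instead of the expected [(2+0j)]
```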
This PR fixes this issue by:
1. First, scan through the operator's input arguments and build a vector of mutable arguments that have the conj bit set to `True` (and accordingly set the flag `check_for_alias_with_mut_arg` to `True` or `False`).
2. Iterate through all the arguments, looking only at the non-mutable ones at this stage. If `check_for_alias_with_mut_arg` is `True`, check whether the current argument tensor aliases any entry in `mutable_inputs`. If it does, clone the non-mutable tensor argument; otherwise, resolve the conjugation as before.
3. Look through the `mutable_inputs` vector (which contains only mutable input tensors with the conj bit set to `True`) and in-place conjugate each of its entries.
4. Do the computation.
5. Re-conjugate the mutable argument tensors.

NOTE: `TensorLists` are not fully handled in ConjugateFallback. Please see the in-line comment for more details.
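The steps above can be sketched with a toy model in Python (hypothetical names and a simplified tensor; the real implementation is C++ inside MathBitFallback, and step 5's re-conjugation is elided here because this toy clears the conj bit instead of restoring it):

```
# Toy model of the fixed conjugate fallback. A "tensor" is a storage
# (a shared Python list of complex numbers) plus a lazy conj flag.

class ToyTensor:
    def __init__(self, storage, conj=False):
        self.storage = storage   # shared, mutable list
        self.conj = conj         # lazy conjugation bit

    def materialize_conj(self):
        # Resolve the conj bit in place.
        if self.conj:
            self.storage[:] = [x.conjugate() for x in self.storage]
            self.conj = False

    def clone(self):
        # Copy-and-resolve into fresh storage (no aliasing).
        data = [x.conjugate() if self.conj else x for x in self.storage]
        return ToyTensor(data)

def add_(out, other):
    """out += other, mimicking the fixed fallback."""
    # Step 1: collect mutable args whose conj bit is set.
    mutable_inputs = [out] if out.conj else []
    check_for_alias_with_mut_arg = bool(mutable_inputs)
    # Step 2: a non-mutable arg that aliases a mutable one is cloned
    # instead of being resolved in place.
    if check_for_alias_with_mut_arg and any(
        other.storage is m.storage for m in mutable_inputs
    ):
        other = other.clone()
    else:
        other.materialize_conj()
    # Step 3: resolve the conj bit of the mutable args in place.
    for m in mutable_inputs:
        m.materialize_conj()
    # Step 4: the actual computation.
    out.storage[:] = [x + y for x, y in zip(out.storage, other.storage)]
    # Step 5 (re-conjugating mutable args to restore their conj bit) is
    # elided: this toy clears the bit in step 3 instead.
    return out

a = ToyTensor([1 + 1j])
b = ToyTensor(a.storage, conj=True)   # b = a.conj(): shares storage with a
add_(b, a)
print(b.storage)   # [(2+0j)], matching the expected tensor([2])
```

Without the aliasing check in step 2, `other` would be read after step 3 had already rewritten the shared storage, reproducing the `2-2j` result from the bug report.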

Fixes pytorch#59943

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D30466905

Pulled By: anjali411

fbshipit-source-id: 58058e5e6481da04a12d03f743c1491942a6cc9b
Summary: Pull Request resolved: pytorch#66572

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D31624043

Pulled By: suo

fbshipit-source-id: 9db9cee3140d78c2a2f0c937be84755206fee1dd
@pytorch-probot

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/malfet/pytorch/blob/6583aaff47efd9147b59802897d1b0f09867a8bc/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default

Triggered Workflows

| Workflow | Labels (bold enabled) | Status |
|---|---|---|
| linux-bionic-py3.6-clang9 | ciflow/all, ciflow/cpu, **ciflow/default**, ciflow/linux, ciflow/noarch, ciflow/xla | ✅ triggered |
| linux-bionic-py3.8-gcc9-coverage | ciflow/all, ciflow/coverage, ciflow/cpu, **ciflow/default**, ciflow/linux | ✅ triggered |
| linux-xenial-cuda11.3-py3.6-gcc7 | ciflow/all, ciflow/cuda, **ciflow/default**, ciflow/linux | ✅ triggered |
| linux-xenial-py3.6-gcc5.4 | ciflow/all, ciflow/cpu, **ciflow/default**, ciflow/linux | ✅ triggered |
| linux-xenial-py3.6-gcc7-bazel-test | ciflow/all, ciflow/bazel, ciflow/cpu, **ciflow/default**, ciflow/linux | ✅ triggered |
| win-vs2019-cpu-py3 | ciflow/all, ciflow/cpu, **ciflow/default**, ciflow/win | ✅ triggered |
| win-vs2019-cuda11.3-py3 | ciflow/all, ciflow/cuda, **ciflow/default**, ciflow/win | ✅ triggered |

Skipped Workflows

| Workflow | Labels | Status |
|---|---|---|
| libtorch-linux-xenial-cuda10.2-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux | 🚫 skipped |
| libtorch-linux-xenial-cuda11.3-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux | 🚫 skipped |
| linux-bionic-cuda10.2-py3.9-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow | 🚫 skipped |
| linux-xenial-cuda10.2-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow | 🚫 skipped |
| parallelnative-linux-xenial-py3.6-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/linux | 🚫 skipped |
| periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| periodic-linux-xenial-cuda11.1-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| periodic-win-vs2019-cuda11.1-py3 | ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win | 🚫 skipped |
| puretorch-linux-xenial-py3.6-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/linux | 🚫 skipped |
| win-vs2019-cuda10.2-py3 | ciflow/all, ciflow/cuda, ciflow/win | 🚫 skipped |

You can add a comment to the PR and tag @pytorchbot with the following commands:
```
# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is
# equivalent to adding these labels manually and triggering the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow
```

For more information, please take a look at the CI Flow Wiki.

@facebook-github-bot (Contributor) commented Oct 15, 2021

🔗 Helpful links

💊 CI failures summary and remediations

As of commit 6583aaf (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build linux-xenial-py3.6-gcc5.4 / test (backwards_compat, 1, 1, linux.2xlarge) (1/1)

Step: "Test"

```
2021-10-15T00:12:36.4323365Z Author: PyTorch Team
2021-10-15T00:12:36.4324291Z Author-email: packages@pytorch.org
2021-10-15T00:12:36.4326068Z License: BSD-3
2021-10-15T00:12:36.4326797Z Location: /opt/conda/lib/python3.6/site-packages
2021-10-15T00:12:36.4327515Z Requires: dataclasses, typing-extensions
2021-10-15T00:12:36.4328086Z Required-by: 
2021-10-15T00:12:36.4541102Z + python check_backward_compatibility.py --existing-schemas nightly_schemas.txt
2021-10-15T00:12:37.0548315Z Traceback (most recent call last):
2021-10-15T00:12:37.0549322Z   File "check_backward_compatibility.py", line 155, in <module>
2021-10-15T00:12:37.0549857Z     s = parse_schema(line.strip())
2021-10-15T00:12:37.0550216Z RuntimeError: 
2021-10-15T00:12:37.0550727Z Unknown custom class type cuda.Stream. Please ensure it is registered.:
2021-10-15T00:12:37.0552152Z cuda::default_stream.device(Device? device) -> (__torch__.torch.classes.cuda.Stream)
2021-10-15T00:12:37.0553041Z                                                                              ~~~~~~ <--- HERE
2021-10-15T00:12:37.0553277Z 
2021-10-15T00:12:37.1481893Z + cleanup
2021-10-15T00:12:37.1482546Z + retcode=1
2021-10-15T00:12:37.1482992Z + set +x
2021-10-15T00:12:37.1483368Z =================== sccache compilation log ===================
2021-10-15T00:12:37.1682652Z =========== If your build fails, please take a look at the log above for possible reasons ===========
2021-10-15T00:12:37.1702593Z Compile requests                      0
```

This comment was automatically generated by Dr. CI.

@malfet malfet merged commit b544cbd into pytorch:release/1.10 Oct 15, 2021
@malfet malfet deleted the malfet/cp-63602 branch October 15, 2021 01:34
4 participants