
Support dtype kwarg in _foreach_norm #125665

Closed
wants to merge 8 commits

Conversation

crcrpar
Collaborator

@crcrpar crcrpar commented May 7, 2024

Fixes #125040

cc @awgu
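
For context, a minimal usage sketch of the new kwarg (illustrative only; it assumes a build that includes this change, and the tensor shapes/dtypes are arbitrary):

    import torch

    tensors = [torch.randn(4, dtype=torch.bfloat16) for _ in range(3)]
    # Without the kwarg, each returned norm stays in the input dtype (bfloat16 here).
    norms_default = torch._foreach_norm(tensors, ord=2)
    # With the new kwarg, the output dtype can be requested explicitly,
    # mirroring the dtype argument of torch.linalg.vector_norm.
    norms_fp32 = torch._foreach_norm(tensors, ord=2, dtype=torch.float32)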


pytorch-bot bot commented May 7, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125665

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 1e300d6 with merge base 5fb11cd:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: foreach_frontend label (release notes category) May 7, 2024
@crcrpar

This comment was marked as outdated.

@crcrpar crcrpar force-pushed the crpa/foreach-norm-with-dtype-kwarg branch from 9d0bb68 to ae98aff Compare May 13, 2024 13:30
@crcrpar crcrpar marked this pull request as ready for review May 13, 2024 13:31
@albanD albanD requested review from janeyx99 and removed request for albanD May 13, 2024 22:21
Contributor

@janeyx99 janeyx99 left a comment


Took a look at the changes, they seem reasonable but the meta test failures are real.

Typically I think we've skipped when the original OpInfos are broken, but I have not dug into the problem here so it may be a different concern.

@ezyang ezyang added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) May 14, 2024
@crcrpar
Collaborator Author

crcrpar commented May 15, 2024

Took a look at the changes, they seem reasonable but the meta test failures are real.

Typically I think we've skipped when the original OpInfos are broken, but I have not dug into the problem here so it may be a different concern.

Yes, they are real failures, but not because the OpInfos are broken. The cause is the dtype parameterization, which is somewhat hidden in test_foreach's parity check.

@janeyx99
Contributor

Yes, they are real failures, but not because the OpInfos are broken. The cause is the dtype parameterization, which is somewhat hidden in test_foreach's parity check.

Could you elaborate on what you mean by this? Or will you be pushing a fix for it?

@crcrpar
Collaborator Author

crcrpar commented May 16, 2024

Yes, they are real failures, but not because the OpInfos are broken. The cause is the dtype parameterization, which is somewhat hidden in test_foreach's parity check.

Could you elaborate on what you mean by this? Or will you be pushing a fix for it?

Pushed a fix 🙂

for num_tensors, ord, out_dtype in product(
    num_input_tensors,
    (0, 1, 2, -1, -2, float('inf'), float('-inf')),
-   [None] + get_all_dtypes(),
+   (None,) + (torch.complex128,) if dtype in complex_types() else (torch.float64,),
Contributor


what coverage does this remove?

Collaborator Author


Integers, complex64, float32, float16, and bfloat16.
Integers wouldn't be supported anyway; i.e., test_foreach's test_parity checks that the error messages are identical.
Some of the lower-precision dtypes could fail because there is no direct cast, or something along those lines.
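
For illustration, here is a rough sketch of the parity idea being referred to (not the actual test_foreach code; the helper name and structure are made up): the foreach op and a per-tensor torch.linalg.vector_norm reference should either both succeed with matching results or both raise with the same error message, so an unsupported out_dtype such as an integer type is still exercised through the error path:

    import torch

    def check_norm_parity(tensors, ord, out_dtype):
        # Hypothetical helper, loosely modeled on test_foreach's parity check.
        ref_err = foreach_err = None
        try:
            expected = [torch.linalg.vector_norm(t, ord, dtype=out_dtype) for t in tensors]
        except RuntimeError as e:
            ref_err = str(e)
        try:
            actual = torch._foreach_norm(tensors, ord, dtype=out_dtype)
        except RuntimeError as e:
            foreach_err = str(e)
        if ref_err is not None or foreach_err is not None:
            # e.g. an integer out_dtype should produce the same error from both paths
            assert ref_err == foreach_err, (ref_err, foreach_err)
        else:
            for a, e in zip(actual, expected):
                torch.testing.assert_close(a, e)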

Contributor


I'd rather still test more and just have expected skips than silently test less.

Collaborator Author

@crcrpar crcrpar May 16, 2024


I found it not really straightforward to apply skips/xfails to some of the generated sample inputs, which I'm speculating test_meta needs (I thought the failures would be of that broken-OpInfo ilk).
Would you happen to have a reference?


Collaborator Author

@crcrpar crcrpar May 18, 2024


I'm afraid my previous response was misleading. A DecorateInfo covering all the dtypes here would void the value of the tests. The if-else over dtypes here doesn't eliminate any of the existing test cases, since it's aimed at testing the new kwarg. An inappropriate dtype passed to the kwarg just leads to an error, and unfortunately for the meta test, all dtypes (meaning all input parameterizations) would end up decorated as xfail if the dtype kwarg parameterization covered all the possible ones.
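
For reference, this is roughly the shape a blanket xfail would take (an illustrative DecorateInfo entry, not code from this PR; the test class/name, dtype list, and import path are assumptions and may differ across versions). Because it can only be keyed on device type and input dtype, it would cover every sample input for those dtypes, not just the ones exercising the new kwarg:

    import unittest
    import torch
    from torch.testing._internal.opinfo.core import DecorateInfo

    # Hypothetical entry that would go into the _foreach_norm OpInfo's skips/decorators;
    # there is no way to target only the sample inputs that pass an unsupported
    # value to the new `dtype` kwarg.
    xfail_meta = DecorateInfo(
        unittest.expectedFailure,
        "TestMeta",
        "test_meta_outplace",
        dtypes=(torch.float16, torch.bfloat16),
    )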

Contributor


Ah this parametrizes out_dtype, which is new.

@crcrpar crcrpar changed the title Support dypte kwarg in _foreach_norm Support dtype kwarg in _foreach_norm May 16, 2024
@crcrpar
Collaborator Author

crcrpar commented May 20, 2024

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
@pytorchmergebot
Collaborator

Successfully rebased crpa/foreach-norm-with-dtype-kwarg onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout crpa/foreach-norm-with-dtype-kwarg && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the crpa/foreach-norm-with-dtype-kwarg branch from 546421e to 1e300d6 Compare May 20, 2024 06:03

@janeyx99
Contributor

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk label (Trigger trunk jobs on your pull request) May 22, 2024
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / linux-focal-rocm6.1-py3.8 / test (distributed, 1, 1, linux.rocm.gpu)


@janeyx99
Contributor

@pytorchbot merge -i

The ROCm distributed failures seem unrelated.

@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 1 checks: trunk / linux-focal-rocm6.1-py3.8 / test (distributed, 1, 1, linux.rocm.gpu)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

r-barnes pushed a commit to r-barnes/pytorch that referenced this pull request May 22, 2024
@crcrpar
Collaborator Author

crcrpar commented May 23, 2024

Is this still being checked...? The checks look okay.

@janeyx99
Contributor

@pytorchbot merge -i

@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 1 checks: trunk / linux-focal-rocm6.1-py3.8 / test (distributed, 1, 1, linux.rocm.gpu)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot
Collaborator

Merge failed

Reason: Command `git -C /home/runner/work/pytorch/pytorch commit --author="Masaki Kozuki <mkozuki@nvidia.com>" -m Support dtype kwarg in _foreach_norm (#125665)

Fixes #125040

Pull Request resolved: #125665
Approved by: https://github.com/janeyx99` returned non-zero exit code 1

On branch main
Your branch is up to date with 'origin/main'.

nothing to commit, working tree clean

@janeyx99
Contributor

Ah, this was merged on Wednesday around 4pm but just not closed. Closing this, as the commit has been merged.

@janeyx99 janeyx99 closed this May 24, 2024
@crcrpar crcrpar deleted the crpa/foreach-norm-with-dtype-kwarg branch May 24, 2024 15:51
Labels
ciflow/trunk (Trigger trunk jobs on your pull request)
Merged
open source
release notes: foreach_frontend (release notes category)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
Development

Successfully merging this pull request may close these issues.

[Feature Request] Support dtype arg in torch._foreach_norm
5 participants