Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ONNX operator_export_type on the new registry #87735

Conversation

thiagocrepaldi
Copy link
Collaborator

@thiagocrepaldi thiagocrepaldi commented Oct 25, 2022

Fixes #87313

Pytorch's ONNX pipelines do not run with BUILD_CAFFE2=0, so tests for operator_export_type==ONNX_ATEN or ONNX_ATEN_FALLBACK are not fully tested, which caused this regression to the core feature of ONNX exporter: ONNX ATen fallback.

This PR fixes this regression, but we discuss with Meta on how to use only BUILD_CAFFE2=0 builds instead of ``BUILD_CAFFE2=1`

@pytorch-bot pytorch-bot bot added the release notes: onnx torch.onnx related changes that should show up in the release notes label Oct 25, 2022
@pytorch-bot
Copy link

pytorch-bot bot commented Oct 25, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/87735

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 Failures

As of commit 14ec6aa:

The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

torch/onnx/utils.py Outdated Show resolved Hide resolved
torch/onnx/utils.py Outdated Show resolved Hide resolved
torch/onnx/utils.py Show resolved Hide resolved
@thiagocrepaldi thiagocrepaldi force-pushed the thiagofc/fix-onnx-registry-fallback branch from 41e5234 to 06dc424 Compare October 26, 2022 01:31
@bdhirsh bdhirsh added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Oct 27, 2022
@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 28, 2022
# Direct ATen export requested
# For BUILD_CAFFE2=0 builds, if domain=="aten" and operator_export_type==ONNX_ATEN,
# an aten::ATen operator is created regardless of symbolics existence
# For BUILD_CAFFE2=1, the same applies only if there is no symbolic available
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to the CI failure on test_py_diagnose_emits_error, where it expects a operator_supported_in_newer_opset_version error for aten::diagonal when exporting in opset 9. After this change, the error was not thrown, and aten::diagonal is exported as aten::ATen(...). The operator_export_type of export is the default _C_onnx.OperatorExportTypes.ONNX.

Is this behavior expected, for BUILD_CAFFE2=1?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, there was a failure in the logic. Thanks for catching this

@BowenBao
Copy link
Collaborator

Some test failures

2022-10-26T02:05:53.1811889Z =========================== short test summary info ============================
2022-10-26T02:05:53.1812306Z FAILED test/onnx/test_pytorch_onnx_caffe2_quantized.py::TestQuantizedOps::test_cat
2022-10-26T02:05:53.1815203Z FAILED test/onnx/test_pytorch_onnx_no_runtime.py::TestONNXExport::test_export_dict
2022-10-26T02:05:53.1816998Z FAILED test/onnx/test_pytorch_onnx_caffe2_quantized.py::TestQuantizedOps::test_qconv_model
2022-10-26T02:05:53.1818106Z FAILED test/onnx/test_pytorch_onnx_caffe2_quantized.py::TestQuantizedOps::test_qlinear_model
2022-10-26T02:05:53.1818678Z FAILED test/onnx/test_pytorch_onnx_caffe2_quantized.py::TestQuantizedOps::test_quantized_add
2022-10-26T02:05:53.1821321Z FAILED test/onnx/internal/test_diagnostics.py::TestOnnxDiagnostics::test_py_diagnose_emits_error
2022-10-26T02:05:53.1822149Z FAILED test/onnx/test_pytorch_onnx_caffe2_quantized.py::TestQuantizedOps::test_sequential
2022-10-26T02:05:53.1832674Z FAILED test/onnx/test_pytorch_onnx_caffe2_quantized.py::TestQuantizedOps::test_small_model

@thiagocrepaldi thiagocrepaldi force-pushed the thiagofc/fix-onnx-registry-fallback branch 4 times, most recently from 4379d65 to ea89404 Compare October 31, 2022 22:42
@thiagocrepaldi
Copy link
Collaborator Author

@BowenBao all tests are passing now

@thiagocrepaldi thiagocrepaldi force-pushed the thiagofc/fix-onnx-registry-fallback branch from ea89404 to ae7b749 Compare November 1, 2022 16:54
Copy link
Collaborator

@BowenBao BowenBao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚢

test/onnx/test_pytorch_onnx_no_runtime.py Outdated Show resolved Hide resolved
Signed-off-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
@thiagocrepaldi thiagocrepaldi force-pushed the thiagofc/fix-onnx-registry-fallback branch from ae7b749 to 14ec6aa Compare November 1, 2022 17:39
@thiagocrepaldi
Copy link
Collaborator Author

@pytorchbot merge -g

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks on your PR pass since you used the green (-g) flag (ETA: 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: The following mandatory check(s) failed (Rule ONNX exporter):

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

@thiagocrepaldi
Copy link
Collaborator Author

@pytorchbot merge -f "macos pipeline failed due to aborted artifact upload"

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

thiagocrepaldi added a commit to thiagocrepaldi/pytorch that referenced this pull request Nov 4, 2022
Follow-up for pytorch#87735

Once again, because BUILD_CAFFE2=0 is not tested for ONNX exporter, one
scenario slipped through. A use case where the model can be exported
without aten fallback when operator_export_type=ONNX_ATEN_FALLBACK and
BUILD_CAFFE2=0

A new unit test has been added, but it won't prevent regressions if
BUILD_CAFFE2=0 is not executed on CI again
kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Nov 5, 2022
Fixes pytorch#87313

Our ONNX pipelines do not run with BUILD_CAFFE2=0, so tests for operator_export_type ONNX_ATEN and ONNX_ATEN_FALLBACK will not be fully tested, allowing regressions to happen again.

We need to run the same set of tests for both BUILD_CAFFE2=0 and 1
Pull Request resolved: pytorch#87735
Approved by: https://github.com/AllenTiTaiWang, https://github.com/BowenBao
thiagocrepaldi added a commit to thiagocrepaldi/pytorch that referenced this pull request Nov 9, 2022
Follow-up for pytorch#87735

Once again, because BUILD_CAFFE2=0 is not tested for ONNX exporter, one
scenario slipped through. A use case where the model can be exported
without aten fallback when operator_export_type=ONNX_ATEN_FALLBACK and
BUILD_CAFFE2=0

A new unit test has been added, but it won't prevent regressions if
BUILD_CAFFE2=0 is not executed on CI again
pytorchmergebot pushed a commit that referenced this pull request Nov 11, 2022
Follow-up for #87735

Once again, because BUILD_CAFFE2=0 is not tested for ONNX exporter, one scenario slipped through. A use case where the model can be exported without aten fallback when operator_export_type=ONNX_ATEN_FALLBACK and BUILD_CAFFE2=0

A new unit test has been added, but it won't prevent regressions if BUILD_CAFFE2=0 is not executed on CI again

Fixes #87313

Pull Request resolved: #88504
Approved by: https://github.com/justinchuby, https://github.com/BowenBao
@thiagocrepaldi thiagocrepaldi added this to the 1.13.1 milestone Nov 16, 2022
@thiagocrepaldi thiagocrepaldi added module: regression It used to work, and now it doesn't triage review labels Nov 16, 2022
@thiagocrepaldi
Copy link
Collaborator Author

#84382 was the base PR that replaced the "old" implementation for onnx symbolic registry, leading to a regression on 1.13 that was fixed by this PR

izaitsevfb pushed a commit to izaitsevfb/pytorch that referenced this pull request Dec 1, 2022
CONFLICT (content): Merge conflict in test/onnx/test_pytorch_onnx_no_runtime.py
Adding test/functorch
error: could not apply 2aed670... Fix ONNX operator_export_type on the new registry (pytorch#87735)
izaitsevfb pushed a commit to izaitsevfb/pytorch that referenced this pull request Dec 2, 2022
CONFLICT (content): Merge conflict in test/onnx/test_pytorch_onnx_no_runtime.py
Adding test/functorch
error: could not apply 2aed670... Fix ONNX operator_export_type on the new registry (pytorch#87735)
izaitsevfb pushed a commit to izaitsevfb/pytorch that referenced this pull request Dec 2, 2022
Fixes pytorch#87313

Our ONNX pipelines do not run with BUILD_CAFFE2=0, so tests for operator_export_type ONNX_ATEN and ONNX_ATEN_FALLBACK will not be fully tested, allowing regressions to happen again.

We need to run the same set of tests for both BUILD_CAFFE2=0 and 1
Pull Request resolved: pytorch#87735
Approved by: https://github.com/AllenTiTaiWang, https://github.com/BowenBao

(cherry picked from commit 2aed670)
izaitsevfb added a commit that referenced this pull request Dec 3, 2022
Fixes #87313

Our ONNX pipelines do not run with BUILD_CAFFE2=0, so tests for operator_export_type ONNX_ATEN and ONNX_ATEN_FALLBACK will not be fully tested, allowing regressions to happen again.

We need to run the same set of tests for both BUILD_CAFFE2=0 and 1
Pull Request resolved: #87735
Approved by: https://github.com/AllenTiTaiWang, https://github.com/BowenBao

(cherry picked from commit 2aed670)

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
izaitsevfb pushed a commit to izaitsevfb/pytorch that referenced this pull request Dec 3, 2022
Follow-up for pytorch#87735

Once again, because BUILD_CAFFE2=0 is not tested for ONNX exporter, one scenario slipped through. A use case where the model can be exported without aten fallback when operator_export_type=ONNX_ATEN_FALLBACK and BUILD_CAFFE2=0

A new unit test has been added, but it won't prevent regressions if BUILD_CAFFE2=0 is not executed on CI again

Fixes pytorch#87313

Pull Request resolved: pytorch#88504
Approved by: https://github.com/justinchuby, https://github.com/BowenBao

(cherry picked from commit 5f0783b)
izaitsevfb pushed a commit to izaitsevfb/pytorch that referenced this pull request Dec 3, 2022
Follow-up for pytorch#87735

Once again, because BUILD_CAFFE2=0 is not tested for ONNX exporter, one scenario slipped through. A use case where the model can be exported without aten fallback when operator_export_type=ONNX_ATEN_FALLBACK and BUILD_CAFFE2=0

A new unit test has been added, but it won't prevent regressions if BUILD_CAFFE2=0 is not executed on CI again

Fixes pytorch#87313

Pull Request resolved: pytorch#88504
Approved by: https://github.com/justinchuby, https://github.com/BowenBao

(cherry picked from commit 5f0783b)
izaitsevfb pushed a commit to izaitsevfb/pytorch that referenced this pull request Dec 6, 2022
Follow-up for pytorch#87735

Once again, because BUILD_CAFFE2=0 is not tested for ONNX exporter, one scenario slipped through. A use case where the model can be exported without aten fallback when operator_export_type=ONNX_ATEN_FALLBACK and BUILD_CAFFE2=0

A new unit test has been added, but it won't prevent regressions if BUILD_CAFFE2=0 is not executed on CI again

Fixes pytorch#87313

Pull Request resolved: pytorch#88504
Approved by: https://github.com/justinchuby, https://github.com/BowenBao

(cherry picked from commit 5f0783b)
atalman added a commit that referenced this pull request Dec 7, 2022
* Fix ATen Fallback for BUILD_CAFFE2=0 for ONNX-only ops (#88504)

Follow-up for #87735

Once again, because BUILD_CAFFE2=0 is not tested for ONNX exporter, one scenario slipped through. A use case where the model can be exported without aten fallback when operator_export_type=ONNX_ATEN_FALLBACK and BUILD_CAFFE2=0

A new unit test has been added, but it won't prevent regressions if BUILD_CAFFE2=0 is not executed on CI again

Fixes #87313

Pull Request resolved: #88504
Approved by: https://github.com/justinchuby, https://github.com/BowenBao

(cherry picked from commit 5f0783b)

* Update test/onnx/test_pytorch_onnx_no_runtime.py

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* Update test/onnx/test_pytorch_onnx_no_runtime.py

Fix linter

* Update test/onnx/test_pytorch_onnx_no_runtime.py

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* fix lint warnings

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
Co-authored-by: Andrey Talman <atalman@fb.com>
kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022
Fixes pytorch#87313

Our ONNX pipelines do not run with BUILD_CAFFE2=0, so tests for operator_export_type ONNX_ATEN and ONNX_ATEN_FALLBACK will not be fully tested, allowing regressions to happen again.

We need to run the same set of tests for both BUILD_CAFFE2=0 and 1
Pull Request resolved: pytorch#87735
Approved by: https://github.com/AllenTiTaiWang, https://github.com/BowenBao
kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022
Follow-up for pytorch#87735

Once again, because BUILD_CAFFE2=0 is not tested for ONNX exporter, one scenario slipped through. A use case where the model can be exported without aten fallback when operator_export_type=ONNX_ATEN_FALLBACK and BUILD_CAFFE2=0

A new unit test has been added, but it won't prevent regressions if BUILD_CAFFE2=0 is not executed on CI again

Fixes pytorch#87313

Pull Request resolved: pytorch#88504
Approved by: https://github.com/justinchuby, https://github.com/BowenBao
@thiagocrepaldi thiagocrepaldi deleted the thiagofc/fix-onnx-registry-fallback branch May 4, 2023 17:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ciflow/trunk Trigger trunk jobs on your pull request Merged module: onnx Related to torch.onnx module: regression It used to work, and now it doesn't open source release notes: onnx torch.onnx related changes that should show up in the release notes topic: bug fixes topic category triage review triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improper model conversion from PyTorch to ONNX with torch.onnx.OperatorExportTypes.ONNX_ATEN flag
7 participants