
Exclude test data from the Python package #5970

Open: wants to merge 8 commits into main from the exclude-test-data branch
Conversation

cbourjau (Contributor)

Description

onnx/backend/test/data contains large test files which should not be included in the PyPI package since they are irrelevant to the end user. This PR simply excludes them when building the package. The files remain available for running the test cases. This reduces the size of the built onnx package from 51 MB to 12 MB (uncompressed).
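
For anyone who wants to double-check the size claim locally, a minimal sketch along these lines can list what actually ends up in a built wheel (the wheel filename below is a placeholder for whatever your own build produces, not a name taken from this PR):

import zipfile

# Placeholder wheel path: substitute whatever your own build produces
# (for example via `python -m build`).
WHEEL = "dist/onnx-0.0.0-cp311-cp311-linux_x86_64.whl"

with zipfile.ZipFile(WHEEL) as wheel:
    # Files under onnx/backend/test/data are the ones this PR excludes.
    test_data = [name for name in wheel.namelist() if name.startswith("onnx/backend/test/data/")]
    total_bytes = sum(info.file_size for info in wheel.infolist())

print(f"{len(test_data)} test-data files, {total_bytes / 1e6:.1f} MB uncompressed in total")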

Motivation and Context

Shipping a package that is more than 4x larger than necessary is not good. More discussion, and links to earlier threads about this issue, can be found in #5925.

Closes #5925

@cbourjau cbourjau requested a review from a team as a code owner February 27, 2024 22:21
@cbourjau cbourjau marked this pull request as draft February 27, 2024 22:30

codecov bot commented Feb 27, 2024

Codecov Report

Attention: Patch coverage is 76.92308%, with 3 lines in your changes missing coverage. Please review.

Project coverage is 57.04%. Comparing base (83194ed) to head (d33f29b).
Report is 17 commits behind head on main.

Files                                      Patch %   Lines
onnx/backend/test/stat_coverage.py          0.00%    2 Missing ⚠️
onnx/backend/test/case/node/__init__.py    50.00%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5970      +/-   ##
==========================================
+ Coverage   56.95%   57.04%   +0.08%     
==========================================
  Files         506      503       -3     
  Lines       30467    30673     +206     
  Branches     4592     4529      -63     
==========================================
+ Hits        17353    17496     +143     
- Misses      12285    12365      +80     
+ Partials      829      812      -17     


@andife (Member)

andife commented Feb 28, 2024

@cjvolzka I think that would be good for 1.16? At least to establish that nobody really uses the test data this way? But shouldn't that actually be the case?

@cjvolzka (Contributor)

cjvolzka commented Feb 28, 2024

@andife My initial thought is that this missed the boat for the 1.16 release. I'm working on uploading 1.16.0rc1 later today or tomorrow morning, and this hasn't even been merged into main yet.

I don't think anyone would be using these test models but I can't say for sure. Since there's uncertainty on that, if we removed them, I'd want it in rc1. Otherwise I fear someone may validate our rc1, and if that goes well, they may not try rc2, and then miss that this somehow breaks them.

So getting into 1.16 seems like too much of a rush for an issue that's already existed for a while. Yes we should definitely fix it, but I lean towards not rushing it just to get it in now.

I could be swayed if some of the other primary reviewers disagree and want it in, but that's my hot take.

@gramalingam (Contributor)

I agree with @cjvolzka ... furthermore, this is still a draft. I agree it is a good idea to do this, but not to rush it (just as Charles says).

@cbourjau (Contributor, Author)

cbourjau commented Mar 4, 2024

I agree that, given how long the data has been packaged in the PyPI distribution, one more release is not going to make a big difference.

@justinchuby justinchuby added this to the 1.17 milestone Mar 5, 2024
@postrational (Contributor)

postrational commented Mar 28, 2024

Reducing the size of the ONNX package does sound like a good idea. However, there are scenarios that do use the package to run tests. In particular, the ONNX Scoreboard relies on installing ONNX packages as they are released in order to run tests on submitted backends.

In addition, other users may have based their CI tests on this approach of downloading the ONNX package as a test platform. We need to provide a well-defined upgrade path for these users, as this is a breaking change for them.

Signed-off-by: Christian Bourjau <christian.bourjau@quantco.com>
@cbourjau cbourjau marked this pull request as ready for review April 14, 2024 20:07
@cbourjau cbourjau requested a review from a team as a code owner April 14, 2024 20:07
@cbourjau (Contributor, Author)

I cleaned up the last CI issues and marked this as "ready for review". As @postrational noted, this is indeed a breaking change, but I hardly see a way around it. I see three possible ways forward for downstream projects:

  • Migrate to a git-submodule setup (a minimal sketch of this option follows below the list)
  • Generate the test files on the fly (much more work)
  • Vendor the files via a separate package such as onnx-test-data
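
For the git-submodule option, here is a minimal, hedged sketch of what a downstream test setup could look like. It assumes the onnx repository is checked out as a submodule under third_party/onnx (that path is my invention, not something from this PR) and that load_model_tests keeps accepting an explicit data_dir argument, as the current loader does:

import os
from pathlib import Path

from onnx.backend.test.loader import load_model_tests

# Hypothetical submodule location; adjust to wherever your project keeps
# its onnx checkout.
ONNX_CHECKOUT = Path(__file__).parent / "third_party" / "onnx"
DATA_DIR = ONNX_CHECKOUT / "onnx" / "backend" / "test" / "data"

# Load the node test cases from the checkout instead of from the installed
# wheel; this relies on load_model_tests accepting a data_dir override.
node_tests = load_model_tests(data_dir=os.fspath(DATA_DIR), kind="node")
print(f"Loaded {len(node_tests)} node test cases from {DATA_DIR}")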

@justinchuby (Contributor) left a comment


Thanks! LGTM; would like to let others chime in. There may be some documentation we need to update as well.

Review threads (outdated, resolved): onnx/backend/test/loader/__init__.py, onnx/backend/test/runner/__init__.py, tests/backend/test_backend_reference.py, tests/backend/test_backend_test.py
@justinchuby justinchuby added the release notes label (Important changes to call out in release notes) Apr 15, 2024
Code-scanning alerts in tests/backend/test_backend_reference.py and tests/backend/test_backend_test.py were marked as fixed.
@cbourjau cbourjau force-pushed the exclude-test-data branch 2 times, most recently from 4a82e37 to 6976c1a on April 15, 2024 07:41
Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
Signed-off-by: Christian Bourjau <cbourjau@users.noreply.github.com>
Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
Signed-off-by: Christian Bourjau <cbourjau@users.noreply.github.com>
cbourjau and others added 2 commits April 15, 2024 17:57
Signed-off-by: Christian Bourjau <christian.bourjau@quantco.com>
@justinchuby justinchuby added the review needed: operators approvers label (Require reviews from members of operators-approvers) Apr 16, 2024
Signed-off-by: Christian Bourjau <christian.bourjau@quantco.com>
Signed-off-by: Christian Bourjau <christian.bourjau@quantco.com>
@justinchuby justinchuby changed the title from "Exclude test data" to "Exclude test data from the Python package" Apr 16, 2024
@justinchuby justinchuby added the run release CIs label (Use this label to trigger release tests in CI) Apr 16, 2024
@justinchuby (Contributor)

Reopening to trigger test

"py.typed",
]

[tool.setuptools.exclude-package-data]
Contributor

I think the change goes in the right direction. My only concern is that runtimes such as onnxruntime expect to find this data. I'd be more comfortable displaying an error message saying what instructions are needed to get the data, while keeping the backend tests passing.

Contributor (Author)

onnxruntime already fetches the entire onnx source during the build process anyway. Are they really using the PyPI package for testing, too?

Contributor

Yes, it is used. onnx should be shipped without the data; that's not a debate. Since this data can be generated automatically, we could also generate it when the package is installed, with something like pip install onnx <options>... Developers using the data could then just install onnx with this option and nothing else would change.

@gramalingam (Contributor)

Combining the suggestions made by @cbourjau and @postrational : providing a separate testdata package, while removing it from the main package, makes sense to me.

It comes down to a question of who is willing to do that (create the new testdata package): it would be great if we had a volunteer to do this!

There's an interesting versioning complication to this issue: due to the way tests are defined and generated, the test cases generated in different releases of onnx (from the same test-case source) may differ (using different opset versions of the op). E.g., onnx releases N and N+1 may have versions of the same test case, but using different opset versions (if that particular op's version was bumped in release N+1). Ideally, we should provide a better API/interface to allow users to get the versions of these test cases they want from a single package. See here

@cbourjau (Contributor, Author)

cbourjau commented May 2, 2024

I'm not sure it needs to be a PyPI package at the end of the day. The Array API standard has a phenomenal testing project that (a) does not ship with binaries and (b) is not a PyPI package. Having something like that for ONNX would be ideal (but a lot of work), IMHO.

@mgehre-amd

We also use the test models from the onnx package to test our ONNX compilers against. I'd ask you to please continue shipping this either inside the onnx package or in another python wheel.

@justinchuby (Contributor)

> We also use the test models from the onnx package to test our ONNX compilers against. I'd ask you to please continue shipping this either inside the onnx package or in another python wheel.

Would it be a solution if you cloned the onnx repository to get the test data?

@mgehre-amd

> > We also use the test models from the onnx package to test our ONNX compilers against. I'd ask you to please continue shipping this either inside the onnx package or in another python wheel.
>
> Would it be a solution if you cloned the onnx repository to get the test data?

If I could run onnx.backend.test without any additional build step (just changing PYTHONPATH), that would be fine for us.

@justinchuby (Contributor)

> > > We also use the test models from the onnx package to test our ONNX compilers against. I'd ask you to please continue shipping this either inside the onnx package or in another python wheel.
> >
> > Would it be a solution if you cloned the onnx repository to get the test data?
>
> If I could run onnx.backend.test without any additional build step (just changing PYTHONPATH), that would be fine for us.

Could you share an example usage just to make sure we understand it clearly? (A script etc.)

@mgehre-amd

We use the (I hope) common way of running the backend tests via:

import sys
import os
import unittest
import onnx.backend.test

import OurBackend

# A pytest magic variable to load pytest_report.py from this directory.
pytest_plugins = ("pytest_report",)

# Collect the ONNX backend test cases against our backend and expose them
# to unittest/pytest via module globals.
backend_test = onnx.backend.test.runner.Runner(OurBackend, __name__)
globals().update(backend_test.enable_report().test_cases)

if __name__ == "__main__":
    unittest.main()

Labels
  • release notes (Important changes to call out in release notes)
  • review needed: operators approvers (Require reviews from members of operators-approvers)
  • run release CIs (Use this label to trigger release tests in CI)
Projects
Status: In progress
Development

Successfully merging this pull request may close these issues.

Remove test data from PyPI package
8 participants