Exclude test data from the Python package #5970
Conversation
Codecov Report
Attention: Patch coverage is …

Additional details and impacted files

@@           Coverage Diff            @@
##             main    #5970     +/-  ##
==========================================
+ Coverage   56.95%   57.04%   +0.08%
==========================================
  Files         506      503       -3
  Lines       30467    30673     +206
  Branches     4592     4529      -63
==========================================
+ Hits        17353    17496     +143
- Misses      12285    12365      +80
+ Partials      829      812      -17

☔ View full report in Codecov by Sentry.
Force-pushed from ea15494 to 7d0c3ad
@cjvolzka I think that would be good for 1.16, at least to establish that nobody really uses it this way. But shouldn't that already be the case?
@andife My initial thought is that this missed the boat for the 1.16 release; I'm working on uploading 1.16.0rc1 now. I don't think anyone is using these test models, but I can't say for sure. Given that uncertainty, if we removed them, I'd want the change in rc1. Otherwise I fear someone may validate our rc1, skip rc2 if rc1 goes well, and then miss that this somehow breaks them.

So getting this into 1.16 seems like too much of a rush for an issue that has already existed for a while. Yes, we should definitely fix it, but I lean towards not rushing it just to get it in now. I could be swayed if some of the other primary reviewers disagree and want it in, but that's my hot take.
I agree with @cjvolzka ... furthermore, this is still a draft. I agree it is a good idea to do this, but we should not rush it (just like Charles says).
I agree that, given how long the data has been packaged in the PyPI distribution, one more release is not going to make a big difference.
Reducing the size of the ONNX package does sound like a good idea. However, there are scenarios that use the package to run tests. In particular, the ONNX Scoreboard relies on installing ONNX packages as they are released to run tests on submitted backends. In addition, other users may have based their CI tests on this approach of downloading the ONNX package as a test platform. We need to provide a well-defined upgrade path for these users, as this is a breaking change for them.
Signed-off-by: Christian Bourjau <christian.bourjau@quantco.com>
Force-pushed from 674376a to 477fcf1
I cleaned up the last CI issues and marked this as "ready for review". As @postrational noted, this is indeed a breaking change, but I hardly see a way around it. I see three possible ways forward for downstream projects:
Thanks! LGTM; would like to let others chime in. There may be some documentation we need to update as well.
Force-pushed from 4a82e37 to 6976c1a
Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com> Signed-off-by: Christian Bourjau <cbourjau@users.noreply.github.com>
Force-pushed from 6976c1a to b6233e9
Force-pushed from 6a81b80 to fe257c4
Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com> Signed-off-by: Christian Bourjau <cbourjau@users.noreply.github.com>
Force-pushed from fe257c4 to 4b5db83
Signed-off-by: Christian Bourjau <christian.bourjau@quantco.com>
Reopening to trigger tests
"py.typed", | ||
] | ||
|
||
[tool.setuptools.exclude-package-data] |
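(The hunk above is cut off after the section header. For readers unfamiliar with this setuptools feature, a stanza of this shape maps a package name to glob patterns that are dropped from the built wheel; the pattern below is illustrative, not necessarily the PR's exact glob, and recursive `**` globs require setuptools >= 62.3.)

```toml
[tool.setuptools.exclude-package-data]
# illustrative: strip the large backend test files from the wheel
onnx = ["backend/test/data/**"]
```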
I think the change goes in the right direction. My only concern is that runtimes such as onnxruntime expect to find this data. I'd be more comfortable displaying an error message saying what instruction is needed to get the data, and keeping the backend tests passing.
onnxruntime already fetches the entire onnx source during the build process anyway. Are they really using the PyPI package for testing, too?
Yes, it is used. onnx should be shipped without the data; that's not up for debate. Since this data can be automatically generated, we could also generate it when the package is being installed, something like `pip install onnx <options>` ... Developers using the data could then install onnx with this option and nothing else would change.
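(Aside: pip has no standard hook for running code at wheel install time, so the closest mechanism to such an option is an extra that pulls in a companion data package. A minimal sketch in pyproject.toml, with `onnx-testdata` as a hypothetical package name:)

```toml
[project.optional-dependencies]
# hypothetical companion package carrying the test data;
# users would opt in via `pip install onnx[test-data]`
test-data = ["onnx-testdata"]
```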
Combining the suggestions made by @cbourjau and @postrational: providing a separate testdata package, while removing it from the main package, makes sense to me. It comes down to who is willing to create the new testdata package; it would be great if we had a volunteer!

There's an interesting versioning complication to this issue: due to the way tests are defined and generated, the test cases generated in different releases of onnx (from the same test-case source) may differ, using different opset versions of the op. E.g., onnx releases N and N+1 may contain versions of the same test case but with different opset versions (if the particular op's version was bumped in release N+1). Ideally, we should provide a better API/interface to allow users to get the versions of these test cases they want from a single package. See here
I'm not sure it needs to be a PyPI package at the end of the day. The Array API standard has a phenomenal testing project that (a) does not ship with binaries and (b) is not a PyPI package. Having something like that for ONNX would be ideal (but a lot of work), IMHO.
We also use the test models from the …
Would it be a solution if you cloned the onnx repository to get the test data?
If I could run …
Could you share an example usage just to make sure we understand it clearly? (A script etc.)
We use the (I hope) common way of running the backend tests via …
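(For context, a minimal sketch of that common harness, assuming a hypothetical `MyBackend` class that implements `onnx.backend.base.Backend`:)

```python
import onnx.backend.test

# Hypothetical backend under test; a real one implements onnx.backend.base.Backend.
from my_project.backend import MyBackend

# Collect the generated ONNX backend test cases and run them against MyBackend.
backend_test = onnx.backend.test.BackendTest(MyBackend, __name__)

# Expose the cases at module level so pytest/unittest can discover them.
globals().update(backend_test.enable_report().test_cases)
```

Running a test module like this is exactly what breaks once `onnx/backend/test/data` is no longer shipped in the wheel.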
Description
The `onnx/backend/test/data` directory contains large test files which should not be included in the PyPI package since they are irrelevant to the end user. This PR simply excludes them when building the package. The files remain available in the repository for running the test cases. This reduces the size of the built `onnx` package from 51 MB to 12 MB (uncompressed).

Motivation and Context
Shipping a package that is more than four times larger than necessary wastes bandwidth and disk space for every user. More discussion, and links to previous discussions around this issue, can be found in #5925.
Closes #5925