Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove test data from PyPI package #5925

Open
cbourjau opened this issue Feb 9, 2024 · 8 comments · May be fixed by #5970
Open

Remove test data from PyPI package #5925

cbourjau opened this issue Feb 9, 2024 · 8 comments · May be fixed by #5970
Labels
build Issues related to ONNX builds and packages contributions welcome enhancement Request for new feature or operator
Milestone

Comments

@cbourjau
Copy link
Contributor

cbourjau commented Feb 9, 2024

Describe the bug

The ONNX package on PyPI contains all test files found at https://github.com/onnx/onnx/tree/main/onnx/backend/test/data . These constitute ~40MB unpacked or more than 70% of the total package size.

System information

I checked the 1.15.0 MacOS wheel, but judging by the compressed file size all platforms are affected: https://pypi.org/project/onnx/#files

Expected behavior

These test files should not be installed in a production environment.

Other notes

Are there any plans to move those binary files out of git / generate them on the fly?

@cbourjau cbourjau added the bug label Feb 9, 2024
@jcwchen
Copy link
Member

jcwchen commented Feb 9, 2024

+1 on this. Otherwise ONNX PyPI package will grow significantly when there are more backend tests from more ops. See related discussion in this issue. For now, some users might still need static backend tests from the package and in that case onnx can have 2 packages -- one with test data and one without test data, but personally I feel ONNX eventually should stop providing them from PyPI and just let users produce them on the fly.

@justinchuby
Copy link
Contributor

We should.

@justinchuby justinchuby added build Issues related to ONNX builds and packages enhancement Request for new feature or operator contributions welcome and removed bug labels Feb 9, 2024
@justinchuby
Copy link
Contributor

One advantage with distributing the test data of course, is that runtimes do not need the Python tool chain to run tests (protobuf python, numpy etc.)

@cbourjau
Copy link
Contributor Author

When you say "distributing" do you mean shipping them in the PyPI package or having them in the repository? I have a hard time seeing a use case where a downstream project would rather fish the test files out of the PyPI package than using a git submodule.

@justinchuby
Copy link
Contributor

Ah you are right. We can check those in without distributing them with the Python package.

1 similar comment
@justinchuby
Copy link
Contributor

Ah you are right. We can check those in without distributing them with the Python package.

@cbourjau
Copy link
Contributor Author

I think the best way forward is if we were to move the onnx/onnx/backend/test folder out of the Python package. While being at it, we may want to do the same with onnx/onnx/backend/sample.

@justinchuby justinchuby changed the title PyPI package contains test data (40MB unpacked) Remove test data from PyPI package Feb 13, 2024
cbourjau added a commit to cbourjau/onnx that referenced this issue Feb 27, 2024
Fixes onnx#5925 in a minimally invasive way.

onnx/backend/test/data contains large test files which should not be
included in the PyPI package. This PR simply excludes them when
building the package. This reduces the size of the `onnx`
package (uncompressed) from 51MB to 12MB.
cbourjau added a commit to cbourjau/onnx that referenced this issue Feb 27, 2024
Fixes onnx#5925 in a minimally invasive way.

onnx/backend/test/data contains large test files which should not be
included in the PyPI package. This PR simply excludes them when
building the package. This reduces the size of the `onnx`
package (uncompressed) from 51MB to 12MB.

Signed-off-by: Christian Bourjau <christian.bourjau@quantco.com>
@cbourjau cbourjau linked a pull request Feb 27, 2024 that will close this issue
@cbourjau
Copy link
Contributor Author

I took a closer look at this issue. Unfortunately, the lines between tests that should not be packaged, test utilities, and the reference implementation are blurry. Moving the "tests" out of onnx/ is quite a large and technically breaking change. The most minimally invasive way to exclude those test files is by simply excluding them from the final package as done in #5970 .

cbourjau added a commit to cbourjau/onnx that referenced this issue Feb 27, 2024
Fixes onnx#5925 in a minimally invasive way.

onnx/backend/test/data contains large test files which should not be
included in the PyPI package. This PR simply excludes them when
building the package. This reduces the size of the `onnx`
package (uncompressed) from 51MB to 12MB.

Signed-off-by: Christian Bourjau <christian.bourjau@quantco.com>
@justinchuby justinchuby added this to the 1.17 milestone Mar 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build Issues related to ONNX builds and packages contributions welcome enhancement Request for new feature or operator
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants