
Exclude test data from the Python package #5970

Open: wants to merge 8 commits into main from the exclude-test-data branch
Conversation

cbourjau (Contributor)

Description

onnx/backend/test/data contains large test files which should not be included in the PyPI package since they are irrelevant to the end user. This PR simply excludes them when building the package. The files remain available for running the test cases. This reduces the size of the built onnx package from 51 MB to 12 MB (uncompressed).
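
For anyone who wants to double-check the size claim locally, a minimal sketch along these lines can list what actually ends up in a built wheel (the wheel filename below is a placeholder for whatever your own build produces, not a name taken from this PR):

import zipfile

# Placeholder wheel path: substitute whatever your own build produces
# (for example via `python -m build`).
WHEEL = "dist/onnx-0.0.0-cp311-cp311-linux_x86_64.whl"

with zipfile.ZipFile(WHEEL) as wheel:
    # Files under onnx/backend/test/data are the ones this PR excludes.
    test_data = [name for name in wheel.namelist() if name.startswith("onnx/backend/test/data/")]
    total_bytes = sum(info.file_size for info in wheel.infolist())

print(f"{len(test_data)} test-data files, {total_bytes / 1e6:.1f} MB uncompressed in total")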

Motivation and Context

Shipping a package that is more than 4x larger than necessary is not good. More discussion, and links to earlier threads about this issue, can be found in #5925.

Closes #5925

@cbourjau cbourjau requested a review from a team as a code owner February 27, 2024 22:21
@cbourjau cbourjau marked this pull request as draft February 27, 2024 22:30

codecov bot commented Feb 27, 2024

Codecov Report

Attention: Patch coverage is 76.92308%, with 3 lines in your changes missing coverage. Please review.

Project coverage is 57.04%. Comparing base (83194ed) to head (d33f29b).
Report is 17 commits behind head on main.

Files                                      Patch %   Lines
onnx/backend/test/stat_coverage.py          0.00%    2 Missing ⚠️
onnx/backend/test/case/node/__init__.py    50.00%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5970      +/-   ##
==========================================
+ Coverage   56.95%   57.04%   +0.08%     
==========================================
  Files         506      503       -3     
  Lines       30467    30673     +206     
  Branches     4592     4529      -63     
==========================================
+ Hits        17353    17496     +143     
- Misses      12285    12365      +80     
+ Partials      829      812      -17     


@andife (Member)

andife commented Feb 28, 2024

@cjvolzka I think that would be good for 1.16? At least to establish that nobody really uses the test data this way? But shouldn't that actually be the case?

@cjvolzka (Contributor)

cjvolzka commented Feb 28, 2024

@andife My initial thought is that this missed the boat for the 1.16 release. I'm working on uploading 1.16.0rc1 later today or tomorrow morning, and this hasn't even been merged into main yet.

I don't think anyone would be using these test models but I can't say for sure. Since there's uncertainty on that, if we removed them, I'd want it in rc1. Otherwise I fear someone may validate our rc1, and if that goes well, they may not try rc2, and then miss that this somehow breaks them.

So getting into 1.16 seems like too much of a rush for an issue that's already existed for a while. Yes we should definitely fix it, but I lean towards not rushing it just to get it in now.

I could be swayed if some of the other primary reviewers disagree and want it in, but that's my hot take.

@gramalingam (Contributor)

I agree with @cjvolzka ... furthermore, this is still a draft. I agree it is a good idea to do this, but not to rush it (just as Charles says).

@cbourjau (Contributor, Author)

cbourjau commented Mar 4, 2024

I agree that, given how long the data has been packaged in the PyPI distribution, one more release is not going to make a big difference.

@justinchuby justinchuby added this to the 1.17 milestone Mar 5, 2024
@postrational (Contributor)

postrational commented Mar 28, 2024

Reducing the size of the ONNX package does sound like a good idea. However, there are scenarios that do use the package to run tests. In particular, the ONNX Scoreboard relies on installing ONNX packages as they are released in order to run tests on submitted backends.

In addition, other users may have based their CI tests on this approach of downloading the ONNX package as a test platform. We need to provide a well-defined upgrade path for these users, as this is a breaking change for them.

Signed-off-by: Christian Bourjau <christian.bourjau@quantco.com>
@cbourjau cbourjau marked this pull request as ready for review April 14, 2024 20:07
@cbourjau cbourjau requested a review from a team as a code owner April 14, 2024 20:07
@cbourjau (Contributor, Author)

I cleaned up the last CI issues and marked this as "ready for review". As @postrational noted, this is indeed a breaking change, but I hardly see a way around it. I see three possible ways forward for downstream projects:

  • Migrate to a git-submodule setup (a minimal sketch of this option follows below the list)
  • Generate the test files on the fly (much more work)
  • Vendor the files via a separate package such as onnx-test-data
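
For the git-submodule option, here is a minimal, hedged sketch of what a downstream test setup could look like. It assumes the onnx repository is checked out as a submodule under third_party/onnx (that path is my invention, not something from this PR) and that load_model_tests keeps accepting an explicit data_dir argument, as the current loader does:

import os
from pathlib import Path

from onnx.backend.test.loader import load_model_tests

# Hypothetical submodule location; adjust to wherever your project keeps
# its onnx checkout.
ONNX_CHECKOUT = Path(__file__).parent / "third_party" / "onnx"
DATA_DIR = ONNX_CHECKOUT / "onnx" / "backend" / "test" / "data"

# Load the node test cases from the checkout instead of from the installed
# wheel; this relies on load_model_tests accepting a data_dir override.
node_tests = load_model_tests(data_dir=os.fspath(DATA_DIR), kind="node")
print(f"Loaded {len(node_tests)} node test cases from {DATA_DIR}")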

@justinchuby (Contributor) left a comment


Thanks! LGTM; would like to let others chime in. There may be some documentation we need to update as well.

Review threads (outdated, resolved): onnx/backend/test/loader/__init__.py, onnx/backend/test/runner/__init__.py, tests/backend/test_backend_reference.py, tests/backend/test_backend_test.py
@justinchuby justinchuby added the release notes label (Important changes to call out in release notes) Apr 15, 2024
Code-scanning alerts in tests/backend/test_backend_reference.py and tests/backend/test_backend_test.py were marked as fixed.
@cbourjau cbourjau force-pushed the exclude-test-data branch 2 times, most recently from 4a82e37 to 6976c1a on April 15, 2024 07:41
Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
Signed-off-by: Christian Bourjau <cbourjau@users.noreply.github.com>
Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
Signed-off-by: Christian Bourjau <cbourjau@users.noreply.github.com>
cbourjau and others added 2 commits April 15, 2024 17:57
Signed-off-by: Christian Bourjau <christian.bourjau@quantco.com>
@justinchuby justinchuby added the review needed: operators approvers label (Require reviews from members of operators-approvers) Apr 16, 2024
Signed-off-by: Christian Bourjau <christian.bourjau@quantco.com>
Signed-off-by: Christian Bourjau <christian.bourjau@quantco.com>
@justinchuby justinchuby changed the title from "Exclude test data" to "Exclude test data from the Python package" Apr 16, 2024
@justinchuby justinchuby added the run release CIs label (Use this label to trigger release tests in CI) Apr 16, 2024
@justinchuby (Contributor)

Reopening to trigger test

"py.typed",
]

[tool.setuptools.exclude-package-data]
Contributor

I think the change goes in the right direction. My only concern is that runtimes such as onnxruntime expect to find this data. I'd be more comfortable displaying an error message saying what instructions are needed to get the data, while keeping the backend tests passing.

Contributor (Author)

onnxruntime already fetches the entire onnx source during the build process anyway. Are they really using the PyPI package for testing, too?

Contributor

Yes, it is used. onnx should be shipped without the data; that's not a debate. Since this data can be generated automatically, we could also generate it when the package is installed, with something like pip install onnx <options>... Developers using the data could then just install onnx with this option and nothing else would change.

@gramalingam (Contributor)

Combining the suggestions made by @cbourjau and @postrational : providing a separate testdata package, while removing it from the main package, makes sense to me.

It comes down to a question of who is willing to do that (create the new testdata package): it would be great if we had a volunteer to do this!

There's an interesting versioning complication to this issue: due to the way tests are defined and generated, the test cases generated in different releases of onnx (from the same test-case source) may differ (using different opset versions of the op). E.g., onnx releases N and N+1 may have versions of the same test case, but using different opset versions (if that particular op's version was bumped in release N+1). Ideally, we should provide a better API/interface to allow users to get the versions of these test cases they want from a single package. See here

@cbourjau (Contributor, Author)

cbourjau commented May 2, 2024

I'm not sure it needs to be a PyPI package at the end of the day. The Array API standard has a phenomenal testing project that (a) does not ship with binaries and (b) is not a PyPI package. Having something like that for ONNX would be ideal (but a lot of work), IMHO.

@mgehre-amd

We also use the test models from the onnx package to test our ONNX compilers against. I'd ask you to please continue shipping this either inside the onnx package or in another python wheel.

@justinchuby (Contributor)

> We also use the test models from the onnx package to test our ONNX compilers against. I'd ask you to please continue shipping this either inside the onnx package or in another python wheel.

Would it be a solution if you cloned the onnx repository to get the test data?

@mgehre-amd

> > We also use the test models from the onnx package to test our ONNX compilers against. I'd ask you to please continue shipping this either inside the onnx package or in another python wheel.
>
> Would it be a solution if you cloned the onnx repository to get the test data?

If I could run onnx.backend.test without any additional build step (just changing PYTHONPATH), that would be fine for us.

@justinchuby (Contributor)

> > > We also use the test models from the onnx package to test our ONNX compilers against. I'd ask you to please continue shipping this either inside the onnx package or in another python wheel.
> >
> > Would it be a solution if you cloned the onnx repository to get the test data?
>
> If I could run onnx.backend.test without any additional build step (just changing PYTHONPATH), that would be fine for us.

Could you share an example usage just to make sure we understand it clearly? (A script etc.)

@mgehre-amd

We use the (I hope) common way of running the backend tests via:

import sys
import os
import unittest
import onnx.backend.test

import OurBackend

# A pytest magic variable to load pytest_report.py from this directory.
pytest_plugins = ("pytest_report",)

# Collect the ONNX backend test cases against our backend and expose them
# to unittest/pytest via module globals.
backend_test = onnx.backend.test.runner.Runner(OurBackend, __name__)
globals().update(backend_test.enable_report().test_cases)

if __name__ == "__main__":
    unittest.main()

Labels
  • release notes (Important changes to call out in release notes)
  • review needed: operators approvers (Require reviews from members of operators-approvers)
  • run release CIs (Use this label to trigger release tests in CI)
Projects
Status: In progress
Development

Successfully merging this pull request may close these issues.

Remove test data from PyPI package
8 participants