Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI Use check-sdist to check sdist rather than check-manifest #28757

Merged
merged 6 commits into from
Apr 9, 2024

Conversation

lesteve
Copy link
Member

@lesteve lesteve commented Apr 3, 2024

Close #28695

I have run this locally and it works fine, the CI is only run on a cron schedule so no real way to test in this PR.

I checked that the sdist with setup.py and meson were identical following numpy/numpy#23981 (comment)

Copy link

github-actions bot commented Apr 3, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 8354d5f. Link to the linter CI: here

@lesteve
Copy link
Member Author

lesteve commented Apr 3, 2024

Weird CI error in the OpenMP build that is completely unrelated to this PR, let's retrigger the CI to see if it persists:

[60/249] Compiling Cython source /Users/runner/work/1/s/sklearn/cluster/_hdbscan/_linkage.pyx
  FAILED: sklearn/cluster/_hdbscan/_linkage.cpython-312-darwin.so.p/sklearn/cluster/_hdbscan/_linkage.pyx.c
  cython -M --fast-fail -3 '-X language_level=3' '-X boundscheck=False' '-X wraparound=False' '-X initializedcheck=False' '-X nonecheck=False' '-X cdivision=True' '-X profile=False' --include-dir /Users/runner/work/1/s/build/cp312 /Users/runner/work/1/s/sklearn/cluster/_hdbscan/_linkage.pyx -o sklearn/cluster/_hdbscan/_linkage.cpython-312-darwin.so.p/sklearn/cluster/_hdbscan/_linkage.pyx.c

  Error compiling Cython file:
  ------------------------------------------------------------
  ...

  cimport numpy as cnp
  from libc.float cimport DBL_MAX

  import numpy as np
  from ...metrics._dist_metrics cimport DistanceMetric64
  ^
  ------------------------------------------------------------

  /Users/runner/work/1/s/sklearn/cluster/_hdbscan/_linkage.pyx:38:0: 'sklearn/metrics/_dist_metrics.pxd' not found
  [61/249] Compiling Cython source /Users/runner/work/1/s/sklearn/utils/_isfinite.pyx
  [62/249] Compiling Cython source /Users/runner/work/1/s/sklearn/metrics/cluster/_expected_mutual_info_fast.pyx
  [63/249] Compiling C++ object sklearn/utils/_vector_sentinel.cpython-312-darwin.so.p/meson-generated_sklearn_utils__vector_sentinel.pyx.cpp.o
  [64/249] Compiling C++ object sklearn/utils/_fast_dict.cpython-312-darwin.so.p/meson-generated_sklearn_utils__fast_dict.pyx.cpp.o
  ninja: build stopped: subcommand failed.
  error: subprocess-exited-with-error
  
  × Preparing editable metadata (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> See above for output.
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  full command: /usr/local/miniconda/envs/testvenv/bin/python /usr/local/miniconda/envs/testvenv/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py prepare_metadata_for_build_editable /var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/tmpagbgst0x
  cwd: /Users/runner/work/1/s
  Preparing editable metadata (pyproject.toml): finished with status 'error'
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

##[error]Bash exited with code '1'.

@jeremiedbb
Copy link
Member

I already saw this error. It looks like a race condition. It must be trying to build hdbscan while dist_metrics is not built yet. Is there a way to tell meson that some module depend on others ?

@jeremiedbb
Copy link
Member

For the releases, we build the sdist with the command python setup.py sdist (see)

check-sdist builds it using the command python -m build --sdist (see)

It's good that you checked they're the same. So should we switch and use the second command in our CI ?

@jeremiedbb
Copy link
Member

I don't understand what's the difference between check-manifest and check-sdist :/

@lesteve
Copy link
Member Author

lesteve commented Apr 3, 2024

I already saw this error. It looks like a race condition. It must be trying to build hdbscan while dist_metrics is not built yet. Is there a way to tell meson that some module depend on others ?

Interesting, there is probably a fix in the meson.build files, but I need to take a closer look. I thought that putting the subfolders in the right order would be enough but apparently not ...

# Subpackages are mostly in alphabetical order except to handle Cython
# dependencies across subpackages
subdir('__check_build')
subdir('_loss')
# utils needs to be early since plenty of other modules cimports utils .pxd
subdir('utils')
# metrics needs to be to be before cluster since cluster cimports metrics .pxd
subdir('metrics')
subdir('cluster')
subdir('datasets')
subdir('decomposition')
subdir('ensemble')
subdir('feature_extraction')
subdir('linear_model')
subdir('manifold')
subdir('neighbors')
subdir('preprocessing')
subdir('svm')
subdir('tree')

@lesteve
Copy link
Member Author

lesteve commented Apr 3, 2024

I don't understand what's the difference between check-manifest and check-sdist :/

I don't know for sure either all my knowledge comes from mgedmin/check-manifest#163 (comment)

@lesteve
Copy link
Member Author

lesteve commented Apr 3, 2024

It's good that you checked they're the same. So should we switch and use the second command in our CI ?

I think with setup.py, python -m build --sdist or python setup.py sdist do exactly the same. I double-checked this is the case on the 1.4.X branch.

I re-double-checked to be 100% sure, and there are some differences actually with Meson. I need to look into it more but maybe most of them are minor. The first difference is that the sdist filename is not the same scikit_learn-1.5.dev0.tar.gz with Meson (note the _ in scikit_learn) vs scikit-learn-1.5.dev0.tar.gz with setup.py (note the - in scikit-learn), not sure whether this is expected and whether this can cause an issue somehow.

There is an argument that was made in numpy/numpy#23981 (comment) that says that tweaking things for them to match is not that useful:

Sure, I know how to exclude files, I just don't really see the point of doing so. My experience with MANIFEST.in is that it's only a source of bugs, inevitable some config files end up included and some excluded. Given that the sdist is simply a tarball, why would we not make it a complete snapshot of the repo? Less work and no downside I think.

❯ python compare-sdist.py
scikit_learn.egg-info missing from meson/scikit_learn-1.5.dev0
.binder missing from setup/scikit-learn-1.5.dev0
.circleci missing from setup/scikit-learn-1.5.dev0
.cirrus.star missing from setup/scikit-learn-1.5.dev0
.codecov.yml missing from setup/scikit-learn-1.5.dev0
.git-blame-ignore-revs missing from setup/scikit-learn-1.5.dev0
.github missing from setup/scikit-learn-1.5.dev0
.gitignore missing from setup/scikit-learn-1.5.dev0
.mailmap missing from setup/scikit-learn-1.5.dev0
.pre-commit-config.yaml missing from setup/scikit-learn-1.5.dev0
CODE_OF_CONDUCT.md missing from setup/scikit-learn-1.5.dev0
CONTRIBUTING.md missing from setup/scikit-learn-1.5.dev0
SECURITY.md missing from setup/scikit-learn-1.5.dev0
asv_benchmarks missing from setup/scikit-learn-1.5.dev0
azure-pipelines.yml missing from setup/scikit-learn-1.5.dev0
benchmarks missing from setup/scikit-learn-1.5.dev0
build_tools missing from setup/scikit-learn-1.5.dev0
maint_tools missing from setup/scikit-learn-1.5.dev0
diff_file PKG-INFO found in setup/scikit-learn-1.5.dev0 and meson/scikit_learn-1.5.dev0
diff_file setup.cfg found in setup/scikit-learn-1.5.dev0 and meson/scikit_learn-1.5.dev0

@lesteve
Copy link
Member Author

lesteve commented Apr 3, 2024

I opened #28761 to tweak pyproject.toml metadata to keep this PR about check-sdist vs check-manifest ...

@jeremiedbb
Copy link
Member

As discussed irl, let's avoid to include at least the directories we used to not include to avoid introducing new sources of vulnerability in our sdist (not saying that there are any, but by precaution).

This should be doable for meson using .gitattributes + export-ignore.
On the check_sdist side, it seems to be configurable through pyproject.toml: https://github.com/henryiii/check-sdist?tab=readme-ov-file#configuration

@lesteve
Copy link
Member Author

lesteve commented Apr 4, 2024

I pushed a commit making both sdist closer:

.coveragerc missing from meson/scikit_learn-1.5.dev0
scikit_learn.egg-info missing from meson/scikit_learn-1.5.dev0
CODE_OF_CONDUCT.md missing from setup/scikit-learn-1.5.dev0
CONTRIBUTING.md missing from setup/scikit-learn-1.5.dev0
SECURITY.md missing from setup/scikit-learn-1.5.dev0
diff_file PKG-INFO found in setup/scikit-learn-1.5.dev0 and meson/scikit_learn-1.5.dev0
diff_file setup.cfg found in setup/scikit-learn-1.5.dev0 and meson/scikit_learn-1.5.dev0
.gitignore missing from meson/scikit_learn-1.5.dev0/doc/tutorial/text_analytics

I think differences are acceptable:

  • not in meson sdist:
    • .coveragerc do we care? It can be added back if needed
    • .gitignore from very old tutorial, probably we don't care
  • only in meson sdist:
    • .md files at the root of the folder like CONTRIBUTING.md, SECURITY.md and CODE_OF_CONDUCT.md, I think it is better to have them actually
  • some differences:
    • PKG-INFO has some metadata differences that I think are minor (here is the diff after reordering i.e. calling sort on both files to make the diff less noisy)
--- PKG-INFO-setup-reordered	2024-04-04 19:49:48.704495559 +0200
+++ PKG-INFO-meson-reordered	2024-04-04 19:50:01.624835679 +0200
@@ -131,7 +131,6 @@
 .. _DOI: https://zenodo.org/badge/latestdoi/21369/scikit-learn/scikit-learn
 .. |DOI| image:: https://zenodo.org/badge/21369/scikit-learn/scikit-learn.svg
 - Download releases: https://pypi.org/project/scikit-learn/
-Download-URL: https://pypi.org/project/scikit-learn/#files
 - Facebook: https://www.facebook.com/scikitlearnofficial/
 - FAQ: https://scikit-learn.org/stable/faq.html
 for a history of notable changes to scikit-learn.
@@ -158,13 +157,11 @@
 It is currently maintained by a team of volunteers.
 - joblib (>= |JoblibMinVersion|)
 .. |JoblibMinVersion| replace:: 1.2.0
-License-File: COPYING
 License: new BSD
 - LinkedIn: https://www.linkedin.com/company/scikit-learn
 - Logos & Branding: https://github.com/scikit-learn/scikit-learn/tree/main/doc/logos
 - Mailing list: https://mail.python.org/mailman/listinfo/scikit-learn
-Maintainer-email: scikit-learn developers <scikit-learn@python.org>
-Maintainer: scikit-learn developers
+Maintainer-Email: scikit-learn developers <scikit-learn@python.org>
 - Mastodon: https://mastodon.social/@sklearn@fosstodon.org
 .. |MatplotlibMinVersion| replace:: 3.3.4
 Metadata-Version: 2.1
@@ -184,11 +181,11 @@
     pip install -U scikit-learn
 .. |PlotlyMinVersion| replace:: 5.14.0
 Project History
-Project-URL: download, https://pypi.org/project/scikit-learn/#files
-Project-URL: homepage, https://scikit-learn.org
-Project-URL: release notes, https://scikit-learn.org/stable/whats_new
-Project-URL: source, https://github.com/scikit-learn/scikit-learn
-Project-URL: tracker, https://github.com/scikit-learn/scikit-learn/issues
+Project-URL: Download, https://pypi.org/project/scikit-learn/#files
+Project-URL: Homepage, https://scikit-learn.org
+Project-URL: Release notes, https://scikit-learn.org/stable/whats_new
+Project-URL: Source, https://github.com/scikit-learn/scikit-learn
+Project-URL: Tracker, https://github.com/scikit-learn/scikit-learn/issues
 Provides-Extra: benchmark
 Provides-Extra: build
 Provides-Extra: docs
  • setup.cfg has non meaningful differences (whitespace replace by tabs and comments stripped by setuptools)

@jeremiedbb
Copy link
Member

jeremiedbb commented Apr 5, 2024

  • not in meson sdist:
    .coveragerc do we care? It can be added back if needed
    .gitignore from very old tutorial, probably we don't care

I don't know about these ones. I have the same feeling as you.

  • only in meson sdist:
    .md files at the root of the folder like CONTRIBUTING.md, SECURITY.md and CODE_OF_CONDUCT.md, I think it is better to have them actually

I agree. I wonder why we explicitly exclude them in MANIFEST.in. I looked at the PR that did it (#18128) and the reason for that is not mentioned.

  • some differences:
    PKG-INFO has some metadata differences that I think are minor (here is the diff after reordering i.e. calling sort on both files to make the diff less noisy)

Indeed they all seem minor (capital letters).
-License-File: COPYING: I don't know if that's important...
-Maintainer: scikit-learn developers: weird that it's not there since you set the maintainer's name in the pyproject.toml, probably not that important.

Ping @thomasjpfan who reviewed the latest big change in MANIFEST.in. Do you see any issue with the difference between the sdist generated using meson and the one we used to generate with setuptools (see #28757 (comment))?

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @lesteve.

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with the changes to the sdist.

@@ -19,15 +19,15 @@ jobs:
# scipy and cython are required to build sdist
run: |
python -m pip install --upgrade pip
pip install check-manifest scipy cython
pip install check-sdist scipy cython
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By reading the check-sdist docs, we do not need to install the dependencies:

Suggested change
pip install check-sdist scipy cython
pip install check-sdist

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

- run: |
check-manifest -v
check-sdist
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it'll be useful to include --inject-junk here to?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure but I included it 😉

@thomasjpfan thomasjpfan merged commit e27425f into scikit-learn:main Apr 9, 2024
30 checks passed
@lesteve lesteve deleted the check-sdist branch April 9, 2024 13:07
@lesteve
Copy link
Member Author

lesteve commented Apr 9, 2024

Nice, I'll check tomorrow that the new check sdist workflow succeeds!

@jeremiedbb
Copy link
Member

that

if 😉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

⚠️ CI failed on Check Manifest ⚠️
4 participants