Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BLD: Build wheels using cibuildwheel #48283

Merged
merged 10 commits into from
Sep 21, 2022

Conversation

lithomas1
Copy link
Member

@lithomas1 lithomas1 commented Aug 27, 2022

We can't switch to this instantly since there's still a lot of work to be done, but this is a good start(the wheels build).
This PR can be merged anytime, though, and separate PRs can be put up to address issues.

The upload script probably won't work(it should, though, it is copy pasted from numpy's), but the only way to check is to merge this.

Current issues:

  • Align flags when building wheels ( the wheel sizes are way off, I need to turn on stripping for sure)
    - [ ] Figure out licensing for MSVC redistributables (I don't think we were doing this correctly before)
    - I think we should be good on this after reading through https://visualstudio.microsoft.com/license-terms/vs2022-ga-community/.
  • An admin needs to add the anaconda keys.
  • aarch64 wheel builds
  • adjust the flaker rules

@lithomas1 lithomas1 added Build Library building on various platforms CI Continuous Integration Release labels Aug 27, 2022
ci/upload_wheels.sh Outdated Show resolved Hide resolved
ci/upload_wheels.sh Outdated Show resolved Hide resolved
lithomas1 and others added 2 commits August 30, 2022 09:42
Co-Authored-By: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
ci/fix_wheels.py Outdated Show resolved Hide resolved
success = True
exception = None
repaired_wheel_path = os.path.join(dest_dir, wheel_name)
with zipfile.ZipFile(repaired_wheel_path, "a") as zipf:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason for using zipfile versus wheel? Not sure what implementation differences there are but likely safer to use wheel unpack

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wheels are zip files underneath. I don't think using wheel is a good idea since wheel has no public Python API, and I don't want to do a bunch of subprocess calls here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea that's a bit unfortunate with wheel. The other consideration point is that the zipfile is mostly an implementation detail of wheel at the moment; may or may not last forever. There's been some talk about this over the past few years

https://discuss.python.org/t/making-the-wheel-format-more-flexible-for-better-compression-speed/3810

There's also apparently some file hashes that wheel takes care of that you wouldn't get this way (saw in wheel documentation). Not sure of the ultimate impacts of that

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some interesting discussion here as well

pypa/wheel#262

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can stick with this for now. I'm just not a huge fan of The file hashes might be important, but given that pip installs the wheel without complaint, I'm going to ignore it for now. If anything fails, it will fail loudly in the future.

try:
# TODO: figure out how licensing works for the redistributables
base_redist_dir = (
f"C:/Program Files (x86)/Microsoft Visual Studio/2019/"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should maybe consider using a package to do this unless we have in house expertise on Windows libraries. delvewheel might be an option

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last time I tried(~1 year back), delvewheel was doing creating extra directories in the repaired wheels.

Note: We test the pandas wheels on the windowsservercore docker images, which does not have the visual c++ redistributables. Any missing DLLS should fail the job there.

ci/test_wheels.py Outdated Show resolved Hide resolved
if wheel_path is None:
raise ValueError("Wheel path must be passed in if on 64-bit Windows")
print(f"Pulling docker image to test Windows 64-bit Python {py_ver}")
subprocess.run(f"docker pull python:{py_ver}-windowsservercore", check=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might also be worth just using the docker python package rather than running all of these in subprocess commands

https://pypi.org/project/docker/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if there will be too much improvement here(have not used that package myself). Looking at the examples, though, (e.g. client.containers.run("ubuntu:latest", "echo hello world")), it seems like the syntax is very similar to a subprocess call.

@mroeschke
Copy link
Member

Looks pretty good to start. Could you show a commit where the workflow ran (and maybe save the wheel as an artifact)?

@lithomas1
Copy link
Member Author

If you scroll down on https://github.com/pandas-dev/pandas/actions/runs/3029167054, you can find the artifacts.

Note that the manylinux wheels are much larger, since I haven't passed the flags to strip wheels yet. I'll try to match flags in a follow up PR.
(The macosx wheels are normal size, even though they seem bigger since both the x86_64 and arm64 wheels are included in the artifact)

Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small comment otherwise looks okay to merge for now and do the follow ups in subsequent PRs.

@lithomas1
Copy link
Member Author

The last remaining blocker I think is for someone with access to the Anaconda org to add the keys to this repo.

@lithomas1
Copy link
Member Author

@TomAugspurger Sorry for the ping, but do you know how to get the tokens for the staging and nightly anaconda uploads?
I think you were the one to add the tokens originally to the pandas-wheels repo.

@TomAugspurger
Copy link
Contributor

No worries.

I think anyone in the scipy-wheels-nightly group on anaconda.org (https://anaconda.org/scipy-wheels-nightly) can generate a token for uploading or view the existing tokens. Do you have an account on anaconda.org @lithomas1?

@TomAugspurger
Copy link
Contributor

Oh, or I can just set the environment variable in this repo. Let me do that quick.

@TomAugspurger
Copy link
Contributor

@lithomas1 there's now a PANDAS_NIGHTLY_UPLOAD_TOKEN secret in GitHub that should have permission to upload to that group in anaconda.org.

@lithomas1
Copy link
Member Author

@lithomas1 there's now a PANDAS_NIGHTLY_UPLOAD_TOKEN secret in GitHub that should have permission to upload to that group in anaconda.org.

Thanks, can you also add the token for multibuild-wheels-staging under PANDAS_STAGING_UPLOAD_TOKEN?

@TomAugspurger
Copy link
Contributor

Done!

@lithomas1
Copy link
Member Author

@WillAyd Can you re-review again when you have time?
I think your review is blocking the merge.

(No rush though, this can go in anytime before 1.6/2.0)

@lithomas1 lithomas1 added this to the 1.6 milestone Sep 20, 2022
@WillAyd
Copy link
Member

WillAyd commented Sep 20, 2022

@lithomas1 sorry about that. Let's see how this goes - thanks for the ongoing improvements

@lithomas1 lithomas1 merged commit 71fc89c into pandas-dev:main Sep 21, 2022
@lithomas1 lithomas1 deleted the cibuildwheel-submit branch September 21, 2022 18:54
@lithomas1
Copy link
Member Author

Going to put this in now given the approvals. Let's see how the nightlies work out this night.

phofl pushed a commit to phofl/pandas that referenced this pull request Sep 22, 2022
* BLD: Build wheels using cibuildwheel

* update from code review

Co-Authored-By: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

* fix 3.11 version

* changes from code review

* Update test_wheels.py

* sync run time with pandas-wheels

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
mroeschke added a commit that referenced this pull request Sep 26, 2022
…8662)

* BUG: Series.getitem not falling back to positional for bool index

* Update pandas/tests/series/indexing/test_getitem.py

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

* Fix build warning for use of `strdup` in ultrajson (#48369)

* WEB: Update versions json to fix version switcher in the docs (#48655)

* PERF: join/merge on subset of MultiIndex (#48611)

* DOC: Update documentation for date_range(), bdate_range(), and interval_range() to include timedelta as a possible data type for the freq parameter (#48631)

* Update documentation for date_range(), bdate_range(), and interval_range() to include timedelta as a possible data type for the freq parameter

* Add test case for date_range construction using datetime.timedelta

* TYP: tighten Axis (#48612)

* TYP: tighten Axis

* allow 'rows'

* BUG: Fix metadata propagation in df.corr and df.cov, GH28283 (#48616)

* Add finalize to df.corr and df.cov

* Clean

* TST: add test case for PeriodIndex in HDFStore(GH7796) (#48618)

* TST: add test case for PeriodIndex in HDFStore

* TST: add test case for PeriodIndex in HDFStore

* use pytest.mark.parameterize instead

* Add OpenSSF Scorecards GitHub Action (#48570)

* Create scorecards.yml

* Update scorecards.yml

* Add OpenSSF Scorecards badge to README.md

* Trim whitespace in scorecards.yml

* Skip scorecards.yml on forks

* Fix whitespace

* Pin scorecards.yml dependencies to major versions

* ENH: move an exception and add a prehook to check for exception place… (#48088)

* ENH: move an exception and add a prehook to check for exception placement

* ENH: fix import

* ENH: revert moving error

* ENH: add docstring and fix import for test

* ENH: re-design approach based on feedback

* ENH: update whatsnew rst

* ENH: apply feedback changes

* ENH: refactor to remove exception_warning_list and ignore _version.py

* ENH: remove NotThisMethod from tests and all

* REGR: TextIOWrapper raising an error in read_csv (#48651)

* REGR: TextIOWrapper raising an error in read_csv

* pyupgrade

* do not try to seek on unseekable buffers

* unseekable buffer might also have read ahead

* safer alternative: do not mess with internal/private(?) buffer of TextIOWrapper (effectively applies the shortcut only to files pandas opens)

* Fix scorecard.yml workflow (#48668)

* Set scorecard-action to v2.0.3

scorecard-action does not have a major version tag.

Temporarily disabling github.repository check to ensure action now works.

* Enable github.repository check

* BUG: DatetimeIndex ignoring explicit tz=None (#48659)

* BUG: DatetimeIndex ignoring explicit tz=None

* GH ref

* Corrected pd.merge indicator type hint (#48677)

* Corrected pd.merge indicator type hint

https://pandas.pydata.org/docs/reference/api/pandas.merge.html
It should be "str | bool" instead of just string

* Update merge.py

fixed type hint in merge.py

* Update merge.py

Update indicator type hint in _MergeOperation

* Update merge.py

Added type hint _MergeOperation init

* DOC: Document default value for options.display.max_cols when not running in terminal (#48672)

DOC: Document default value for options.display.max_cols

display.max_cols has a default value of 20 when not running in a terminal
such as Jupyter Notebook

* ENH: DTA/TDA add datetimelike scalar with mismatched reso (#48669)

* ENH: DTA/TDA add datetimelike scalar with mismatched reso

* mypy fixup

* REF: support reso in remaining tslibs helpers (#48661)

* REF: support reso in remaining tslibs helpers

* update setup.py

* PERF: Avoid fragmentation of DataFrame in read_sas (#48603)

* PERF: Avoid fragmentation of DataFrame in read_sas

* Add whatsnew

* Add warning

* DOC: Add deprecation infos to deprecated functions (#48599)

* DOC: Add deprecation infos to deprecated functions

* Add sections

* Fix

* BLD: Build wheels using cibuildwheel (#48283)

* BLD: Build wheels using cibuildwheel

* update from code review

Co-Authored-By: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

* fix 3.11 version

* changes from code review

* Update test_wheels.py

* sync run time with pandas-wheels

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

* REGR: Performance decrease in factorize (#48620)

* TYP: type all arguments with str default values (#48508)

* TYP: type all arguments with str default values

* na_rep: back to str

* na(t)_rep is always a string

* add float for some functions

* and the same for the few float default arguments

* define a few more literal constants

* avoid itertools.cycle mypy error

* revert mistake

* TST: Catch more pyarrow PerformanceWarnings (#48699)

* REGR: to_hdf raising AssertionError with boolean index (#48696)

* REGR: to_hdf raising AssertionError with boolean index

* Add gh ref

* REGR: Regression in DataFrame.loc when setting df with all True indexer (#48711)

* BUG: pivot_table raising for nullable dtype and margins (#48714)

* TST: Address MPL 3.6 deprecation warnings (#48695)

* TST: Address MPL 3.6 deprecation warnings

* Address min build

* missing ()

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
Co-authored-by: Ralf Gommers <ralf.gommers@gmail.com>
Co-authored-by: Marc Garcia <garcia.marc@gmail.com>
Co-authored-by: Luke Manley <lukemanley@gmail.com>
Co-authored-by: Siddhartha Gandhi <siddhartha.a.gandhi@gmail.com>
Co-authored-by: Torsten Wörtwein <twoertwein@users.noreply.github.com>
Co-authored-by: Xiao Yuan <yuanx749@gmail.com>
Co-authored-by: paradox-lab <57354735+paradox-lab@users.noreply.github.com>
Co-authored-by: Pedro Nacht <15221358+pnacht@users.noreply.github.com>
Co-authored-by: dataxerik <dsshar@gmail.com>
Co-authored-by: jbrockmendel <jbrockmendel@gmail.com>
Co-authored-by: Pablo <48098178+PabloRuizCuevas@users.noreply.github.com>
Co-authored-by: tmoschou <5567550+tmoschou@users.noreply.github.com>
Co-authored-by: Thomas Li <47963215+lithomas1@users.noreply.github.com>
Co-authored-by: Richard Shadrach <45562402+rhshadrach@users.noreply.github.com>
@mroeschke mroeschke modified the milestones: 1.6, 2.0 Oct 13, 2022
noatamir pushed a commit to noatamir/pandas that referenced this pull request Nov 9, 2022
* BLD: Build wheels using cibuildwheel

* update from code review

Co-Authored-By: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

* fix 3.11 version

* changes from code review

* Update test_wheels.py

* sync run time with pandas-wheels

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
noatamir pushed a commit to noatamir/pandas that referenced this pull request Nov 9, 2022
…ndas-dev#48662)

* BUG: Series.getitem not falling back to positional for bool index

* Update pandas/tests/series/indexing/test_getitem.py

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

* Fix build warning for use of `strdup` in ultrajson (pandas-dev#48369)

* WEB: Update versions json to fix version switcher in the docs (pandas-dev#48655)

* PERF: join/merge on subset of MultiIndex (pandas-dev#48611)

* DOC: Update documentation for date_range(), bdate_range(), and interval_range() to include timedelta as a possible data type for the freq parameter (pandas-dev#48631)

* Update documentation for date_range(), bdate_range(), and interval_range() to include timedelta as a possible data type for the freq parameter

* Add test case for date_range construction using datetime.timedelta

* TYP: tighten Axis (pandas-dev#48612)

* TYP: tighten Axis

* allow 'rows'

* BUG: Fix metadata propagation in df.corr and df.cov, GH28283 (pandas-dev#48616)

* Add finalize to df.corr and df.cov

* Clean

* TST: add test case for PeriodIndex in HDFStore(GH7796) (pandas-dev#48618)

* TST: add test case for PeriodIndex in HDFStore

* TST: add test case for PeriodIndex in HDFStore

* use pytest.mark.parameterize instead

* Add OpenSSF Scorecards GitHub Action (pandas-dev#48570)

* Create scorecards.yml

* Update scorecards.yml

* Add OpenSSF Scorecards badge to README.md

* Trim whitespace in scorecards.yml

* Skip scorecards.yml on forks

* Fix whitespace

* Pin scorecards.yml dependencies to major versions

* ENH: move an exception and add a prehook to check for exception place… (pandas-dev#48088)

* ENH: move an exception and add a prehook to check for exception placement

* ENH: fix import

* ENH: revert moving error

* ENH: add docstring and fix import for test

* ENH: re-design approach based on feedback

* ENH: update whatsnew rst

* ENH: apply feedback changes

* ENH: refactor to remove exception_warning_list and ignore _version.py

* ENH: remove NotThisMethod from tests and all

* REGR: TextIOWrapper raising an error in read_csv (pandas-dev#48651)

* REGR: TextIOWrapper raising an error in read_csv

* pyupgrade

* do not try to seek on unseekable buffers

* unseekable buffer might also have read ahead

* safer alternative: do not mess with internal/private(?) buffer of TextIOWrapper (effectively applies the shortcut only to files pandas opens)

* Fix scorecard.yml workflow (pandas-dev#48668)

* Set scorecard-action to v2.0.3

scorecard-action does not have a major version tag.

Temporarily disabling github.repository check to ensure action now works.

* Enable github.repository check

* BUG: DatetimeIndex ignoring explicit tz=None (pandas-dev#48659)

* BUG: DatetimeIndex ignoring explicit tz=None

* GH ref

* Corrected pd.merge indicator type hint (pandas-dev#48677)

* Corrected pd.merge indicator type hint

https://pandas.pydata.org/docs/reference/api/pandas.merge.html
It should be "str | bool" instead of just string

* Update merge.py

fixed type hint in merge.py

* Update merge.py

Update indicator type hint in _MergeOperation

* Update merge.py

Added type hint _MergeOperation init

* DOC: Document default value for options.display.max_cols when not running in terminal (pandas-dev#48672)

DOC: Document default value for options.display.max_cols

display.max_cols has a default value of 20 when not running in a terminal
such as Jupyter Notebook

* ENH: DTA/TDA add datetimelike scalar with mismatched reso (pandas-dev#48669)

* ENH: DTA/TDA add datetimelike scalar with mismatched reso

* mypy fixup

* REF: support reso in remaining tslibs helpers (pandas-dev#48661)

* REF: support reso in remaining tslibs helpers

* update setup.py

* PERF: Avoid fragmentation of DataFrame in read_sas (pandas-dev#48603)

* PERF: Avoid fragmentation of DataFrame in read_sas

* Add whatsnew

* Add warning

* DOC: Add deprecation infos to deprecated functions (pandas-dev#48599)

* DOC: Add deprecation infos to deprecated functions

* Add sections

* Fix

* BLD: Build wheels using cibuildwheel (pandas-dev#48283)

* BLD: Build wheels using cibuildwheel

* update from code review

Co-Authored-By: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

* fix 3.11 version

* changes from code review

* Update test_wheels.py

* sync run time with pandas-wheels

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

* REGR: Performance decrease in factorize (pandas-dev#48620)

* TYP: type all arguments with str default values (pandas-dev#48508)

* TYP: type all arguments with str default values

* na_rep: back to str

* na(t)_rep is always a string

* add float for some functions

* and the same for the few float default arguments

* define a few more literal constants

* avoid itertools.cycle mypy error

* revert mistake

* TST: Catch more pyarrow PerformanceWarnings (pandas-dev#48699)

* REGR: to_hdf raising AssertionError with boolean index (pandas-dev#48696)

* REGR: to_hdf raising AssertionError with boolean index

* Add gh ref

* REGR: Regression in DataFrame.loc when setting df with all True indexer (pandas-dev#48711)

* BUG: pivot_table raising for nullable dtype and margins (pandas-dev#48714)

* TST: Address MPL 3.6 deprecation warnings (pandas-dev#48695)

* TST: Address MPL 3.6 deprecation warnings

* Address min build

* missing ()

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
Co-authored-by: Ralf Gommers <ralf.gommers@gmail.com>
Co-authored-by: Marc Garcia <garcia.marc@gmail.com>
Co-authored-by: Luke Manley <lukemanley@gmail.com>
Co-authored-by: Siddhartha Gandhi <siddhartha.a.gandhi@gmail.com>
Co-authored-by: Torsten Wörtwein <twoertwein@users.noreply.github.com>
Co-authored-by: Xiao Yuan <yuanx749@gmail.com>
Co-authored-by: paradox-lab <57354735+paradox-lab@users.noreply.github.com>
Co-authored-by: Pedro Nacht <15221358+pnacht@users.noreply.github.com>
Co-authored-by: dataxerik <dsshar@gmail.com>
Co-authored-by: jbrockmendel <jbrockmendel@gmail.com>
Co-authored-by: Pablo <48098178+PabloRuizCuevas@users.noreply.github.com>
Co-authored-by: tmoschou <5567550+tmoschou@users.noreply.github.com>
Co-authored-by: Thomas Li <47963215+lithomas1@users.noreply.github.com>
Co-authored-by: Richard Shadrach <45562402+rhshadrach@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Build Library building on various platforms CI Continuous Integration Release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consider cibuildwheel for building wheels?
4 participants