From c817a088b7f9b64f6e424045b93d7655683c0e45 Mon Sep 17 00:00:00 2001 From: JMBurley Date: Fri, 21 Oct 2022 12:47:16 -0400 Subject: [PATCH] BLD: add optional dependencies as extras_require in setup.py (#47336) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * add-recommended-dependencies-as-extras_require-in-setup.cfg See issue #47335. https://github.com/pandas-dev/pandas/issues/47335 recommended dependencies should have package mgmt facilitated through pandas. This will make mgmt of pandas in production docker environments a lot simpler. * Update v1.4.3.rst * double backtick code in rst * rebundle under extras_require `recommended` * [options.extras_require] bundled dependencies by featureset see #39164 for discussion. * note: [options.extras_require] bundled dependencies by featureset * Update setup.cfg rollback numba from recommended. This would necessitate and update to documentation that requires broad agreement from pd-dev-core team that would slow down the overall PR * add adlfs for azure to `access data in cloud` see #39164 for discussion. 
0.6.0 might be an overly restrictive version, but it is compatible * fix extras_require: PyTables is actually `tables` on PyPi * Update setup.cfg * add `all` option to [options.extras_require] * moved changelog to 1.4.4 as 1.4.3 released while this PR was stalled * Updated to 1.5.0 compliance * simplify sql option names * extras rename: recommended -> performance * remove azure support is currently unofficial as of 1.5.0 * align with actions-38-minimum_versions.yaml add specific installs and, where required, missing install documentation for - odfpy - pyreadstat - compression options * Pandas -> pandas in doc Co-authored-by: Matthew Roeschke * extras rename: s3 -> aws see https://github.com/pandas-dev/pandas/pull/47336#discussion_r923930271 * extras rename: table -> output_formatting to be more general in case of future changes * bug: `>=` not `=` * Apply suggestions from code review Co-authored-by: Simon Hawkins * align 1.5.0.rst to latest extras_require updates * 1.5.0.rst example updated to use valid extras * add optional dep mgmt instructions to install.rst * lint scipy optional import Co-authored-by: Matthew Roeschke * Apply suggestions from code review * detailed extras guidance in install.rst - updated numbas to a full recommended dependency with a promotional bullet point like bottleneck and numexpr - clarified the extra to use for each set of optional dependencies - made xml an optional extra, because is does have usage outside of read_html. * _optional.py note to keep track of setup.cfg * bug: indent after bullet in install.rst * remove numba from computation extra. 
* Backport PR #48197 on branch 1.5.x (DOC: Cleanup 1.5 whatsnew) (#48228) Backport PR #48197: DOC: Cleanup 1.5 whatsnew Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com> * Backport PR #48232 on branch 1.5.x (CI: Ensure jobs run on 1.5.x branch) (#48235) Backport PR #48232: CI: Ensure jobs run on 1.5.x branch Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com> * Backport PR #48180 on branch 1.5.x (CI: Switch to large for circleci) (#48251) Backport PR #48180: CI: Switch to large for circleci Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> * Backport PR #48245 on branch 1.5.x (CI: Skip test_round_sanity tests due to failures) (#48257) Backport PR #48245: CI: Skip test_round_sanity tests due to failures Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> * Backport PR #48240 on branch 1.5.x (Fix mypy erroring on backport branches) (#48259) Backport PR #48240: Fix mypy erroring on backport branches Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> * Backport PR #48215 on branch 1.5.x (REGR: properly update DataFrame cache in Series.__setitem__) (#48268) Backport PR #48215: REGR: properly update DataFrame cache in Series.__setitem__ Co-authored-by: Joris Van den Bossche * Backport PR #48272 on branch 1.5.x (CI: Require s3fs greater than minumum version in builds) (#48276) Backport PR #48272: CI: Require s3fs greater than minumum version in builds Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> * Backport PR #48299 on branch 1.5.x (Bump s3fs to 2021.08.00) (#48305) Backport PR #48299: Bump s3fs to 2021.08.00 Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> * Backport PR #48027 on branch 1.5.x (ENH: Support masks in groupby prod) (#48302) Backport PR #48027: ENH: Support masks in groupby prod Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> * Backport PR #47762 on branch 
1.5.x (REGR: preserve reindexed array object (instead of creating new array) for concat with all-NA array) (#48309) Backport PR #47762: REGR: preserve reindexed array object (instead of creating new array) for concat with all-NA array Co-authored-by: Joris Van den Bossche * Backport PR #48246 on branch 1.5.x (REGR: iloc not possible for sparse DataFrame) (#48311) Backport PR #48246: REGR: iloc not possible for sparse DataFrame Co-authored-by: Simon Hawkins * Backport PR #48314 on branch 1.5.x (DOC: v1.4.4 release date and tidy up release notes) (#48320) Backport PR #48314: DOC: v1.4.4 release date and tidy up release notes Co-authored-by: Simon Hawkins * Backport PR #48301 on branch 1.5.x (DEPR: Deprecate positional arguments in pivot) (#48326) Backport PR #48301: DEPR: Deprecate positional arguments in pivot Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> * Backport PR #48214 on branch 1.5.x (WEB: Removing links to pdf version of the docs from web and docs) (#48242) Backport PR #48214: WEB: Removing links to pdf version of the docs from web and docs * Backport PR #48159 on branch 1.5.x (TST: Fix interchange/plotting/groupby test warnings) (#48279) Backport PR #48159: TST: Fix interchange/plotting/groupby test warnings Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com> * Backport PR #48324 on branch 1.5.x (BUG: Add note in whatsnew for DataFrame.at behavior change) (#48345) Backport PR #48324: BUG: Add note in whatsnew for DataFrame.at behavior change Co-authored-by: Richard Shadrach <45562402+rhshadrach@users.noreply.github.com> * Backport PR #48336 on branch 1.5.x (DOC: Add whatsnew note for #45404) (#48341) Backport PR #48336: DOC: Add whatsnew note for #45404 Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com> * Backport PR #48254 on branch 1.5.x (REF: avoid FutureWarning about using deprecates loc.__setitem__ non-inplace usage) (#48353) Backport PR #48254: REF: avoid FutureWarning 
about using deprecates loc.__setitem__ non-inplace usage Co-authored-by: jbrockmendel * Backport PR #48334 on branch 1.5.x (BUG: read_html(extract_links=all) with no header) (#48350) Backport PR #48334: BUG: read_html(extract_links=all) with no header Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com> * Backport PR #48265 on branch 1.5.x (CI: Setting up ssh key to upload prod docs) (#48370) Backport PR #48265: CI: Setting up ssh key to upload prod docs Co-authored-by: Marc Garcia * Backport PR #48381 on branch 1.5.x (CI: Pin mambaforge image) (#48401) Backport PR #48381: CI: Pin mambaforge image Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> * Backport PR #48229 on branch 1.5.x (TST: Test Nullable int floordiv by 0) (#48413) Backport PR #48229: TST: Test Nullable int floordiv by 0 Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com> * Backport PR #48414 on branch 1.5.x (DOC: Add deprecation to is_categorical) (#48418) Backport PR #48414: DOC: Add deprecation to is_categorical Co-authored-by: Kevin Sheppard * Backport PR #48264 on branch 1.5.x (BUG: ArrowExtensionArray._from_* accepts pyarrow arrays) (#48422) * Backport PR #48264: BUG: ArrowExtensionArray._from_* accepts pyarrow arrays * Add missing import Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com> * Backport PR #48411 on branch 1.5.x (REGR: get_loc for ExtensionEngine not returning bool indexer for na) (#48430) Backport PR #48411: REGR: get_loc for ExtensionEngine not returning bool indexer for na Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> * Backport PR #48419 on branch 1.5.x (BUG: ensure to return writable buffer in __dataframe__ interchange for categorical column) (#48441) Backport PR #48419: BUG: ensure to return writable buffer in __dataframe__ interchange for categorical column Co-authored-by: Joris Van den Bossche * Backport PR #48444 on branch 1.5.x (CI: Pin 
ipython version) (#48449) Backport PR #48444: CI: Pin ipython version Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> * Backport PR #48380 on branch 1.5.x (DOC: Clarify that objects dtype takes precedence in where) (#48445) * Backport PR #48380: DOC: Clarify that objects dtype takes precedence in where * Update generic.py Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> * Manual Backport PR #48427 on branch 1.5.x (BLD: Refactor Dockerfile to not install dev enviornment on base) (#48450) Backport PR #48427: BLD: Refactor Dockerfile to not install dev enviornment on base * Backport PR #48426 on branch 1.5.x (BUG: Column.size should be a method) (#48465) Backport PR #48426: BUG: Column.size should be a method Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com> * Backport PR #48398 on branch 1.5.x (WARN: Avoid FutureWarnings in tests) (#48420) * Backport PR #48398: WARN: Avoid FutureWarnings in tests * Update Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> Co-authored-by: Marc Garcia * Backport PR #48416 on branch 1.5.x (REF: ensure to apply suffixes before concat step in merge code) (#48470) Backport PR #48416: REF: ensure to apply suffixes before concat step in merge code Co-authored-by: Joris Van den Bossche * Backport PR #48354 on branch 1.5.x (CI: Bump timeout to 180 minutes) (#48474) Backport PR #48354: CI: Bump timeout to 180 minutes Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com> * Backport PR #48472 on branch 1.5.x (PERF: keep using ObjectEngine for ExtensionArrays for 1.5) (#48486) Backport PR #48472: PERF: keep using ObjectEngine for ExtensionArrays for 1.5 Co-authored-by: Joris Van den Bossche * Backport PR #48473 on branch 1.5.x (REGR: .describe on unsigned dtypes results in object) (#48501) Backport PR #48473: REGR: .describe on unsigned dtypes results in object Co-authored-by: Richard Shadrach 
<45562402+rhshadrach@users.noreply.github.com> * Backport PR #48443 on branch 1.5.x (BUG: Fix pyarrow groupby tests) (#48494) * BUG: Fix pyarrow groupby tests (#48443) # Conflicts: # pandas/tests/extension/test_arrow.py * CI: Fix failing tests (#48493) Co-authored-by: jbrockmendel * Backport PR #48490 on branch 1.5.x (CI: Use -j1 for python-dev build to avoid flaky build error) (#48517) Backport PR #48490: CI: Use -j1 for python-dev build to avoid flaky build error Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com> * Manual Backport PR #48428 on branch 1.5.x (BUG/TST: fix a bunch of arraymanager+pyarrow tests) (#48518) Backport PR #48428: BUG/TST: fix a bunch of arraymanager+pyarrow tests Co-authored-by: jbrockmendel * Backport PR #48525 on branch 1.5.x (CI: Fix py311 builds different exception message) (#48529) Backport PR #48525: CI: Fix py311 builds different exception message Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> * Backport PR #48489 on branch 1.5.x (BUG: fix test_arrow.py tests) (#48532) Backport PR #48489: BUG: fix test_arrow.py tests Co-authored-by: jbrockmendel * Backport PR #48543 on branch 1.5.x (DOC: Update footer and include OVH) (#48548) Backport PR #48543: DOC: Update footer and include OVH Co-authored-by: Marc Garcia * Manual Backport PR #48417 on branch 1.5.x (Revert set_index inplace and copy keyword changes) (#48552) Backport PR #48417: Revert set_index inplace and copy keyword changes Co-authored-by: Joris Van den Bossche * Backport PR #48550 on branch 1.5.x (TST: remove 2D tests irrelevant for pyarrow) (#48554) Backport PR #48550: TST: remove 2D tests irrelevant for pyarrow Co-authored-by: jbrockmendel * Backport PR #48556 on branch 1.5.x (DOC: Fix docs footer) (#48558) Backport PR #48556: DOC: Fix docs footer Co-authored-by: Marc Garcia * Backport PR #48562 on branch 1.5.x (TST: Testing that no warnings are emitted and that inplace fillna produces the correct result (GH48480)) 
(#48564) Backport PR #48562: TST: Testing that no warnings are emitted and that inplace fillna produces the correct result (GH48480) Co-authored-by: RaphSku <45042665+RaphSku@users.noreply.github.com> * Backport PR #48563 on branch 1.5.x (DOC: Fix read_sas 1.5 release notes) (#48565) Backport PR #48563: DOC: Fix read_sas 1.5 release notes Co-authored-by: Jonas Haag * Backport PR #48539 on branch 1.5.x (REGR: groupby doesn't identify null values when sort=False) (#48568) Backport PR #48539: REGR: groupby doesn't identify null values when sort=False Co-authored-by: Richard Shadrach <45562402+rhshadrach@users.noreply.github.com> * Backport PR #48555 on branch 1.5.x (DEPR: Series.astype(np.datetime64)) (#48569) Backport PR #48555: DEPR: Series.astype(np.datetime64) Co-authored-by: jbrockmendel * Backport PR #48557 on branch 1.5.x (WEB: Add new footer to web) (#48571) Backport PR #48557: WEB: Add new footer to web Co-authored-by: Marc Garcia * Backport PR #48285 on branch 1.5.x (WEB: Unpin pydata sphinx theme) (#48585) Backport PR #48285: WEB: Unpin pydata sphinx theme Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> * Backport PR #48572 on branch 1.5.x (DOC: Fixing styles for the dark theme) (#48584) Backport PR #48572: DOC: Fixing styles for the dark theme Co-authored-by: Marc Garcia * Backport PR #48397 on branch 1.5.x (WARN: Remove false positive warning for iloc inplaceness) (#48583) Backport PR #48397: WARN: Remove false positive warning for iloc inplaceness Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> * Backport PR #48587 on branch 1.5.x (Fix `series.str.startswith(tuple)`) (#48593) Backport PR #48587: Fix `series.str.startswith(tuple)` Co-authored-by: Janosh Riebesell * Backport PR #48601 on branch 1.5.x (CI: Fix matplolib release issues) (#48617) Backport PR #48601: CI: Fix matplolib release issues Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> * Backport PR #48623 on branch 1.5.x 
(REGR/DOC: Docs left navbar broke) (#48625) Backport PR #48623: REGR/DOC: Docs left navbar broke Co-authored-by: Richard Shadrach <45562402+rhshadrach@users.noreply.github.com> * Backport PR #48619 on branch 1.5.x (REGR: Loc.setitem with enlargement raises for nested data) (#48629) Backport PR #48619: REGR: Loc.setitem with enlargement raises for nested data Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> * Backport PR #48627 on branch 1.5.x (DOC: Last changes to release notes for 1.5.0 release) (#48630) Backport PR #48627: DOC: Last changes to release notes for 1.5.0 release Co-authored-by: Marc Garcia * RLS: 1.5.0 * Backport PR #48642 on branch 1.5.x (DOC: Add release notes for 1.5.1) (#48647) Backport PR #48642: DOC: Add release notes for 1.5.1 Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> * Backport PR #48639 on branch 1.5.x (CI: Fix directory name for published prod docs) (#48648) Backport PR #48639: CI: Fix directory name for published prod docs Co-authored-by: Marc Garcia * Backport PR #48651 on branch 1.5.x (REGR: TextIOWrapper raising an error in read_csv) (#48666) Backport PR #48651: REGR: TextIOWrapper raising an error in read_csv Co-authored-by: Torsten Wörtwein * Backport PR #48599 on branch 1.5.x (DOC: Add deprecation infos to deprecated functions) (#48690) Backport PR #48599: DOC: Add deprecation infos to deprecated functions Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> * Backport PR #48620 on branch 1.5.x (REGR: Performance decrease in factorize) (#48710) Backport PR #48620: REGR: Performance decrease in factorize Co-authored-by: Richard Shadrach <45562402+rhshadrach@users.noreply.github.com> * Backport PR #48711 on branch 1.5.x (REGR: Regression in DataFrame.loc when setting df with all True indexer) (#48717) Backport PR #48711: REGR: Regression in DataFrame.loc when setting df with all True indexer Co-authored-by: Patrick Hoefler 
<61934744+phofl@users.noreply.github.com> * Backport PR #48696 on branch 1.5.x (REGR: to_hdf raising AssertionError with boolean index) (#48716) Backport PR #48696: REGR: to_hdf raising AssertionError with boolean index Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> * Backport PR #48734 on branch 1.5.x (REGR: Raise on invalid colormap for scatter plot) (#48744) Backport PR #48734: REGR: Raise on invalid colormap for scatter plot Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> * Backport PR #48713 on branch 1.5.x (BUG: pivot_table raising Future Warning with datetime column as index) (#48742) Backport PR #48713: BUG: pivot_table raising Future Warning with datetime column as index Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> * Backport PR #48703 on branch 1.5.x (Added theme_switcher ) (#48741) Backport PR #48703: Added theme_switcher Co-authored-by: Deepak Sirohiwal <38135521+deepaksirohiwal@users.noreply.github.com> * Backport PR #48697 on branch 1.5.x (REGR: None converted to NaN when enlarging Series) (#48745) Backport PR #48697: REGR: None converted to NaN when enlarging Series Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> * Backport PR #48702 on branch 1.5.x (REGR: dropna affects observed in groupby) (#48750) Backport PR #48702: REGR: dropna affects observed in groupby Co-authored-by: Richard Shadrach <45562402+rhshadrach@users.noreply.github.com> * Backport PR #48782 on branch 1.5.x (REGR: describe raising when result contains NA) (#48793) Backport PR #48782: REGR: describe raising when result contains NA Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> * Backport PR #48662 on branch 1.5.x (BUG: Series.getitem not falling back to positional for bool index) (#48799) Backport PR #48662: BUG: Series.getitem not falling back to positional for bool index Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> * 
Backport PR #48751 on branch 1.5.x (WEB: Update link to datapythonista blog url) (#48798) Backport PR #48751: WEB: Update link to datapythonista blog url Co-authored-by: Marc Garcia * Backport PR #48608 on branch 1.5.x (REGR: assert_index_equal raising with non matching pd.NA) (#48800) * Backport PR #48608: REGR: assert_index_equal raising with non matching pd.NA Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> * Backport PR #48785 on branch 1.5.x (BUG: still emitting unnecessary FutureWarning in DataFrame.sort_values with sparse columns) (#48807) Backport PR #48785: BUG: still emitting unnecessary FutureWarning in DataFrame.sort_values with sparse columns Co-authored-by: Marco Edward Gorelli <33491632+MarcoGorelli@users.noreply.github.com> * Backport PR #48693 on branch 1.5.x (ENH: Make deprecate_nonkeyword_arguments alter function signature) (#48795) Backport PR #48693: ENH: Make deprecate_nonkeyword_arguments alter function signature Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com> * Backport PR #48579 on branch 1.5.x (BUG: Fix calling groupBy(...).apply(func) on an empty dataframe invokes func) (#48817) BUG: Fix calling groupBy(...).apply(func) on an empty dataframe invokes func (#48579) (cherry picked from commit 8b0ad717d1ec54dd40136817a326b41817ffcb86) Co-authored-by: Dennis Chukwunta * Backport PR #48760 on branch 1.5.x (REGR: groupby.size with axis=1 doesn't return a Series) (#48825) * Backport PR #48820 on branch 1.5.x (BUG: to_datetime(format='...%f') parses nanoseconds) (#48860) BUG: to_datetime(format='...%f') parses nanoseconds (#48820) Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com> * Backport PR #48850 on branch 1.5.x (TYP: Fix typing errors caused by new numpy) (#48859) * Backport PR #48790 on branch 1.5.x (Created icons for dark theme) (#48875) Backport PR #48790: Created icons for dark theme Co-authored-by: Lorenzo Vainigli * Backport PR #48805 on branch 1.5.x (Added 
padding and fixed columns for sponsor logos in mobile view) (#48874) Backport PR #48805: Added padding and fixed columns for sponsor logos in mobile view Co-authored-by: Amay Patel <92037532+amay-patel@users.noreply.github.com> * Backport PR #48866 on branch 1.5.x (REGR: replace replacing wrong values with inplace and datetime) (#48872) Backport PR #48866: REGR: replace replacing wrong values with inplace and datetime Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> * Backport PR #48853 on branch 1.5.x (REGR: Avoid unnecessary warning when setting empty dataframe) (#48873) Backport PR #48853: REGR: Avoid unnecessary warning when setting empty dataframe Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> * Backport PR #48833 on branch 1.5.x (BUG: ArrowExtensionArray compared to invalid object not raising) (#48878) Backport PR #48833: BUG: ArrowExtensionArray compared to invalid object not raising Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com> * Backport PR #48686 on branch 1.5.x (BUG: to_datetime(tz_mix, utc=True) converts to UTC) (#48882) Backport PR #48686: BUG: to_datetime(tz_mix, utc=True) converts to UTC Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com> * Backport PR #48736 on branch 1.5.x ( BUG: AttributeError: 'function' object has no attribute 'currentframe') (#48887) * Backport PR #48797 on branch 1.5.x (REGR: fix df.apply with keyword non-zero axis) (#48886) REGR: fix df.apply with keyword non-zero axis (#48797) Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com> * add pandas[foo] throughout to explain the optional extras process Also `Recommended dependencies` to `Performance dependencies (recommended)` * extend optional_extra`all` to cover tests and redundant packages * add tzdata from pandas 1.5.0 * move summary to latest whatsnew doc Code is now stable and updated. 
Attempt a port from doc/source/whatsnew/v1.5.0.rst over to latest document doc/source/whatsnew/v1.5.1.rst. * explicitly pair packages to optional_extra in installs * fix sphinx errors in install.rst extra padding on column + empty return before table * add: pytest-asyncio>=0.19.0 pytest-asyncio>=0.19.0 Closes #48361. No version guidance available so went for latest. No-one should be messing around with tests without modern installs. * fsspec note * repin pytest-asyncio>=0.17.0 matches pytest-asyncio>=0.17.0 in ci/deps/actions-38-minimum_versions.yaml * move summary from whatsnew/v1.5.1 to v1.6.0 plus add (:issue:`48361`) resolution note * linting double-backtick ``test`` underline tilde match title length * Add `clipboard` as optional extra * Review comments - fss optional extra - better numba description * remove unneeded comment * fix: leave 1.5.1.rst unchanged by this PR * Update doc/source/whatsnew/v1.5.1.rst Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com> * pyarrow bump 1.0.1 -> 6.0.0 Co-authored-by: Matthew Roeschke Co-authored-by: Simon Hawkins Co-authored-by: MeeseeksMachine <39504233+meeseeksmachine@users.noreply.github.com> Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com> Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> Co-authored-by: Joris Van den Bossche Co-authored-by: Richard Shadrach <45562402+rhshadrach@users.noreply.github.com> Co-authored-by: jbrockmendel Co-authored-by: Marc Garcia Co-authored-by: Kevin Sheppard Co-authored-by: RaphSku <45042665+RaphSku@users.noreply.github.com> Co-authored-by: Jonas Haag Co-authored-by: Janosh Riebesell Co-authored-by: Pandas Development Team Co-authored-by: Torsten Wörtwein Co-authored-by: Deepak Sirohiwal <38135521+deepaksirohiwal@users.noreply.github.com> Co-authored-by: Marco Edward Gorelli <33491632+MarcoGorelli@users.noreply.github.com> Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com> 
Co-authored-by: Dennis Chukwunta Co-authored-by: Lorenzo Vainigli Co-authored-by: Amay Patel <92037532+amay-patel@users.noreply.github.com> --- doc/source/getting_started/install.rst | 223 +++++++++++++++---------- doc/source/whatsnew/v2.0.0.rst | 16 +- pandas/compat/_optional.py | 2 +- setup.cfg | 107 ++++++++++++ 4 files changed, 254 insertions(+), 94 deletions(-) diff --git a/doc/source/getting_started/install.rst b/doc/source/getting_started/install.rst index 54da61a5c074a9..5f258973b3db92 100644 --- a/doc/source/getting_started/install.rst +++ b/doc/source/getting_started/install.rst @@ -242,8 +242,11 @@ Package Minimum support .. _install.recommended_dependencies: -Recommended dependencies -~~~~~~~~~~~~~~~~~~~~~~~~ +Performance dependencies (recommended) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +pandas recommends the following optional dependencies for performance gains. These dependencies can be specifically +installed with ``pandas[performance]`` (i.e. add as optional_extra to the pandas requirement). * `numexpr `__: for accelerating certain numerical operations. ``numexpr`` uses multiple cores as well as smart chunking and caching to achieve large speedups. @@ -253,6 +256,10 @@ Recommended dependencies evaluations. ``bottleneck`` uses specialized cython routines to achieve large speedups. If installed, must be Version 1.3.2 or higher. +* `numba `__: alternative execution engine for operations that accept the ``engine="numba"`` + argument (e.g. ``apply``). ``numba`` is a JIT compiler that translates Python functions to optimized machine code using + the LLVM compiler library. If installed, must be Version 0.53.1 or higher. + .. note:: You are highly encouraged to install these libraries, as they provide speed improvements, especially @@ -270,69 +277,83 @@ For example, :func:`pandas.read_hdf` requires the ``pytables`` package, while optional dependency is not installed, pandas will raise an ``ImportError`` when the method requiring that dependency is called. 
+Optional pandas dependencies can be managed as optional extras (e.g., ``pandas[performance, aws]>=1.5.0``) +in a requirements.txt, setup, or pyproject.toml file. +Available optional dependencies are ``[all, performance, computation, aws, +gcp, excel, parquet, feather, hdf5, spss, postgresql, mysql, sql-other, html, xml, +plot, output_formatting, compression, test]``. Timezones ^^^^^^^^^ -========================= ========================= ============================================================= -Dependency Minimum Version Notes -========================= ========================= ============================================================= -tzdata 2022.1(pypi)/ Allows the use of ``zoneinfo`` timezones with pandas. - 2022a(for system tzdata) **Note**: You only need to install the pypi package if your - system does not already provide the IANA tz database. - However, the minimum tzdata version still applies, even if it - is not enforced through an error. +Can be managed as optional_extra with ``pandas[timezone]``. + +========================= ========================= =============== ============================================================= +Dependency Minimum Version optional_extra Notes +========================= ========================= =============== ============================================================= +tzdata 2022.1(pypi)/ timezone Allows the use of ``zoneinfo`` timezones with pandas. + 2022a(for system tzdata) **Note**: You only need to install the pypi package if your + system does not already provide the IANA tz database. + However, the minimum tzdata version still applies, even if it + is not enforced through an error. - If you would like to keep your system tzdata version updated, - it is recommended to use the ``tzdata`` package from - conda-forge. 
-========================= ========================= ============================================================= + If you would like to keep your system tzdata version updated, + it is recommended to use the ``tzdata`` package from + conda-forge. +========================= ========================= =============== ============================================================= Visualization ^^^^^^^^^^^^^ -========================= ================== ============================================================= -Dependency Minimum Version Notes -========================= ================== ============================================================= -matplotlib 3.3.2 Plotting library -Jinja2 3.0.0 Conditional formatting with DataFrame.style -tabulate 0.8.9 Printing in Markdown-friendly format (see `tabulate`_) -========================= ================== ============================================================= +Can be managed as optional_extra with ``pandas[plot, output_formatting]``, depending on the required functionality. 
+ +========================= ================== ================== ============================================================= +Dependency Minimum Version optional_extra Notes +========================= ================== ================== ============================================================= +matplotlib 3.3.2 plot Plotting library +Jinja2 3.0.0 output_formatting Conditional formatting with DataFrame.style +tabulate 0.8.9 output_formatting Printing in Markdown-friendly format (see `tabulate`_) +========================= ================== ================== ============================================================= Computation ^^^^^^^^^^^ -========================= ================== ============================================================= -Dependency Minimum Version Notes -========================= ================== ============================================================= -SciPy 1.7.1 Miscellaneous statistical functions -numba 0.53.1 Alternative execution engine for rolling operations - (see :ref:`Enhancing Performance `) -xarray 0.19.0 pandas-like API for N-dimensional data -========================= ================== ============================================================= +Can be managed as optional_extra with ``pandas[computation]``. 
+ +========================= ================== =============== ============================================================= +Dependency Minimum Version optional_extra Notes +========================= ================== =============== ============================================================= +SciPy 1.7.1 computation Miscellaneous statistical functions +xarray 0.19.0 computation pandas-like API for N-dimensional data +========================= ================== =============== ============================================================= Excel files ^^^^^^^^^^^ -========================= ================== ============================================================= -Dependency Minimum Version Notes -========================= ================== ============================================================= -xlrd 2.0.1 Reading Excel -xlwt 1.3.0 Writing Excel -xlsxwriter 1.4.3 Writing Excel -openpyxl 3.0.7 Reading / writing for xlsx files -pyxlsb 1.0.8 Reading for xlsb files -========================= ================== ============================================================= +Can be managed as optional_extra with ``pandas[excel]``. 
+ +========================= ================== =============== ============================================================= +Dependency Minimum Version optional_extra Notes +========================= ================== =============== ============================================================= +xlrd 2.0.1 excel Reading Excel +xlwt 1.3.0 excel Writing Excel +xlsxwriter 1.4.3 excel Writing Excel +openpyxl 3.0.7 excel Reading / writing for xlsx files +pyxlsb 1.0.8 excel Reading for xlsb files +========================= ================== =============== ============================================================= HTML ^^^^ -========================= ================== ============================================================= -Dependency Minimum Version Notes -========================= ================== ============================================================= -BeautifulSoup4 4.9.3 HTML parser for read_html -html5lib 1.1 HTML parser for read_html -lxml 4.6.3 HTML parser for read_html -========================= ================== ============================================================= +These dependencies can be specifically installed with ``pandas[html]``. 
+ +========================= ================== =============== ============================================================= +Dependency Minimum Version optional_extra Notes +========================= ================== =============== ============================================================= +BeautifulSoup4 4.9.3 html HTML parser for read_html +html5lib 1.1 html HTML parser for read_html +lxml 4.6.3 html HTML parser for read_html +========================= ================== =============== ============================================================= One of the following combinations of libraries is needed to use the top-level :func:`~pandas.read_html` function: @@ -361,36 +382,47 @@ top-level :func:`~pandas.read_html` function: XML ^^^ -========================= ================== ============================================================= -Dependency Minimum Version Notes -========================= ================== ============================================================= -lxml 4.5.0 XML parser for read_xml and tree builder for to_xml -========================= ================== ============================================================= +Can be managed as optional_extra with ``pandas[xml]``. 
+ +========================= ================== =============== ============================================================= +Dependency Minimum Version optional_extra Notes +========================= ================== =============== ============================================================= +lxml 4.6.3 xml XML parser for read_xml and tree builder for to_xml +========================= ================== =============== ============================================================= SQL databases ^^^^^^^^^^^^^ -========================= ================== ============================================================= -Dependency Minimum Version Notes -========================= ================== ============================================================= -SQLAlchemy 1.4.16 SQL support for databases other than sqlite -psycopg2 2.8.6 PostgreSQL engine for sqlalchemy -pymysql 1.0.2 MySQL engine for sqlalchemy -========================= ================== ============================================================= +Can be managed as optional_extra with ``pandas[postgresql, mysql, sql-other]``, +depending on required sql compatibility. 
+ +========================= ================== =============== ============================================================= +Dependency Minimum Version optional_extra Notes +========================= ================== =============== ============================================================= +SQLAlchemy 1.4.16 postgresql, SQL support for databases other than sqlite + mysql, + sql-other +psycopg2 2.8.6 postgresql PostgreSQL engine for sqlalchemy +pymysql 1.0.2 mysql MySQL engine for sqlalchemy +========================= ================== =============== ============================================================= Other data sources ^^^^^^^^^^^^^^^^^^ -========================= ================== ============================================================= -Dependency Minimum Version Notes -========================= ================== ============================================================= -PyTables 3.6.1 HDF5-based reading / writing -blosc 1.21.0 Compression for HDF5 -zlib Compression for HDF5 -fastparquet 0.4.0 Parquet reading / writing -pyarrow 6.0.0 Parquet, ORC, and feather reading / writing -pyreadstat 1.1.2 SPSS files (.sav) reading -========================= ================== ============================================================= +Can be managed as optional_extra with ``pandas[hdf5, parquet, feather, spss, excel]``, +depending on required compatibility. 
+ +========================= ================== ================ ============================================================= +Dependency Minimum Version optional_extra Notes +========================= ================== ================ ============================================================= +PyTables 3.6.1 hdf5 HDF5-based reading / writing +blosc 1.21.0 hdf5 Compression for HDF5 +zlib hdf5 Compression for HDF5 +fastparquet 0.4.0 - Parquet reading / writing (pyarrow is default) +pyarrow 6.0.0 parquet, feather Parquet, ORC, and feather reading / writing +pyreadstat 1.1.2 spss SPSS files (.sav) reading +odfpy 1.4.1 excel Open document format (.odf, .ods, .odt) reading / writing +========================= ================== ================ ============================================================= .. _install.warn_orc: @@ -410,35 +442,46 @@ pyreadstat 1.1.2 SPSS files (.sav) reading Access data in the cloud ^^^^^^^^^^^^^^^^^^^^^^^^ -========================= ================== ============================================================= -Dependency Minimum Version Notes -========================= ================== ============================================================= -fsspec 2021.7.0 Handling files aside from simple local and HTTP -gcsfs 2021.7.0 Google Cloud Storage access -pandas-gbq 0.15.0 Google Big Query access -s3fs 2021.08.0 Amazon S3 access -========================= ================== ============================================================= +Can be managed as optional_extra with ``pandas[fss, aws, gcp]``, depending on required compatibility. 
+ +========================= ================== =============== ============================================================= +Dependency Minimum Version optional_extra Notes +========================= ================== =============== ============================================================= +fsspec 2021.7.0 fss, gcp, aws Handling files aside from simple local and HTTP (required + dependency of s3fs, gcsfs). +gcsfs 2021.7.0 gcp Google Cloud Storage access +pandas-gbq 0.15.0 gcp Google Big Query access +s3fs 2021.08.0 aws Amazon S3 access +========================= ================== =============== ============================================================= Clipboard ^^^^^^^^^ -========================= ================== ============================================================= -Dependency Minimum Version Notes -========================= ================== ============================================================= -PyQt4/PyQt5 Clipboard I/O -qtpy Clipboard I/O -xclip Clipboard I/O on linux -xsel Clipboard I/O on linux -========================= ================== ============================================================= +Can be managed as optional_extra with ``pandas[clipboard]``. However, depending on the operating system, system-level +packages may need to be installed. + +========================= ================== =============== ============================================================= +Dependency Minimum Version optional_extra Notes +========================= ================== =============== ============================================================= +PyQt4/PyQt5 5.15.1 Clipboard I/O +qtpy 2.2.0 Clipboard I/O +========================= ================== =============== ============================================================= + +.. note:: + + For clipboard to operate on Linux, one of the CLI tools ``xclip`` or ``xsel`` must be installed on your system.
Compression ^^^^^^^^^^^ -========================= ================== ============================================================= -Dependency Minimum Version Notes -========================= ================== ============================================================= -brotli 0.7.0 Brotli compression -python-snappy 0.6.0 Snappy compression -Zstandard 0.15.2 Zstandard compression -========================= ================== ============================================================= +Can be managed as optional_extra with ``pandas[compression]``. +If only one specific compression lib is required, please request it as an independent requirement. + +========================= ================== =============== ============================================================= +Dependency Minimum Version optional_extra Notes +========================= ================== =============== ============================================================= +brotli 0.7.0 compression Brotli compression +python-snappy 0.6.0 compression Snappy compression +Zstandard 0.15.2 compression Zstandard compression +========================= ================== =============== ============================================================= diff --git a/doc/source/whatsnew/v2.0.0.rst b/doc/source/whatsnew/v2.0.0.rst index bef5a4a6448bef..2fdfd8139bca6c 100644 --- a/doc/source/whatsnew/v2.0.0.rst +++ b/doc/source/whatsnew/v2.0.0.rst @@ -14,10 +14,19 @@ including other versions of pandas. Enhancements ~~~~~~~~~~~~ -.. _whatsnew_200.enhancements.enhancement1: +.. _whatsnew_200.enhancements.optional_dependency_management: -enhancement1 -^^^^^^^^^^^^ +Optional dependencies version management +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Optional pandas dependencies can be managed as extras in a requirements/setup file, for example: + +.. 
code-block:: python + + pandas[performance, aws]>=2.0.0 + +Available optional dependencies (listed in order of appearance at `install guide `_) are +``[all, performance, computation, timezone, fss, aws, gcp, excel, parquet, feather, hdf5, spss, postgresql, mysql, +sql-other, html, xml, plot, output_formatting, clipboard, compression, test]`` (:issue:`39164`). .. _whatsnew_200.enhancements.enhancement2: @@ -36,6 +45,7 @@ Other enhancements - Added ``index`` parameter to :meth:`DataFrame.to_dict` (:issue:`46398`) - Added metadata propagation for binary operators on :class:`DataFrame` (:issue:`28283`) - :class:`.CategoricalConversionWarning`, :class:`.InvalidComparison`, :class:`.InvalidVersion`, :class:`.LossySetitemError`, and :class:`.NoBufferPresent` are now exposed in ``pandas.errors`` (:issue:`27656`) +- Fix ``test`` optional_extra by adding missing test package ``pytest-asyncio`` (:issue:`48361`) - :func:`DataFrame.astype` exception message thrown improved to include column name when type conversion is not possible. (:issue:`47571`) - :meth:`DataFrame.to_json` now supports a ``mode`` keyword with supported inputs 'w' and 'a'. Defaulting to 'w', 'a' can be used when lines=True and orient='records' to append record oriented json lines to an existing json file. (:issue:`35849`) diff --git a/pandas/compat/_optional.py b/pandas/compat/_optional.py index b644339a79de9f..34e3234390ba5b 100644 --- a/pandas/compat/_optional.py +++ b/pandas/compat/_optional.py @@ -9,7 +9,7 @@ from pandas.util.version import Version -# Update install.rst when updating versions! +# Update install.rst & setup.cfg when updating versions! 
VERSIONS = { "bs4": "4.9.3", diff --git a/setup.cfg b/setup.cfg index 9c88731f74ac8d..eede4a66d598db 100644 --- a/setup.cfg +++ b/setup.cfg @@ -53,6 +53,113 @@ test = hypothesis>=5.5.3 pytest>=6.0 pytest-xdist>=1.31 + pytest-asyncio>=0.17.0 +# optional extras for recommended dependencies +# see: doc/source/getting_started/install.rst +performance = + bottleneck>=1.3.2 + numba>=0.53.0 + numexpr>=2.7.1 +timezone = + tzdata>=2022.1 +computation = + scipy>=1.7.1 + xarray>=0.19.0 +fss = + fsspec>=2021.7.0 +aws = + boto3>=1.22.7 + s3fs>=2021.08.0 +gcp = + gcsfs>=2021.7.0 + pandas-gbq>=0.15.0 +excel = + odfpy>=1.4.1 + openpyxl>=3.0.7 + pyxlsb>=1.0.8 + xlrd>=2.0.1 + xlwt>=1.3.0 + xlsxwriter>=1.4.3 +parquet = + pyarrow>=6.0.0 +feather = + pyarrow>=6.0.0 +hdf5 = + blosc>=1.21.0 + tables>=3.6.1 +spss = + pyreadstat>=1.1.2 +postgresql = + SQLAlchemy>=1.4.16 + psycopg2>=2.8.6 +mysql = + SQLAlchemy>=1.4.16 + pymysql>=1.0.2 +sql-other = + SQLAlchemy>=1.4.16 +html = + beautifulsoup4>=4.9.3 + html5lib>=1.1 + lxml>=4.6.3 +xml = + lxml>=4.6.3 +plot = + matplotlib>=3.3.2 +output_formatting = + jinja2>=3.0.0 + tabulate>=0.8.9 +clipboard = + PyQt5>=5.15.1 + qtpy>=2.2.0 +compression = + brotlipy>=0.7.0 + python-snappy>=0.6.0 + zstandard>=0.15.2 +# `all` supersets all the above options. +# Also adds the following redundant, superseded packages that are listed as supported: +# fastparquet (by pyarrow https://github.com/pandas-dev/pandas/issues/39164) +# `all` should be kept as the complete set of pandas optional dependencies for general use.
+all = + beautifulsoup4>=4.9.3 + blosc>=1.21.0 + bottleneck>=1.3.2 + boto3>=1.22.7 + brotlipy>=0.7.0 + fastparquet>=0.4.0 + fsspec>=2021.7.0 + gcsfs>=2021.7.0 + html5lib>=1.1 + hypothesis>=5.5.3 + jinja2>=3.0.0 + lxml>=4.6.3 + matplotlib>=3.3.2 + numba>=0.53.0 + numexpr>=2.7.1 + odfpy>=1.4.1 + openpyxl>=3.0.7 + pandas-gbq>=0.15.0 + psycopg2>=2.8.6 + pyarrow>=6.0.0 + pymysql>=1.0.2 + PyQt5>=5.15.1 + pyreadstat>=1.1.2 + pytest>=6.0 + pytest-xdist>=1.31 + pytest-asyncio>=0.17.0 + python-snappy>=0.6.0 + pyxlsb>=1.0.8 + qtpy>=2.2.0 + scipy>=1.7.1 + s3fs>=2021.08.0 + SQLAlchemy>=1.4.16 + tables>=3.6.1 + tabulate>=0.8.9 + tzdata>=2022.1 + xarray>=0.19.0 + xlrd>=2.0.1 + xlsxwriter>=1.4.3 + xlwt>=1.3.0 + zstandard>=0.15.2 [build_ext] inplace = True