DOC: Add SeriesGroupBy and DataFrameGroupBy reference pages (#48500)
* WIP

* DOC: Add SeriesGroupBy and DataFrameGroupBy reference pages

* whatsnew, fix docstring of filter

* Reorder docstring sections
rhshadrach committed Sep 15, 2022
1 parent bbf17ea commit ac648ee
Showing 23 changed files with 224 additions and 185 deletions.
156 changes: 94 additions & 62 deletions doc/source/reference/groupby.rst
@@ -14,10 +14,14 @@ Indexing, iteration
.. autosummary::
:toctree: api/

GroupBy.__iter__
GroupBy.groups
GroupBy.indices
GroupBy.get_group
DataFrameGroupBy.__iter__
SeriesGroupBy.__iter__
DataFrameGroupBy.groups
SeriesGroupBy.groups
DataFrameGroupBy.indices
SeriesGroupBy.indices
DataFrameGroupBy.get_group
SeriesGroupBy.get_group

.. currentmodule:: pandas

@@ -41,57 +45,21 @@ Function application
.. autosummary::
:toctree: api/

GroupBy.apply
GroupBy.agg
SeriesGroupBy.apply
DataFrameGroupBy.apply
SeriesGroupBy.agg
DataFrameGroupBy.agg
SeriesGroupBy.aggregate
DataFrameGroupBy.aggregate
SeriesGroupBy.transform
DataFrameGroupBy.transform
GroupBy.pipe

Computations / descriptive stats
--------------------------------
.. autosummary::
:toctree: api/

GroupBy.all
GroupBy.any
GroupBy.bfill
GroupBy.backfill
GroupBy.count
GroupBy.cumcount
GroupBy.cummax
GroupBy.cummin
GroupBy.cumprod
GroupBy.cumsum
GroupBy.ffill
GroupBy.first
GroupBy.head
GroupBy.last
GroupBy.max
GroupBy.mean
GroupBy.median
GroupBy.min
GroupBy.ngroup
GroupBy.nth
GroupBy.ohlc
GroupBy.pad
GroupBy.prod
GroupBy.rank
GroupBy.pct_change
GroupBy.size
GroupBy.sem
GroupBy.std
GroupBy.sum
GroupBy.var
GroupBy.tail

The following methods are available in both ``SeriesGroupBy`` and
``DataFrameGroupBy`` objects, but may differ slightly, usually in that
the ``DataFrameGroupBy`` version usually permits the specification of an
axis argument, and often an argument indicating whether to restrict
application to columns of a specific data type.
SeriesGroupBy.pipe
DataFrameGroupBy.pipe
DataFrameGroupBy.filter
SeriesGroupBy.filter

``DataFrameGroupBy`` computations / descriptive stats
-----------------------------------------------------
.. autosummary::
:toctree: api/

@@ -100,6 +68,7 @@ application to columns of a specific data type.
DataFrameGroupBy.backfill
DataFrameGroupBy.bfill
DataFrameGroupBy.corr
DataFrameGroupBy.corrwith
DataFrameGroupBy.count
DataFrameGroupBy.cov
DataFrameGroupBy.cumcount
@@ -111,42 +80,105 @@ application to columns of a specific data type.
DataFrameGroupBy.diff
DataFrameGroupBy.ffill
DataFrameGroupBy.fillna
DataFrameGroupBy.filter
DataFrameGroupBy.hist
DataFrameGroupBy.first
DataFrameGroupBy.head
DataFrameGroupBy.idxmax
DataFrameGroupBy.idxmin
DataFrameGroupBy.last
DataFrameGroupBy.mad
DataFrameGroupBy.max
DataFrameGroupBy.mean
DataFrameGroupBy.median
DataFrameGroupBy.min
DataFrameGroupBy.ngroup
DataFrameGroupBy.nth
DataFrameGroupBy.nunique
DataFrameGroupBy.ohlc
DataFrameGroupBy.pad
DataFrameGroupBy.pct_change
DataFrameGroupBy.plot
DataFrameGroupBy.prod
DataFrameGroupBy.quantile
DataFrameGroupBy.rank
DataFrameGroupBy.resample
DataFrameGroupBy.sample
DataFrameGroupBy.sem
DataFrameGroupBy.shift
DataFrameGroupBy.size
DataFrameGroupBy.skew
DataFrameGroupBy.std
DataFrameGroupBy.sum
DataFrameGroupBy.var
DataFrameGroupBy.tail
DataFrameGroupBy.take
DataFrameGroupBy.tshift
DataFrameGroupBy.value_counts

The following methods are available only for ``SeriesGroupBy`` objects.

``SeriesGroupBy`` computations / descriptive stats
--------------------------------------------------
.. autosummary::
:toctree: api/

SeriesGroupBy.hist
SeriesGroupBy.all
SeriesGroupBy.any
SeriesGroupBy.backfill
SeriesGroupBy.bfill
SeriesGroupBy.corr
SeriesGroupBy.count
SeriesGroupBy.cov
SeriesGroupBy.cumcount
SeriesGroupBy.cummax
SeriesGroupBy.cummin
SeriesGroupBy.cumprod
SeriesGroupBy.cumsum
SeriesGroupBy.describe
SeriesGroupBy.diff
SeriesGroupBy.ffill
SeriesGroupBy.fillna
SeriesGroupBy.first
SeriesGroupBy.head
SeriesGroupBy.last
SeriesGroupBy.idxmax
SeriesGroupBy.idxmin
SeriesGroupBy.is_monotonic_increasing
SeriesGroupBy.is_monotonic_decreasing
SeriesGroupBy.mad
SeriesGroupBy.max
SeriesGroupBy.mean
SeriesGroupBy.median
SeriesGroupBy.min
SeriesGroupBy.ngroup
SeriesGroupBy.nlargest
SeriesGroupBy.nsmallest
SeriesGroupBy.nth
SeriesGroupBy.nunique
SeriesGroupBy.unique
SeriesGroupBy.is_monotonic_increasing
SeriesGroupBy.is_monotonic_decreasing

The following methods are available only for ``DataFrameGroupBy`` objects.

SeriesGroupBy.ohlc
SeriesGroupBy.pad
SeriesGroupBy.pct_change
SeriesGroupBy.prod
SeriesGroupBy.quantile
SeriesGroupBy.rank
SeriesGroupBy.resample
SeriesGroupBy.sample
SeriesGroupBy.sem
SeriesGroupBy.shift
SeriesGroupBy.size
SeriesGroupBy.skew
SeriesGroupBy.std
SeriesGroupBy.sum
SeriesGroupBy.var
SeriesGroupBy.tail
SeriesGroupBy.take
SeriesGroupBy.tshift
SeriesGroupBy.value_counts

Plotting and visualization
--------------------------
.. autosummary::
:toctree: api/

DataFrameGroupBy.corrwith
DataFrameGroupBy.boxplot
DataFrameGroupBy.hist
SeriesGroupBy.hist
DataFrameGroupBy.plot
SeriesGroupBy.plot
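The reference pages above now document ``DataFrameGroupBy`` and ``SeriesGroupBy`` separately instead of one ``GroupBy`` page. A minimal sketch of which object a user actually gets, assuming a recent pandas in which both classes and their ``GroupBy`` base class are importable from ``pandas.core.groupby``; the frame is made-up toy data. It also exercises the shared indexing/iteration entries (``__iter__``, ``groups``, ``indices``, ``get_group``).

```python
import pandas as pd
from pandas.core.groupby import DataFrameGroupBy, GroupBy, SeriesGroupBy

# Made-up toy data.
df = pd.DataFrame({"A": ["x", "y", "x"], "C": [1, 2, 3]})

frame_gb = df.groupby("A")        # grouping a DataFrame -> DataFrameGroupBy
series_gb = df.groupby("A")["C"]  # selecting one column -> SeriesGroupBy

assert isinstance(frame_gb, DataFrameGroupBy)
assert isinstance(series_gb, SeriesGroupBy)
# Both concrete classes share the GroupBy base class.
assert issubclass(DataFrameGroupBy, GroupBy)
assert issubclass(SeriesGroupBy, GroupBy)

# Indexing / iteration entries documented for both classes.
keys = [key for key, _ in series_gb]        # iterate over (key, sub-Series) pairs
x_labels = list(series_gb.groups["x"])      # row labels belonging to group "x"
x_positions = list(series_gb.indices["x"])  # integer positions of group "x"
x_values = series_gb.get_group("x").tolist()
```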
4 changes: 2 additions & 2 deletions doc/source/user_guide/10min.rst
@@ -528,15 +528,15 @@ See the :ref:`Grouping section <groupby>`.
)
df
Grouping and then applying the :meth:`~pandas.core.groupby.GroupBy.sum` function to the resulting
Grouping and then applying the :meth:`~pandas.core.groupby.DataFrameGroupBy.sum` function to the resulting
groups:

.. ipython:: python
df.groupby("A")[["C", "D"]].sum()
Grouping by multiple columns forms a hierarchical index, and again we can
apply the :meth:`~pandas.core.groupby.GroupBy.sum` function:
apply the :meth:`~pandas.core.groupby.DataFrameGroupBy.sum` function:

.. ipython:: python
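The 10-minute guide now links ``sum`` to ``DataFrameGroupBy.sum`` because selecting a *list* of columns from a groupby still yields a ``DataFrameGroupBy``. A short sketch with made-up values shaped like that guide's frame (columns ``A`` to ``D``), assuming a recent pandas; it also shows that grouping by several columns produces a hierarchical index.

```python
import pandas as pd
from pandas.core.groupby import DataFrameGroupBy

# Made-up values in the shape of the 10-minute guide's frame.
df = pd.DataFrame(
    {
        "A": ["foo", "bar", "foo", "bar"],
        "B": ["one", "one", "two", "two"],
        "C": [1, 2, 3, 4],
        "D": [10, 20, 30, 40],
    }
)

# A list of columns keeps a DataFrameGroupBy, so .sum() is DataFrameGroupBy.sum
# and returns a DataFrame indexed by the keys of "A".
by_a = df.groupby("A")[["C", "D"]]
assert isinstance(by_a, DataFrameGroupBy)
totals = by_a.sum()

# Grouping by several columns yields a MultiIndex on the result.
totals_ab = df.groupby(["A", "B"]).sum()
```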
2 changes: 1 addition & 1 deletion doc/source/user_guide/groupby.rst
@@ -632,7 +632,7 @@ Named aggregation
.. versionadded:: 0.25.0

To support column-specific aggregation *with control over the output column names*, pandas
accepts the special syntax in :meth:`GroupBy.agg`, known as "named aggregation", where
accepts the special syntax in :meth:`DataFrameGroupBy.agg` and :meth:`SeriesGroupBy.agg`, known as "named aggregation", where

- The keywords are the *output* column names
- The values are tuples whose first element is the column to select
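The updated sentence names both ``agg`` methods because named aggregation works on each, with slightly different value shapes. A minimal sketch on made-up data, assuming pandas 0.25 or later: on ``DataFrameGroupBy.agg`` each keyword maps to a ``(column, aggfunc)`` tuple, while on ``SeriesGroupBy.agg`` the column is already fixed, so each keyword maps directly to an aggfunc.

```python
import pandas as pd

# Made-up data.
df = pd.DataFrame(
    {"kind": ["cat", "dog", "cat", "dog"], "height": [9.1, 6.0, 9.5, 34.0]}
)

# DataFrameGroupBy.agg: output column name = (input column, aggfunc)
frame_result = df.groupby("kind").agg(
    min_height=("height", "min"),
    max_height=("height", "max"),
)

# SeriesGroupBy.agg: output column name = aggfunc (column already selected)
series_result = df.groupby("kind")["height"].agg(minimum="min", maximum="max")
```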
6 changes: 3 additions & 3 deletions doc/source/whatsnew/v1.0.0.rst
@@ -774,7 +774,7 @@ source, you should no longer need to install Cython into your build environment
Other API changes
^^^^^^^^^^^^^^^^^

- :class:`core.groupby.GroupBy.transform` now raises on invalid operation names (:issue:`27489`)
- :meth:`.DataFrameGroupBy.transform` and :meth:`.SeriesGroupBy.transform` now raises on invalid operation names (:issue:`27489`)
- :meth:`pandas.api.types.infer_dtype` will now return "integer-na" for integer and ``np.nan`` mix (:issue:`27283`)
- :meth:`MultiIndex.from_arrays` will no longer infer names from arrays if ``names=None`` is explicitly provided (:issue:`27292`)
- In order to improve tab-completion, pandas does not include most deprecated attributes when introspecting a pandas object using ``dir`` (e.g. ``dir(df)``).
@@ -1232,8 +1232,8 @@ GroupBy/resample/rolling
- Bug in :meth:`core.groupby.DataFrameGroupBy.agg` with timezone-aware datetime64 column incorrectly casting results to the original dtype (:issue:`29641`)
- Bug in :meth:`DataFrame.groupby` when using axis=1 and having a single level columns index (:issue:`30208`)
- Bug in :meth:`DataFrame.groupby` when using nunique on axis=1 (:issue:`30253`)
- Bug in :meth:`GroupBy.quantile` with multiple list-like q value and integer column names (:issue:`30289`)
- Bug in :meth:`GroupBy.pct_change` and :meth:`core.groupby.SeriesGroupBy.pct_change` causes ``TypeError`` when ``fill_method`` is ``None`` (:issue:`30463`)
- Bug in :meth:`.DataFrameGroupBy.quantile` and :meth:`.SeriesGroupBy.quantile` with multiple list-like q value and integer column names (:issue:`30289`)
- Bug in :meth:`.DataFrameGroupBy.pct_change` and :meth:`.SeriesGroupBy.pct_change` causes ``TypeError`` when ``fill_method`` is ``None`` (:issue:`30463`)
- Bug in :meth:`Rolling.count` and :meth:`Expanding.count` argument where ``min_periods`` was ignored (:issue:`26996`)

Reshaping
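The first entry above says ``transform`` rejects invalid operation names. A minimal sketch, assuming a recent pandas in which the rejection surfaces as a ``ValueError``; ``"not_a_real_method"`` is a deliberately made-up name, and the data is toy data.

```python
import pandas as pd

# Made-up data.
df = pd.DataFrame({"A": ["x", "x", "y"], "C": [1, 2, 3]})

# A valid reduction name is broadcast back to every row of its group.
group_sums = df.groupby("A")["C"].transform("sum")

# An unknown operation name is rejected rather than silently accepted.
try:
    df.groupby("A")["C"].transform("not_a_real_method")
except ValueError as err:
    error = err
else:
    error = None
```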
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v1.0.1.rst
@@ -21,7 +21,7 @@ Fixed regressions
- Fixed regression in :class:`Series` multiplication when multiplying a numeric :class:`Series` with >10000 elements with a timedelta-like scalar (:issue:`31457`)
- Fixed regression in ``.groupby().agg()`` raising an ``AssertionError`` for some reductions like ``min`` on object-dtype columns (:issue:`31522`)
- Fixed regression in ``.groupby()`` aggregations with categorical dtype using Cythonized reduction functions (e.g. ``first``) (:issue:`31450`)
- Fixed regression in :meth:`GroupBy.apply` if called with a function which returned a non-pandas non-scalar object (e.g. a list or numpy array) (:issue:`31441`)
- Fixed regression in :meth:`.DataFrameGroupBy.apply` and :meth:`.SeriesGroupBy.apply` if called with a function which returned a non-pandas non-scalar object (e.g. a list or numpy array) (:issue:`31441`)
- Fixed regression in :meth:`DataFrame.groupby` whereby taking the minimum or maximum of a column with period dtype would raise a ``TypeError``. (:issue:`31471`)
- Fixed regression in :meth:`DataFrame.groupby` with an empty DataFrame grouping by a level of a MultiIndex (:issue:`31670`).
- Fixed regression in :meth:`DataFrame.apply` with object dtype and non-reducing function (:issue:`31505`)
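The ``apply`` regression above concerned functions returning a non-pandas, non-scalar object such as a list. A minimal sketch of the expected behaviour on made-up data, assuming a recent pandas: ``SeriesGroupBy.apply`` collects one list per group into an object-dtype ``Series`` indexed by the group keys.

```python
import pandas as pd

# Made-up data.
df = pd.DataFrame({"A": ["x", "y", "x"], "C": [1, 2, 3]})

# The applied function returns a plain Python list for each group.
per_group = df.groupby("A")["C"].apply(lambda s: s.tolist())
```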
6 changes: 3 additions & 3 deletions doc/source/whatsnew/v1.0.2.rst
@@ -17,12 +17,12 @@ Fixed regressions

**Groupby**

- Fixed regression in :meth:`groupby(..).agg() <pandas.core.groupby.GroupBy.agg>` which was failing on frames with :class:`MultiIndex` columns and a custom function (:issue:`31777`)
- Fixed regression in :meth:`.DataFrameGroupBy.agg` and :meth:`.SeriesGroupBy.agg` which were failing on frames with :class:`MultiIndex` columns and a custom function (:issue:`31777`)
- Fixed regression in ``groupby(..).rolling(..).apply()`` (``RollingGroupby``) where the ``raw`` parameter was ignored (:issue:`31754`)
- Fixed regression in :meth:`rolling(..).corr() <pandas.core.window.rolling.Rolling.corr>` when using a time offset (:issue:`31789`)
- Fixed regression in :meth:`groupby(..).nunique() <pandas.core.groupby.DataFrameGroupBy.nunique>` which was modifying the original values if ``NaN`` values were present (:issue:`31950`)
- Fixed regression in ``DataFrame.groupby`` raising a ``ValueError`` from an internal operation (:issue:`31802`)
- Fixed regression in :meth:`groupby(..).agg() <pandas.core.groupby.GroupBy.agg>` calling a user-provided function an extra time on an empty input (:issue:`31760`)
- Fixed regression in :meth:`.DataFrameGroupBy.agg` and :meth:`.SeriesGroupBy.agg` calling a user-provided function an extra time on an empty input (:issue:`31760`)

**I/O**

@@ -104,7 +104,7 @@ Bug fixes
- Fixed bug in :meth:`DataFrame.convert_dtypes` for series with mix of integers and strings (:issue:`32117`)
- Fixed bug in :meth:`DataFrame.convert_dtypes` where ``BooleanDtype`` columns were converted to ``Int64`` (:issue:`32287`)
- Fixed bug in setting values using a slice indexer with string dtype (:issue:`31772`)
- Fixed bug where :meth:`pandas.core.groupby.GroupBy.first` and :meth:`pandas.core.groupby.GroupBy.last` would raise a ``TypeError`` when groups contained ``pd.NA`` in a column of object dtype (:issue:`32123`)
- Fixed bug where :meth:`.DataFrameGroupBy.first`, :meth:`.SeriesGroupBy.first`, :meth:`.DataFrameGroupBy.last`, and :meth:`.SeriesGroupBy.last` would raise a ``TypeError`` when groups contained ``pd.NA`` in a column of object dtype (:issue:`32123`)
- Fixed bug where :meth:`DataFrameGroupBy.mean`, :meth:`DataFrameGroupBy.median`, :meth:`DataFrameGroupBy.var`, and :meth:`DataFrameGroupBy.std` would raise a ``TypeError`` on ``Int64`` dtype columns (:issue:`32219`)

**Strings**
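One regression above had ``nunique`` modifying the original values when ``NaN`` was present. A small sketch of the intended behaviour on made-up data, assuming a recent pandas: missing values are excluded from the count by default (``dropna=True``), counted with ``dropna=False``, and the source frame is left untouched.

```python
import numpy as np
import pandas as pd

# Made-up data with a missing value in group "x".
df = pd.DataFrame({"A": ["x", "x", "y", "y"], "B": [1.0, np.nan, 2.0, 2.0]})
before = df.copy()

counts = df.groupby("A")["B"].nunique()                       # NaN ignored
counts_with_nan = df.groupby("A")["B"].nunique(dropna=False)  # NaN counted
```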
6 changes: 3 additions & 3 deletions doc/source/whatsnew/v1.0.4.rst
@@ -16,7 +16,7 @@ including other versions of pandas.
Fixed regressions
~~~~~~~~~~~~~~~~~
- Fix regression where :meth:`Series.isna` and :meth:`DataFrame.isna` would raise for categorical dtype when ``pandas.options.mode.use_inf_as_na`` was set to ``True`` (:issue:`33594`)
- Fix regression in :meth:`GroupBy.first` and :meth:`GroupBy.last` where None is not preserved in object dtype (:issue:`32800`)
- Fix regression in :meth:`.DataFrameGroupBy.first`, :meth:`.SeriesGroupBy.first`, :meth:`.DataFrameGroupBy.last`, and :meth:`.SeriesGroupBy.last` where None is not preserved in object dtype (:issue:`32800`)
- Fix regression in DataFrame reductions using ``numeric_only=True`` and ExtensionArrays (:issue:`33256`).
- Fix performance regression in ``memory_usage(deep=True)`` for object dtype (:issue:`33012`)
- Fix regression where :meth:`Categorical.replace` would replace with ``NaN`` whenever the new value and replacement value were equal (:issue:`33288`)
@@ -26,7 +26,7 @@ Fixed regressions
- Fix regression in :meth:`DataFrame.describe` raising ``TypeError: unhashable type: 'dict'`` (:issue:`32409`)
- Fix regression in :meth:`DataFrame.replace` casts columns to ``object`` dtype if items in ``to_replace`` not in values (:issue:`32988`)
- Fix regression in :meth:`Series.groupby` would raise ``ValueError`` when grouping by :class:`PeriodIndex` level (:issue:`34010`)
- Fix regression in :meth:`GroupBy.rolling.apply` ignores args and kwargs parameters (:issue:`33433`)
- Fix regression in :meth:`DataFrameGroupBy.rolling.apply` and :meth:`SeriesGroupBy.rolling.apply` ignoring args and kwargs parameters (:issue:`33433`)
- Fix regression in error message with ``np.min`` or ``np.max`` on unordered :class:`Categorical` (:issue:`33115`)
- Fix regression in :meth:`DataFrame.loc` and :meth:`Series.loc` throwing an error when a ``datetime64[ns, tz]`` value is provided (:issue:`32395`)

@@ -40,7 +40,7 @@ Bug fixes
- Bug in :meth:`~DataFrame.to_csv` was silently failing when writing to an invalid s3 bucket. (:issue:`32486`)
- Bug in :meth:`read_parquet` was raising a ``FileNotFoundError`` when passed an s3 directory path. (:issue:`26388`)
- Bug in :meth:`~DataFrame.to_parquet` was throwing an ``AttributeError`` when writing a partitioned parquet file to s3 (:issue:`27596`)
- Bug in :meth:`GroupBy.quantile` causes the quantiles to be shifted when the ``by`` axis contains ``NaN`` (:issue:`33200`, :issue:`33569`)
- Bug in :meth:`.DataFrameGroupBy.quantile` and :meth:`.SeriesGroupBy.quantile` causes the quantiles to be shifted when the ``by`` axis contains ``NaN`` (:issue:`33200`, :issue:`33569`)

Contributors
~~~~~~~~~~~~
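The ``quantile`` fix above concerns a ``by`` key that contains ``NaN``. A minimal sketch on made-up data, assuming a recent pandas where rows with a missing group key are dropped by default: each group's median is computed only from that group's own rows and is not shifted by the dropped row.

```python
import numpy as np
import pandas as pd

# Made-up data; the second row has a missing group key.
df = pd.DataFrame(
    {"A": ["x", np.nan, "x", "y", "y"], "C": [1.0, 100.0, 3.0, 10.0, 20.0]}
)

# The NaN-keyed row is excluded from grouping; medians use linear interpolation.
medians = df.groupby("A")["C"].quantile(0.5)
```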
8 changes: 4 additions & 4 deletions doc/source/whatsnew/v1.1.0.rst
@@ -1126,16 +1126,16 @@ GroupBy/resample/rolling

- Using a :class:`pandas.api.indexers.BaseIndexer` with ``count``, ``min``, ``max``, ``median``, ``skew``, ``cov``, ``corr`` will now return correct results for any monotonic :class:`pandas.api.indexers.BaseIndexer` descendant (:issue:`32865`)
- :meth:`DataFrameGroupby.mean` and :meth:`SeriesGroupby.mean` (and similarly for :meth:`~DataFrameGroupby.median`, :meth:`~DataFrameGroupby.std` and :meth:`~DataFrameGroupby.var`) now raise a ``TypeError`` if a non-accepted keyword argument is passed into it. Previously an ``UnsupportedFunctionCall`` was raised (``AssertionError`` if ``min_count`` passed into :meth:`~DataFrameGroupby.median`) (:issue:`31485`)
- Bug in :meth:`GroupBy.apply` raises ``ValueError`` when the ``by`` axis is not sorted, has duplicates, and the applied ``func`` does not mutate passed in objects (:issue:`30667`)
- Bug in :meth:`.DataFrameGroupBy.apply` and :meth:`.SeriesGroupBy.apply` raising ``ValueError`` when the ``by`` axis is not sorted, has duplicates, and the applied ``func`` does not mutate passed in objects (:issue:`30667`)
- Bug in :meth:`DataFrameGroupBy.transform` produces an incorrect result with transformation functions (:issue:`30918`)
- Bug in :meth:`Groupby.transform` was returning the wrong result when grouping by multiple keys of which some were categorical and others not (:issue:`32494`)
- Bug in :meth:`GroupBy.count` causes segmentation fault when grouped-by columns contain NaNs (:issue:`32841`)
- Bug in :meth:`.DataFrameGroupBy.transform` and :meth:`.SeriesGroupBy.transform` were returning the wrong result when grouping by multiple keys of which some were categorical and others not (:issue:`32494`)
- Bug in :meth:`.DataFrameGroupBy.count` and :meth:`.SeriesGroupBy.count` causing segmentation fault when grouped-by columns contain NaNs (:issue:`32841`)
- Bug in :meth:`DataFrame.groupby` and :meth:`Series.groupby` produces inconsistent type when aggregating Boolean :class:`Series` (:issue:`32894`)
- Bug in :meth:`DataFrameGroupBy.sum` and :meth:`SeriesGroupBy.sum` where a large negative number would be returned when the number of non-null values was below ``min_count`` for nullable integer dtypes (:issue:`32861`)
- Bug in :meth:`SeriesGroupBy.quantile` was raising on nullable integers (:issue:`33136`)
- Bug in :meth:`DataFrame.resample` where an ``AmbiguousTimeError`` would be raised when the resulting timezone aware :class:`DatetimeIndex` had a DST transition at midnight (:issue:`25758`)
- Bug in :meth:`DataFrame.groupby` where a ``ValueError`` would be raised when grouping by a categorical column with read-only categories and ``sort=False`` (:issue:`33410`)
- Bug in :meth:`GroupBy.agg`, :meth:`GroupBy.transform`, and :meth:`GroupBy.resample` where subclasses are not preserved (:issue:`28330`)
- Bug in :meth:`.DataFrameGroupBy.agg`, :meth:`.SeriesGroupBy.agg`, :meth:`.DataFrameGroupBy.transform`, :meth:`.SeriesGroupBy.transform`, :meth:`.DataFrameGroupBy.resample`, and :meth:`.SeriesGroupBy.resample` where subclasses are not preserved (:issue:`28330`)
- Bug in :meth:`SeriesGroupBy.agg` where any column name was accepted in the named aggregation of :class:`SeriesGroupBy` previously. The behaviour now allows only ``str`` and callables else would raise ``TypeError``. (:issue:`34422`)
- Bug in :meth:`DataFrame.groupby` lost the name of the :class:`Index` when one of the ``agg`` keys referenced an empty list (:issue:`32580`)
- Bug in :meth:`Rolling.apply` where ``center=True`` was ignored when ``engine='numba'`` was specified (:issue:`34784`)
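The ``sum`` fix above concerns ``min_count`` with nullable integer dtypes. A minimal sketch on made-up data, assuming a recent pandas: a group with fewer non-null values than ``min_count`` yields ``<NA>`` rather than a spurious large negative number.

```python
import pandas as pd

# Made-up data: nullable Int64 column, group "y" has no non-null values.
df = pd.DataFrame(
    {"A": ["x", "x", "y"], "C": pd.array([1, None, None], dtype="Int64")}
)

# min_count=1: a group needs at least one non-null value to produce a sum.
sums = df.groupby("A")["C"].sum(min_count=1)
```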
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v1.1.4.rst
@@ -41,7 +41,7 @@ Bug fixes
~~~~~~~~~
- Bug causing ``groupby(...).sum()`` and similar to not preserve metadata (:issue:`29442`)
- Bug in :meth:`Series.isin` and :meth:`DataFrame.isin` raising a ``ValueError`` when the target was read-only (:issue:`37174`)
- Bug in :meth:`GroupBy.fillna` that introduced a performance regression after 1.0.5 (:issue:`36757`)
- Bug in :meth:`.DataFrameGroupBy.fillna` and :meth:`.SeriesGroupBy.fillna` that introduced a performance regression after 1.0.5 (:issue:`36757`)
- Bug in :meth:`DataFrame.info` was raising a ``KeyError`` when the DataFrame has integer column names (:issue:`37245`)
- Bug in :meth:`DataFrameGroupby.apply` would drop a :class:`CategoricalIndex` when grouped on (:issue:`35792`)

2 changes: 1 addition & 1 deletion doc/source/whatsnew/v1.1.5.rst
@@ -28,7 +28,7 @@ Fixed regressions
- Fixed regression in :meth:`DataFrame.fillna` not filling ``NaN`` after other operations such as :meth:`DataFrame.pivot` (:issue:`36495`).
- Fixed performance regression in ``df.groupby(..).rolling(..)`` (:issue:`38038`)
- Fixed regression in :meth:`MultiIndex.intersection` returning duplicates when at least one of the indexes had duplicates (:issue:`36915`)
- Fixed regression in :meth:`.GroupBy.first` and :meth:`.GroupBy.last` where ``None`` was considered a non-NA value (:issue:`38286`)
- Fixed regression in :meth:`.DataFrameGroupBy.first`, :meth:`.SeriesGroupBy.first`, :meth:`.DataFrameGroupBy.last`, and :meth:`.SeriesGroupBy.last` where ``None`` was considered a non-NA value (:issue:`38286`)

.. ---------------------------------------------------------------------------
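The last regression above is about ``None`` counting as a missing value in ``first`` and ``last``. A minimal sketch on made-up object-dtype data, assuming a recent pandas: both methods skip ``None`` and return the first or last non-missing value of each group.

```python
import pandas as pd

# Made-up object-dtype column containing None at both ends.
df = pd.DataFrame({"A": ["x", "x", "x", "x"], "B": [None, "a", "b", None]})

gb = df.groupby("A")["B"]
first_vals = gb.first()  # first non-missing value per group
last_vals = gb.last()    # last non-missing value per group
```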
