These are the changes in pandas 1.5.1. See the release notes
for a full changelog including other versions of pandas.
In versions of pandas prior to 1.5, groupby with dropna=False would still drop NA values when the grouper had a categorical dtype. A fix for this was attempted in 1.5; however, it introduced a regression where passing observed=False and dropna=False to groupby would result in only observed categories. It was found that the patch fixing the dropna=False bug is incompatible with observed=False, and it was decided that the best resolution is to restore the correct observed=False behavior at the cost of reintroducing the dropna=False bug.
python
import pandas as pd

df = pd.DataFrame(
    {
        "x": pd.Categorical([1, None], categories=[1, 2, 3]),
        "y": [3, 4],
    }
)
df
1.5.0 behavior:
In [3]: # Correct behavior, NA values are not dropped
        df.groupby("x", observed=True, dropna=False).sum()
Out[3]:
     y
x
1    3
NaN  4
In [4]: # Incorrect behavior, only observed categories present
        df.groupby("x", observed=False, dropna=False).sum()
Out[4]:
     y
x
1    3
NaN  4
1.5.1 behavior:
python
# Incorrect behavior, NA values are dropped
df.groupby("x", observed=True, dropna=False).sum()
# Correct behavior, unobserved categories present (NA values still dropped)
df.groupby("x", observed=False, dropna=False).sum()
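If both the NA group and the unobserved categories are needed in a single result, one possible workaround is to map missing values to an explicit category before grouping. This is only a sketch and is not taken from the release notes: the placeholder label "missing" and the variable names are assumptions chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "x": pd.Categorical([1, None], categories=[1, 2, 3]),
        "y": [3, 4],
    }
)

# "missing" is a hypothetical placeholder label, not a pandas convention.
placeholder = "missing"

# Register the placeholder as a category, then fill NA values with it.
filled = df["x"].cat.add_categories([placeholder]).fillna(placeholder)

# observed=False keeps the unobserved categories (2 and 3); the former
# NA rows are now an ordinary group labelled by the placeholder.
result = df.groupby(filled, observed=False)["y"].sum()
print(result)
```

Because the missing values become a regular category, this approach does not depend on how dropna interacts with categorical groupers in a given pandas version.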
- Fixed regression in Series.__setitem__ casting None to NaN for object dtype (48665)
- Fixed regression in DataFrame.loc when setting values as a DataFrame with all True indexer (48701)
- Fixed regression in read_csv causing an EmptyDataError when using a UTF-8 file handle that was already read from (48646)
- Fixed regression in to_datetime raising a ValueError when utc=True and arg contained timezone-naive and timezone-aware arguments (48678)
- Fixed regression in DataFrame.loc raising FutureWarning when setting an empty DataFrame (48480)
- Fixed regression in DataFrame.describe raising TypeError when result contains NA (48778)
- Fixed regression in DataFrame.plot ignoring invalid colormap for kind="scatter" (48726)
- Fixed regression in MultiIndex.values resetting freq attribute of underlying Index object (49054)
- Fixed performance regression in factorize when na_sentinel is not None and sort=False (48620)
- Fixed regression causing an AttributeError during warning emitted if the provided table name in DataFrame.to_sql and the table name actually used in the database do not match (48733)
- Fixed regression in to_datetime raising a ValueError when arg was a date string with nanoseconds and format contained %f (48767)
- Fixed regression in testing.assert_frame_equal raising for MultiIndex with Categorical and check_like=True (48975)
- Fixed regression in DataFrame.fillna replacing wrong values for datetime64[ns] dtype and inplace=True (48863)
- Fixed DataFrameGroupBy.size not returning a Series when axis=1 (48738)
- Fixed regression in DataFrameGroupBy.apply when user defined function is called on an empty dataframe (47985)
- Fixed regression in DataFrame.apply when passing non-zero axis via keyword argument (48656)
- Fixed regression in Series.groupby and DataFrame.groupby when the grouper is a nullable data type (e.g. Int64) or a PyArrow-backed string array, contains null values, and dropna=False (48794)
- Fixed performance regression in Series.isin with mismatching dtypes (49162)
- Fixed regression in DataFrame.to_parquet raising when file name was specified as bytes (48944)
- Fixed regression in ExcelWriter where the book attribute could no longer be set; however setting this attribute is now deprecated and this ability will be removed in a future version of pandas (48780)
- Fixed regression in DataFrame.corrwith when computing correlation on tied data with method="spearman" (48826)
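As an illustration of the to_datetime entry (48678), the sketch below mixes a timezone-naive and a timezone-aware datetime and converts both with utc=True. The input values are made up for the example; it assumes a pandas version that includes the fix, where naive inputs are treated as UTC and aware inputs are converted to UTC.

```python
from datetime import datetime, timedelta, timezone

import pandas as pd

# One naive datetime and one aware datetime with a fixed +01:00 offset
# (illustrative values only).
naive = datetime(2020, 1, 1)
aware = datetime(2020, 1, 1, tzinfo=timezone(timedelta(hours=1)))

# With utc=True the naive value is localized as UTC and the aware value
# is converted to UTC, so both end up in one tz-aware DatetimeIndex.
result = pd.to_datetime([naive, aware], utc=True)
print(result)
```

On pandas 1.5.0 this kind of mixed input is the case the entry describes as raising a ValueError.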
- Bug in Series.__getitem__ not falling back to positional for integer keys and boolean Index (48653)
- Bug in DataFrame.to_hdf raising AssertionError with boolean index (48667)
- Bug in testing.assert_index_equal for extension arrays with non-matching NA raising ValueError (48608)
- Bug in DataFrame.pivot_table raising unexpected FutureWarning when setting datetime column as index (48683)
- Bug in DataFrame.sort_values emitting unnecessary FutureWarning when called on DataFrame with boolean sparse columns (48784)
- Bug in arrays.ArrowExtensionArray where a comparison operator with an invalid object would not raise a NotImplementedError (48833)
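The assert_index_equal entry (48608) can be exercised with two nullable-integer indexes whose NA positions differ. This is a minimal sketch with made-up values; it assumes a pandas version that includes the fix, where the mismatch is reported as an ordinary AssertionError.

```python
import pandas as pd

# Two Int64 (nullable) indexes that differ only in where NA appears.
left = pd.Index([1, pd.NA], dtype="Int64")
right = pd.Index([1, 2], dtype="Int64")

try:
    pd.testing.assert_index_equal(left, right)
    raised_assertion = False
except AssertionError as err:
    # Expected path: the indexes are reported as different.
    raised_assertion = True
    print(f"Indexes differ: {err}")
```

A ValueError is deliberately not caught here, so on a version affected by the bug the script fails loudly instead of hiding it.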
- Avoid showing deprecated signatures when introspecting functions with warnings about arguments becoming keyword-only (48692)