What's new in 1.5.1 (October 19, 2022)

These are the changes in pandas 1.5.1. See release for a full changelog including other versions of pandas.


Behavior of groupby with categorical groupers (48645)

In versions of pandas prior to 1.5, groupby with dropna=False would still drop NA values when the grouper was a categorical dtype. A fix for this was attempted in 1.5; however, it introduced a regression where passing observed=False and dropna=False to groupby would return only the observed categories. The patch fixing the dropna=False bug was found to be incompatible with observed=False, and it was decided that the best resolution is to restore the correct observed=False behavior at the cost of reintroducing the dropna=False bug.

python

df = pd.DataFrame(
    {
        "x": pd.Categorical([1, None], categories=[1, 2, 3]),
        "y": [3, 4],
    }
)
df

1.5.0 behavior:

In [3]: # Correct behavior, NA values are not dropped
        df.groupby("x", observed=True, dropna=False).sum()
Out[3]:
     y
x
1    3
NaN  4


In [4]: # Incorrect behavior, only observed categories present
        df.groupby("x", observed=False, dropna=False).sum()
Out[4]:
     y
x
1    3
NaN  4

1.5.1 behavior:

python

# Incorrect behavior, NA values are dropped
df.groupby("x", observed=True, dropna=False).sum()

# Correct behavior, unobserved categories present (NA values still dropped)
df.groupby("x", observed=False, dropna=False).sum()
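The restored observed=False behavior can be checked with a short, self-contained script. This is a minimal sketch rather than part of the original notes: the variable names `result` and `present` are illustrative, and whether an NA row also appears in the output depends on the installed pandas version, so the sketch only checks for the declared categories.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "x": pd.Categorical([1, None], categories=[1, 2, 3]),
        "y": [3, 4],
    }
)

# Group on the categorical column, asking for all declared categories.
result = df.groupby("x", observed=False, dropna=False).sum()

# Collect the non-NA group labels; an NA label may or may not be present
# depending on the pandas version, so it is ignored here.
present = set(result.index.dropna())

# With observed=False, the unobserved categories 2 and 3 should be listed
# alongside the observed category 1.
print(present >= {1, 2, 3})
print(result)
```

The unobserved categories have no rows to aggregate, so their sums come out as 0.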

Fixed regressions

  • Fixed regression in Series.__setitem__ casting None to NaN for object dtype (48665)
  • Fixed regression in DataFrame.loc when setting values as a DataFrame with all True indexer (48701)
  • Fixed regression in read_csv causing an EmptyDataError when using a UTF-8 file handle that was already read from (48646)
  • Fixed regression in to_datetime raising a ValueError when utc=True and arg contained both timezone-naive and timezone-aware values (48678)
  • Fixed regression in DataFrame.loc raising FutureWarning when setting an empty DataFrame (48480)
  • Fixed regression in DataFrame.describe raising TypeError when result contains NA (48778)
  • Fixed regression in DataFrame.plot ignoring invalid colormap for kind="scatter" (48726)
  • Fixed regression in MultiIndex.values resetting freq attribute of underlying Index object (49054)
  • Fixed performance regression in factorize when na_sentinel is not None and sort=False (48620)
  • Fixed regression causing an AttributeError while emitting the warning shown when the table name provided to DataFrame.to_sql and the table name actually used in the database do not match (48733)
  • Fixed regression in to_datetime raising a ValueError when arg was a date string with nanosecond precision and format contained %f (48767)
  • Fixed regression in testing.assert_frame_equal raising for MultiIndex with Categorical and check_like=True (48975)
  • Fixed regression in DataFrame.fillna replacing wrong values for datetime64[ns] dtype and inplace=True (48863)
  • Fixed regression in DataFrameGroupBy.size not returning a Series when axis=1 (48738)
  • Fixed regression in DataFrameGroupBy.apply when the user-defined function is called on an empty DataFrame (47985)
  • Fixed regression in DataFrame.apply when passing non-zero axis via keyword argument (48656)
  • Fixed regression in Series.groupby and DataFrame.groupby when the grouper is a nullable data type (e.g. Int64) or a PyArrow-backed string array, contains null values, and dropna=False (48794)
  • Fixed performance regression in Series.isin with mismatching dtypes (49162)
  • Fixed regression in DataFrame.to_parquet raising when file name was specified as bytes (48944)
  • Fixed regression in ExcelWriter where the book attribute could no longer be set; however setting this attribute is now deprecated and this ability will be removed in a future version of pandas (48780)
  • Fixed regression in DataFrame.corrwith when computing correlation on tied data with method="spearman" (48826)

Bug fixes

  • Bug in Series.__getitem__ not falling back to positional for integer keys and boolean Index (48653)
  • Bug in DataFrame.to_hdf raising AssertionError with boolean index (48667)
  • Bug in testing.assert_index_equal raising ValueError for extension arrays with non-matching NA values (48608)
  • Bug in DataFrame.pivot_table raising unexpected FutureWarning when setting datetime column as index (48683)
  • Bug in DataFrame.sort_values emitting unnecessary FutureWarning when called on DataFrame with boolean sparse columns (48784)
  • Bug in arrays.ArrowExtensionArray where comparing with an invalid object would not raise a NotImplementedError (48833)

Other

  • Avoid showing deprecated signatures when introspecting functions with warnings about arguments becoming keyword-only (48692)

Contributors

v1.5.0..v1.5.1