These are the changes in pandas 2.0.0. See release
for a full changelog including other versions of pandas.
- read_sas now supports using encoding='infer' to correctly read and use the encoding specified by the SAS file (48048)
- DataFrameGroupBy.quantile and SeriesGroupBy.quantile now preserve nullable dtypes instead of casting to numpy dtypes (37493)
- Series.add_suffix, DataFrame.add_suffix, Series.add_prefix and DataFrame.add_prefix support an axis argument. If axis is set, the default behaviour of which axis to consider can be overwritten (47819)
- assert_frame_equal now shows the first element where the DataFrames differ, analogously to pytest's output (47910)
- Added new argument use_nullable_dtypes to read_csv to enable automatic conversion to nullable dtypes (36712)
- Added index parameter to DataFrame.to_dict (46398)
- Added metadata propagation for binary operators on DataFrame (28283)
- CategoricalConversionWarning, InvalidComparison, InvalidVersion, LossySetitemError, and NoBufferPresent are now exposed in pandas.errors (27656)
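As a brief illustration of two of the items above, the sketch below uses the axis argument of add_prefix and the index parameter of DataFrame.to_dict. It assumes pandas 2.0 or later is installed; the frame, its labels and the variable names are made up for the example.

```python
import pandas as pd

# Hypothetical frame used only for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])

# By default DataFrame.add_prefix relabels the columns;
# axis=0 targets the row labels instead.
prefixed_rows = df.add_prefix("row_", axis=0)
print(prefixed_rows.index.tolist())

# index=False leaves the index out of the result; only the
# "split" and "tight" orients accept it.
as_split = df.to_dict(orient="split", index=False)
print(sorted(as_split))
```

Leaving axis unset keeps the previous behaviour, so existing calls are unaffected.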
These are bug fixes that might have notable behavior changes.
In previous versions we cast to float when applying cumsum and cumprod in groupby operations, which led to incorrect results even if the result could be held by int64 dtype. Additionally, the aggregation now overflows consistently with numpy and the regular DataFrame.cumprod and DataFrame.cumsum methods when the limit of int64 is reached (37493).
Old Behavior
In [1]: df = pd.DataFrame({"key": ["b"] * 7, "value": 625})
In [2]: df.groupby("key")["value"].cumprod()[5]
Out[2]: 5.960464477539062e+16
We return incorrect results with the 6th value.
New Behavior
df = pd.DataFrame({"key": ["b"] * 7, "value": 625})
df.groupby("key")["value"].cumprod()
We overflow with the 7th value, but the 6th value is still correct.
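The arithmetic behind this can be checked directly. The sketch below is illustrative only and assumes pandas 2.0 or later: it shows that the 6th product, 625 ** 6, fits in int64 but cannot be represented exactly as a float64, while the 7th product exceeds the int64 limit.

```python
import pandas as pd

exact_sixth = 625 ** 6          # Python integers are arbitrary precision
int64_max = 2 ** 63 - 1

# float64 has a 53-bit significand, so the round trip loses the last digit.
print(int(float(exact_sixth)) == exact_sixth)

# The 7th product no longer fits in int64, hence the overflow.
print(625 ** 7 > int64_max)

df = pd.DataFrame({"key": ["b"] * 7, "value": 625})
result = df.groupby("key")["value"].cumprod()

# With the new behavior the result stays int64 and the 6th value is exact.
print(result.dtype)
print(int(result.iloc[5]) == exact_sixth)
```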
Some minimum supported versions of dependencies were updated. If installed, we now require:
Package | Minimum Version | Required | Changed |
---|---|---|---|
mypy (dev) | 0.981 | | |
For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.
Package | Minimum Version | Changed |
---|---|---|
See install.dependencies and install.optional_dependencies for more.
- Passing nanoseconds greater than 999 or less than 0 in Timestamp now raises a ValueError (48538, 48255)
- read_csv: specifying an incorrect number of columns with index_col now raises ParserError instead of IndexError when using the c parser
- Default value of dtype in get_dummies is changed to bool from uint8 (45848)
- DataFrame.astype, Series.astype, and DatetimeIndex.astype casting datetime64 data to any of "datetime64[s]", "datetime64[ms]", "datetime64[us]" will return an object with the given resolution instead of coercing back to "datetime64[ns]" (48928)
- DataFrame.astype, Series.astype, and DatetimeIndex.astype casting timedelta64 data to any of "timedelta64[s]", "timedelta64[ms]", "timedelta64[us]" will return an object with the given resolution instead of coercing to "float64" dtype (48963)
- Passing a np.datetime64 object with non-nanosecond resolution to Timestamp will retain the input resolution if it is "s", "ms", or "ns"; otherwise it will be cast to the closest supported resolution (49008)
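Several of the API changes listed above can be observed in a short session. The following is a minimal sketch, assuming pandas 2.0 or later; the sample values and variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# get_dummies produces boolean indicator columns by default.
dummies = pd.get_dummies(pd.Series(["a", "b", "a"]))
print(dummies.dtypes)

# Casting to a supported non-nanosecond resolution keeps that resolution.
ser = pd.Series(pd.to_datetime(["2022-01-01", "2022-01-02"]))
as_seconds = ser.astype("datetime64[s]")
print(as_seconds.dtype)

# A second-resolution np.datetime64 keeps its resolution in Timestamp.
ts = pd.Timestamp(np.datetime64("2022-01-01T00:00:00", "s"))
print(ts.unit)

# Nanoseconds outside the 0..999 range are rejected.
raised = False
try:
    pd.Timestamp(year=2022, month=1, day=1, nanosecond=1000)
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```

Code that relied on the uint8 indicator columns can still request them explicitly with dtype="uint8".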
- Performance improvement in DataFrameGroupBy.median and SeriesGroupBy.median and GroupBy.cumprod for nullable dtypes (37493)
- Performance improvement in MultiIndex.argsort and MultiIndex.sort_values (48406)
- Performance improvement in MultiIndex.size (48723)
- Performance improvement in MultiIndex.union without missing values and without duplicates (48505)
- Performance improvement in MultiIndex.difference (48606)
- Performance improvement in MultiIndex set operations with sort=None (49010)
- Performance improvement in DataFrameGroupBy.mean, SeriesGroupBy.mean, DataFrameGroupBy.var, and SeriesGroupBy.var for extension array dtypes (37493)
- Performance improvement in MultiIndex.isin when level=None (48622)
- Performance improvement in Index.union and MultiIndex.union when index contains duplicates (48900)
- Performance improvement for Series.value_counts with nullable dtype (48338)
- Performance improvement for Series constructor passing integer numpy array with nullable dtype (48338)
- Performance improvement for DatetimeIndex constructor passing a list (48609)
- Performance improvement in merge and DataFrame.join when joining on a sorted MultiIndex (48504)
- Performance improvement in DataFrame.loc and Series.loc for tuple-based indexing of a MultiIndex (48384)
- Performance improvement for MultiIndex.unique (48335)
- Performance improvement in DataFrame.join when joining on a subset of a MultiIndex (48611)
- Performance improvement for MultiIndex.intersection (48604)
- Performance improvement in var for nullable dtypes (48379)
- Performance improvements to read_sas (47403, 47405, 47656, 48502)
- Memory improvement in RangeIndex.sort_values (48801)
- Performance improvement in DataFrameGroupBy and SeriesGroupBy when by is a categorical type and sort=False (48976)
- Bug in Categorical.set_categories losing dtype information (48812)
- Bug in pandas.infer_freq raising TypeError when inferred on RangeIndex (47084)
- Bug in to_datetime raising on invalid offsets with errors='coerce' and infer_datetime_format=True (48633)
- Bug in DatetimeIndex constructor failing to raise when tz=None is explicitly specified in conjunction with timezone-aware dtype or data (48659)
- Bug in subtracting a datetime scalar from DatetimeIndex failing to retain the original freq attribute (48818)
- Bug in to_timedelta raising error when input has nullable dtype Float64 (48796)
- Bug in Timedelta constructor incorrectly raising instead of returning NaT when given a np.timedelta64("nat") (48898)
- Bug in Timedelta constructor failing to raise when passed both a Timedelta object and keywords (e.g. days, seconds) (48898)
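The first two fixes above can be exercised as follows. This is a small sketch assuming pandas 2.0 or later; the input values are arbitrary.

```python
import numpy as np
import pandas as pd

# Nullable Float64 input is accepted by to_timedelta.
floats = pd.Series([1.0, 2.5], dtype="Float64")
deltas = pd.to_timedelta(floats, unit="s")
print(deltas)

# A not-a-time np.timedelta64 yields NaT instead of raising.
nat_result = pd.Timedelta(np.timedelta64("nat"))
print(nat_result is pd.NaT)
```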
- Bug in DataFrame.add where a ufunc could not be applied when inputs contain mixed DataFrame and Series types (39853)
- Bug in constructing Series with int64 dtype from a string list raising instead of casting (44923)
- Bug in DataFrame.eval incorrectly raising an AttributeError when there are negative values in function call (46471)
- Bug in Series.convert_dtypes not converting dtype to nullable dtype when Series contains NA and has dtype object (48791)
- Bug where any ExtensionDtype subclass with kind="M" would be interpreted as a timezone type (34986)
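Two of the conversion fixes above are easy to reproduce. The sketch below assumes pandas 2.0 or later and uses made-up values.

```python
import pandas as pd

# A list of numeric strings with an explicit int64 dtype is cast
# to integers rather than raising.
from_strings = pd.Series(["1", "2", "3"], dtype="int64")
print(from_strings.tolist())

# An object-dtype Series holding integers and NA converts to a
# nullable integer dtype.
obj = pd.Series([1, 2, pd.NA], dtype=object)
converted = obj.convert_dtypes()
print(converted.dtype)
```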
- Bug in DataFrame.reindex filling with wrong values when indexing columns and index for uint dtypes (48184)
- Bug in DataFrame.reindex casting dtype to object when DataFrame has single extension array column when re-indexing columns and index (48190)
- Bug in DataFrame.describe when formatting percentiles in the resulting index showed more decimals than needed (46362)
- Bug in Index.equals raising TypeError when Index consists of tuples that contain NA (48446)
- Bug in MultiIndex.argsort raising TypeError when index contains NA (48495)
- Bug in MultiIndex.difference losing extension array dtype (48606)
- Bug in MultiIndex.set_levels raising IndexError when setting empty level (48636)
- Bug in MultiIndex.unique losing extension array dtype (48335)
- Bug in MultiIndex.intersection losing extension array (48604)
- Bug in MultiIndex.union losing extension array (48498, 48505, 48900)
- Bug in MultiIndex.union not sorting when sort=None and index contains missing values (49010)
- Bug in MultiIndex.append not checking names for equality (48288)
- Bug in MultiIndex.symmetric_difference losing extension array (48607)
- Bug in read_sas that caused fragmentation of DataFrame and raised errors.PerformanceWarning (48595)
- Bug in Period.strftime and PeriodIndex.strftime raising UnicodeDecodeError when a locale-specific directive was passed (46319)
- Bug in ExponentialMovingWindow with online not raising a NotImplementedError for unsupported operations (48834)
- Bug in DataFrameGroupBy.sample raising ValueError when the object is empty (48459)
- Bug in Series.groupby raising ValueError when an entry of the index is equal to the name of the index (48567)
- Bug in DataFrameGroupBy.resample producing inconsistent results when passing empty DataFrame (47705)
- Bug in DataFrame.pivot_table raising TypeError for nullable dtype and margins=True (48681)
- Bug in DataFrame.pivot not respecting None as column name (48293)
- Bug in join when left_on or right_on is or includes a CategoricalIndex incorrectly raising AttributeError (48464)
- Bug in Series.mean overflowing unnecessarily with nullable integers (48378)
- Bug when concatenating an empty DataFrame with an ExtensionDtype to another DataFrame with the same ExtensionDtype, where the resulting dtype turned into object (48510)
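The concatenation fix can be illustrated with a nullable integer column. This is a minimal sketch assuming pandas 2.0 or later; the column name and values are invented for the example.

```python
import pandas as pd

# Both frames share the same extension dtype; one of them is empty.
empty = pd.DataFrame({"a": pd.array([], dtype="Int64")})
filled = pd.DataFrame({"a": pd.array([1, 2], dtype="Int64")})

combined = pd.concat([empty, filled])

# The shared extension dtype is kept instead of falling back to object.
print(combined.dtypes)
```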
- Fixed metadata propagation in DataFrame.corr and DataFrame.cov (28283)