Change the default dtype of get_dummies to bool
kianelbo committed Sep 22, 2022
1 parent 707a222 commit 2ef8633
Showing 12 changed files with 88 additions and 124 deletions.
8 changes: 0 additions & 8 deletions doc/source/user_guide/reshaping.rst
@@ -608,7 +608,6 @@ values, can derive a :class:`DataFrame` containing ``k`` columns of 1s and 0s us
:func:`~pandas.get_dummies`:

.. ipython:: python
:okwarning:
df = pd.DataFrame({"key": list("bbacab"), "data1": range(6)})
@@ -618,7 +617,6 @@ Sometimes it's useful to prefix the column names, for example when merging the r
with the original :class:`DataFrame`:

.. ipython:: python
:okwarning:
dummies = pd.get_dummies(df["key"], prefix="key")
dummies
@@ -628,7 +626,6 @@ with the original :class:`DataFrame`:
This function is often used along with discretization functions like :func:`~pandas.cut`:

.. ipython:: python
:okwarning:
values = np.random.randn(10)
values
@@ -645,7 +642,6 @@ variables (categorical in the statistical sense, those with ``object`` or


.. ipython:: python
:okwarning:
df = pd.DataFrame({"A": ["a", "b", "a"], "B": ["c", "c", "b"], "C": [1, 2, 3]})
pd.get_dummies(df)
@@ -654,7 +650,6 @@ All non-object columns are included untouched in the output. You can control
the columns that are encoded with the ``columns`` keyword.

.. ipython:: python
:okwarning:
pd.get_dummies(df, columns=["A"])
@@ -672,7 +667,6 @@ the prefix separator. You can specify ``prefix`` and ``prefix_sep`` in 3 ways:
* dict: Mapping column name to prefix.

.. ipython:: python
:okwarning:
simple = pd.get_dummies(df, prefix="new_prefix")
simple
@@ -686,7 +680,6 @@ variable to avoid collinearity when feeding the result to statistical models.
You can switch to this mode by turning on ``drop_first``.

.. ipython:: python
:okwarning:
s = pd.Series(list("abcaa"))
@@ -697,7 +690,6 @@ You can switch to this mode by turning on ``drop_first``.
When a column contains only one level, it will be omitted in the result.

.. ipython:: python
:okwarning:
df = pd.DataFrame({"A": list("aaaaa"), "B": list("ababc")})
1 change: 0 additions & 1 deletion doc/source/whatsnew/v0.13.0.rst
@@ -501,7 +501,6 @@ Enhancements
- ``NaN`` handling in get_dummies (:issue:`4446`) with ``dummy_na``

.. ipython:: python
:okwarning:
# previously, nan was erroneously counted as 2 here
# now it is not counted at all
1 change: 0 additions & 1 deletion doc/source/whatsnew/v0.15.0.rst
@@ -1007,7 +1007,6 @@ Other:
left untouched.

.. ipython:: python
:okwarning:
df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['c', 'c', 'b'],
'C': [1, 2, 3]})
1 change: 0 additions & 1 deletion doc/source/whatsnew/v0.19.0.rst
@@ -431,7 +431,6 @@ The ``pd.get_dummies`` function now returns dummy-encoded columns as small integ
**New behavior**:

.. ipython:: python
:okwarning:
pd.get_dummies(["a", "b", "a", "c"]).dtypes
1 change: 0 additions & 1 deletion doc/source/whatsnew/v0.23.0.rst
@@ -366,7 +366,6 @@ Function ``get_dummies`` now supports ``dtype`` argument
The :func:`get_dummies` now accepts a ``dtype`` argument, which specifies a dtype for the new columns. The default remains uint8. (:issue:`18330`)

.. ipython:: python
:okwarning:
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})
pd.get_dummies(df, columns=['c']).dtypes
1 change: 0 additions & 1 deletion doc/source/whatsnew/v0.24.0.rst
@@ -833,7 +833,6 @@ then all the columns are dummy-encoded, and a :class:`SparseDataFrame` was retur
Now, the return type is consistently a :class:`DataFrame`.

.. ipython:: python
:okwarning:
type(pd.get_dummies(df, sparse=True))
type(pd.get_dummies(df[['B', 'C']], sparse=True))
3 changes: 1 addition & 2 deletions doc/source/whatsnew/v1.5.0.rst
@@ -932,7 +932,6 @@ Other Deprecations
- Deprecated unused arguments ``encoding`` and ``verbose`` in :meth:`Series.to_excel` and :meth:`DataFrame.to_excel` (:issue:`47912`)
- Deprecated the ``inplace`` keyword in :meth:`DataFrame.set_axis` and :meth:`Series.set_axis`, use ``obj = obj.set_axis(..., copy=False)`` instead (:issue:`48130`)
- Deprecated producing a single element when iterating over a :class:`DataFrameGroupBy` or a :class:`SeriesGroupBy` that has been grouped by a list of length 1; A tuple of length one will be returned instead (:issue:`42795`)
- Deprecated ``np.uint8`` as the default ``dtype`` for :func:`get_dummies` - in a future version, it will be changed to ``bool`` (:issue:`45848`)
- Fixed up warning message of deprecation of :meth:`MultiIndex.lesort_depth` as public method, as the message previously referred to :meth:`MultiIndex.is_lexsorted` instead (:issue:`38701`)
- Deprecated the ``sort_columns`` argument in :meth:`DataFrame.plot` and :meth:`Series.plot` (:issue:`47563`).
- Deprecated positional arguments for all but the first argument of :meth:`DataFrame.to_stata` and :func:`read_stata`, use keyword arguments instead (:issue:`48128`).
@@ -1192,7 +1191,6 @@ Groupby/resample/rolling
- Bug in :meth:`DataFrameGroupBy.resample` raises ``KeyError`` when getting the result from a key list which misses the resample key (:issue:`47362`)
- Bug in :meth:`DataFrame.groupby` would lose index columns when the DataFrame is empty for transforms, like fillna (:issue:`47787`)
- Bug in :meth:`DataFrame.groupby` and :meth:`Series.groupby` with ``dropna=False`` and ``sort=False`` would put any null groups at the end instead the order that they are encountered (:issue:`46584`)
-

Reshaping
^^^^^^^^^
@@ -1210,6 +1208,7 @@ Reshaping
- Bug in :meth:`concat` when ``axis=1`` and ``sort=False`` where the resulting Index was a :class:`Int64Index` instead of a :class:`RangeIndex` (:issue:`46675`)
- Bug in :meth:`wide_to_long` raises when ``stubnames`` is missing in columns and ``i`` contains string dtype column (:issue:`46044`)
- Bug in :meth:`DataFrame.join` with categorical index results in unexpected reordering (:issue:`47812`)
- Bug in :func:`get_dummies` ``np.uint8`` being the default ``dtype``, changed to ``bool`` (:issue:`45848`)

Sparse
^^^^^^
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v1.5.1.rst
@@ -23,7 +23,7 @@ Fixed regressions

Bug fixes
~~~~~~~~~
-
- Bug in :func:`get_dummies` with default ``dtype`` being ``uint8`` - the default ``dtype`` is now changed to ``bool`` (:issue:`45848`)
-

.. ---------------------------------------------------------------------------
16 changes: 4 additions & 12 deletions pandas/core/reshape/encoding.py
@@ -1,16 +1,13 @@
from __future__ import annotations

from collections import defaultdict
import inspect
import itertools
from typing import Hashable
import warnings

import numpy as np

from pandas._libs.sparse import IntIndex
from pandas._typing import Dtype
from pandas.util._exceptions import find_stack_level

from pandas.core.dtypes.common import (
is_integer_dtype,
@@ -66,7 +63,7 @@ def get_dummies(
drop_first : bool, default False
Whether to get k-1 dummies out of k categorical levels by removing the
first level.
dtype : dtype, default np.uint8
dtype : dtype, default bool
Data type for new columns. Only a single dtype is allowed.
Returns
@@ -236,16 +233,11 @@ def _get_dummies_1d(
codes, levels = factorize_from_iterable(Series(data))

if dtype is None:
warnings.warn(
"In a future version of pandas the default dtype will change from "
"'uint8' to 'bool', please specify a dtype to silence this warning",
FutureWarning,
stacklevel=find_stack_level(inspect.currentframe()),
)
dtype = np.dtype(np.uint8)
dtype = bool
# error: Argument 1 to "dtype" has incompatible type "Union[ExtensionDtype, str,
# dtype[Any], Type[object]]"; expected "Type[Any]"
dtype = np.dtype(dtype) # type: ignore[arg-type]
else:
dtype = np.dtype(dtype) # type: ignore[arg-type]

if is_object_dtype(dtype):
raise ValueError("dtype=object is not a valid dtype for get_dummies")
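The check above is unchanged by this commit: ``object`` is still rejected as a dtype for the dummy columns. A minimal sketch of how that surfaces to a caller, assuming an installed pandas; the variable names ``s`` and ``message`` are illustrative only and not part of the commit:

```python
import pandas as pd

s = pd.Series(list("abca"))

# Requesting object-dtype dummies is expected to raise ValueError,
# per the check in _get_dummies_1d shown in the hunk above.
try:
    pd.get_dummies(s, dtype=object)
except ValueError as exc:
    message = str(exc)
else:
    message = None

print(message)
```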
2 changes: 1 addition & 1 deletion pandas/tests/frame/indexing/test_getitem.py
@@ -53,7 +53,7 @@ def test_getitem_list_of_labels_categoricalindex_cols(self):
cats = Categorical([Timestamp("12-31-1999"), Timestamp("12-31-2000")])

expected = DataFrame(
[[1, 0], [0, 1]], dtype="uint8", index=[0, 1], columns=cats
[[1, 0], [0, 1]], dtype="bool", index=[0, 1], columns=cats
)
dummies = get_dummies(cats)
result = dummies[list(dummies.columns)]
2 changes: 1 addition & 1 deletion pandas/tests/frame/methods/test_sort_values.py
@@ -19,7 +19,7 @@ def test_sort_values_sparse_no_warning(self):
# GH#45618
# TODO(2.0): test will be unnecessary
ser = pd.Series(Categorical(["a", "b", "a"], categories=["a", "b", "c"]))
df = pd.get_dummies(ser, sparse=True)
df = pd.get_dummies(ser, dtype=np.uint8, sparse=True)

with tm.assert_produces_warning(None):
# No warnings about constructing Index from SparseArray
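For callers affected by the new default, passing ``dtype`` explicitly gives the same result before and after this commit, as the updated ``test_sort_values_sparse_no_warning`` does with ``dtype=np.uint8``. A minimal sketch, assuming an installed pandas and NumPy; the names ``as_bool``, ``as_uint8`` and ``restored`` are illustrative and do not appear in the commit:

```python
import numpy as np
import pandas as pd

s = pd.Series(list("abca"))

# An explicit dtype does not depend on the default changed by this commit.
as_bool = pd.get_dummies(s, dtype=bool)
as_uint8 = pd.get_dummies(s, dtype=np.uint8)

# Boolean dummies can be cast back to small integers where 0/1 values are needed.
restored = as_bool.astype(np.uint8)

print(as_bool.dtypes)
print(as_uint8.dtypes)
```

Code that does arithmetic on the dummy columns or compares them to ``0``/``1`` literals is the most likely to notice the switch to ``bool``, so pinning ``dtype`` there may be the safer choice.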