ENH: change get_dummies default dtype to bool #48022

Merged: 21 commits, Oct 11, 2022
Changes from 8 commits
Commits
1eb5cd6
ENH: Warn when dtype is not passed to get_dummies
kianelbo Aug 10, 2022
efa678b
Edit get_dummies' dtype warning
kianelbo Aug 10, 2022
472fa28
Add whatsnew entry for issue #45848
kianelbo Aug 10, 2022
2ead750
Fix dtype warning test
kianelbo Aug 10, 2022
ddcc7d3
Suppress warnings in docs
kianelbo Aug 10, 2022
81dbb87
Edit whatsnew entry
kianelbo Aug 10, 2022
45d9c79
Merge branch 'main' into 'getdummies-default-dtype'
kianelbo Aug 23, 2022
f97df66
Fix find_stack_level in get_dummies dtype warning
kianelbo Aug 23, 2022
707a222
Merge branch 'main' into getdummies-default-dtype
kianelbo Sep 21, 2022
15aeb3e
Change the default dtype of get_dummies to bool
kianelbo Sep 22, 2022
a5f709d
Merge branch 'main' into 'getdummies-default-dtype'
kianelbo Sep 23, 2022
a246b8c
Revert dtype(bool) change
kianelbo Sep 25, 2022
7d72067
Merge branch 'main' again
kianelbo Sep 27, 2022
940bd11
Merge branch 'main' into getdummies-default-dtype
MarcoGorelli Sep 29, 2022
ee06958
Merge branch 'main' into getdummies-default-dtype
MarcoGorelli Oct 5, 2022
6e90b45
Merge branch 'main' into getdummies-default-dtype
MarcoGorelli Oct 6, 2022
ce37f33
Merge branch 'main' into getdummies-default-dtype
kianelbo Oct 7, 2022
7cef2fc
Move the changelog entry to v1.6.0.rst
kianelbo Oct 7, 2022
9285bf1
Merge branch 'main' into getdummies-default-dtype
MarcoGorelli Oct 10, 2022
d7e6490
Merge branch 'main' into getdummies-default-dtype
kianelbo Oct 11, 2022
8a93cc9
Move whatsnew entry to 'Other API changes'
kianelbo Oct 11, 2022
8 changes: 8 additions & 0 deletions doc/source/user_guide/reshaping.rst
@@ -608,6 +608,7 @@ values, can derive a :class:`DataFrame` containing ``k`` columns of 1s and 0s us
:func:`~pandas.get_dummies`:

.. ipython:: python
   :okwarning:

   df = pd.DataFrame({"key": list("bbacab"), "data1": range(6)})

@@ -617,6 +618,7 @@ Sometimes it's useful to prefix the column names, for example when merging the r
with the original :class:`DataFrame`:

.. ipython:: python
   :okwarning:

   dummies = pd.get_dummies(df["key"], prefix="key")
   dummies
@@ -626,6 +628,7 @@ with the original :class:`DataFrame`:
This function is often used along with discretization functions like :func:`~pandas.cut`:

.. ipython:: python
   :okwarning:

   values = np.random.randn(10)
   values
@@ -642,6 +645,7 @@ variables (categorical in the statistical sense, those with ``object`` or


.. ipython:: python
   :okwarning:

   df = pd.DataFrame({"A": ["a", "b", "a"], "B": ["c", "c", "b"], "C": [1, 2, 3]})
   pd.get_dummies(df)
@@ -650,6 +654,7 @@ All non-object columns are included untouched in the output. You can control
the columns that are encoded with the ``columns`` keyword.

.. ipython:: python
   :okwarning:

   pd.get_dummies(df, columns=["A"])

@@ -667,6 +672,7 @@ the prefix separator. You can specify ``prefix`` and ``prefix_sep`` in 3 ways:
* dict: Mapping column name to prefix.

.. ipython:: python
   :okwarning:

   simple = pd.get_dummies(df, prefix="new_prefix")
   simple
@@ -680,6 +686,7 @@ variable to avoid collinearity when feeding the result to statistical models.
You can switch to this mode by turning on ``drop_first``.

.. ipython:: python
   :okwarning:

   s = pd.Series(list("abcaa"))

@@ -690,6 +697,7 @@ You can switch to this mode by turning on ``drop_first``.
When a column contains only one level, it will be omitted in the result.

.. ipython:: python
   :okwarning:

   df = pd.DataFrame({"A": list("aaaaa"), "B": list("ababc")})

1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.13.0.rst
@@ -501,6 +501,7 @@ Enhancements
- ``NaN`` handling in get_dummies (:issue:`4446`) with ``dummy_na``

.. ipython:: python
   :okwarning:

   # previously, nan was erroneously counted as 2 here
   # now it is not counted at all

1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.15.0.rst
@@ -1007,6 +1007,7 @@ Other:
left untouched.

.. ipython:: python
   :okwarning:

   df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['c', 'c', 'b'],
                      'C': [1, 2, 3]})

1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.19.0.rst
@@ -431,6 +431,7 @@ The ``pd.get_dummies`` function now returns dummy-encoded columns as small integ
**New behavior**:

.. ipython:: python
   :okwarning:

   pd.get_dummies(["a", "b", "a", "c"]).dtypes

1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.23.0.rst
@@ -366,6 +366,7 @@ Function ``get_dummies`` now supports ``dtype`` argument
The :func:`get_dummies` now accepts a ``dtype`` argument, which specifies a dtype for the new columns. The default remains uint8. (:issue:`18330`)

.. ipython:: python
   :okwarning:

   df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})
   pd.get_dummies(df, columns=['c']).dtypes

1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.24.0.rst
@@ -833,6 +833,7 @@ then all the columns are dummy-encoded, and a :class:`SparseDataFrame` was retur
Now, the return type is consistently a :class:`DataFrame`.

.. ipython:: python
   :okwarning:

   type(pd.get_dummies(df, sparse=True))
   type(pd.get_dummies(df[['B', 'C']], sparse=True))

1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.5.0.rst
@@ -876,6 +876,7 @@ Other Deprecations
- Deprecated unused arguments ``encoding`` and ``verbose`` in :meth:`Series.to_excel` and :meth:`DataFrame.to_excel` (:issue:`47912`)
- Deprecated the ``inplace`` keyword in :meth:`DataFrame.set_axis` and :meth:`Series.set_axis`, use ``obj = obj.set_axis(..., copy=False)`` instead (:issue:`48130`)
- Deprecated producing a single element when iterating over a :class:`DataFrameGroupBy` or a :class:`SeriesGroupBy` that has been grouped by a list of length 1; A tuple of length one will be returned instead (:issue:`42795`)
- Deprecated ``np.uint8`` as the default ``dtype`` for :func:`get_dummies` - in a future version, it will be changed to ``bool`` (:issue:`45848`)
- Fixed up warning message of deprecation of :meth:`MultiIndex.lexsort_depth` as public method, as the message previously referred to :meth:`MultiIndex.is_lexsorted` instead (:issue:`38701`)
- Deprecated the ``inplace`` keyword in :meth:`DataFrame.set_index`, use ``df = df.set_index(..., copy=False)`` instead (:issue:`48115`)
- Deprecated the ``sort_columns`` argument in :meth:`DataFrame.plot` and :meth:`Series.plot` (:issue:`47563`).
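
The `get_dummies` deprecation entry above asks callers to stop relying on the default `dtype`. A minimal sketch of what that could look like in user code, assuming a small illustrative Series (the variable names are made up for this example and do not come from the PR):

```python
import numpy as np
import pandas as pd

# Illustrative input; any categorical-like Series would do.
s = pd.Series(list("abca"))

# Passing dtype explicitly makes the result independent of the library default,
# whether that default is uint8 or bool in the installed pandas version.
as_bool = pd.get_dummies(s, dtype=bool)
as_uint8 = pd.get_dummies(s, dtype=np.uint8)

print(as_bool.dtypes.unique())
print(as_uint8.dtypes.unique())
```

Code that needs 0/1 integers downstream can keep `dtype=np.uint8`; code that only needs indicator masks can switch to `dtype=bool` ahead of the default change.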

9 changes: 9 additions & 0 deletions pandas/core/reshape/encoding.py
@@ -1,13 +1,16 @@
from __future__ import annotations

from collections import defaultdict
import inspect
import itertools
from typing import Hashable
import warnings

import numpy as np

from pandas._libs.sparse import IntIndex
from pandas._typing import Dtype
from pandas.util._exceptions import find_stack_level

from pandas.core.dtypes.common import (
    is_integer_dtype,
@@ -228,6 +231,12 @@ def _get_dummies_1d(
    codes, levels = factorize_from_iterable(Series(data))

    if dtype is None:
        warnings.warn(
            "In a future version of pandas the default dtype will change from "
            "'uint8' to 'bool', please specify a dtype to silence this warning",
            FutureWarning,
            stacklevel=find_stack_level(inspect.currentframe()),
        )
        dtype = np.dtype(np.uint8)
    # error: Argument 1 to "dtype" has incompatible type "Union[ExtensionDtype, str,
    # dtype[Any], Type[object]]"; expected "Type[Any]"
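
The hunk above warns only when `dtype` is left as `None`, then falls back to `uint8`. The same pattern can be sketched outside pandas as follows. `resolve_dummy_dtype` is a hypothetical helper written for this illustration, not a pandas function, and it uses a fixed `stacklevel=2` where the PR uses the internal `find_stack_level`:

```python
import warnings

import numpy as np


def resolve_dummy_dtype(dtype=None):
    """Hypothetical helper mirroring the default-dtype deprecation pattern."""
    if dtype is None:
        warnings.warn(
            "In a future version the default dtype will change from "
            "'uint8' to 'bool', please specify a dtype to silence this warning",
            FutureWarning,
            stacklevel=2,  # attribute the warning to the caller (assumption)
        )
        dtype = np.uint8
    return np.dtype(dtype)


# Record warnings so both code paths can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    default = resolve_dummy_dtype()       # no dtype given: warns, returns uint8
    explicit = resolve_dummy_dtype(bool)  # dtype given: no warning

print(default, explicit, len(caught))
```

Only the call without a `dtype` should emit a `FutureWarning`, which is why the message tells users that passing a `dtype` silences it.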