DEPR: __array__ for tz-aware Series/Index #24596

TomAugspurger · 2019-01-03T17:55:54Z

This deprecates the current behavior when converting tz-aware Series
or Index to an ndarray. Previously, we converted to M8[ns], throwing
away the timezone information. In the future, we will return an
object-dtype array filled with Timestamps, each of which has the correct
tz.

In [1]: import pandas as pd; import numpy as np

In [2]: ser = pd.Series(pd.date_range('2000', periods=2, tz="CET"))

In [3]: np.asarray(ser)
/bin/ipython:1: FutureWarning: Converting timezone-aware DatetimeArray to timezone-naive ndarray with 'datetime64[ns]' dtype. In the future, this will return an ndarray with 'object' dtype where each element is a 'pandas.Timestamp' with the correct 'tz'.
        To accept the future behavior, pass 'dtype=object'.
        To keep the old behavior, pass 'dtype="datetime64[ns]"'.
  #!/Users/taugspurger/Envs/pandas-dev/bin/python3
Out[3]:
array(['1999-12-31T23:00:00.000000000', '2000-01-01T23:00:00.000000000'],
      dtype='datetime64[ns]')

xref #23569

closes #15750
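
A minimal sketch of the two opt-in paths named in the warning text. The variable names are illustrative only and are not taken from the PR. Passing `dtype` explicitly should give the same result whether or not the default has already changed in the installed pandas version.

```python
import numpy as np
import pandas as pd

ser = pd.Series(pd.date_range("2000", periods=2, tz="CET"))

# Future behavior: object-dtype ndarray of tz-aware Timestamps.
as_objects = np.asarray(ser, dtype=object)

# Old behavior: tz-naive datetime64[ns] values, expressed in UTC.
as_naive_utc = np.asarray(ser, dtype="datetime64[ns]")

print(as_objects.dtype, as_objects[0])
print(as_naive_utc.dtype, as_naive_utc[0])
```

The object array keeps the CET offset on every element, while the `datetime64[ns]` array holds the same instants shifted to UTC with no timezone attached.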

doc/source/whatsnew/v0.24.0.rst

pandas/core/arrays/datetimes.py

TomAugspurger · 2019-01-03T17:57:42Z

pandas/core/dtypes/cast.py

@@ -1020,7 +1020,7 @@ def maybe_cast_to_datetime(value, dtype, errors='raise'):
                            # datetime64tz is assumed to be naive which should
                            # be localized to the timezone.
                            is_dt_string = is_string_dtype(value)
-                            value = to_datetime(value, errors=errors)
+                            value = to_datetime(value, errors=errors).array


Need to look at this closer. maybe_cast_to_datetime seems in need of an overhaul (along with all of sanitize_array) but this at least avoids the warning.

pandas/core/groupby/groupby.py

TomAugspurger · 2019-01-03T17:59:18Z

pandas/core/indexing.py

-                elif np.array(value).ndim == 2:
+                # hasattr first, to avoid coercing to ndarray without reason.
+                # But we may be relying on the ndarray coercion to check ndim.
+                # Why not just convert to an ndarray earlier on if needed?


Hoping to clean up the type on value a bit to avoid this.

can you add a TODO for any section that we should change later
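
A rough sketch of the "hasattr first" idea from the diff comments above, assuming a hypothetical helper named `ndim_of` (it is not part of pandas and is not the code in indexing.py). Anything that already exposes `ndim`, such as a Series or ndarray, is never coerced, so a tz-aware Series would not go through `__array__` just to have its dimensions checked.

```python
import numpy as np


def ndim_of(value):
    """Return the number of dimensions, coercing to ndarray only if needed."""
    # Hypothetical helper for illustration; not a pandas function.
    ndim = getattr(value, "ndim", None)
    if ndim is not None:
        return ndim
    # Fallback for plain lists/tuples/scalars, which carry no ndim attribute.
    return np.array(value).ndim


print(ndim_of([[1, 2], [3, 4]]))
print(ndim_of(np.zeros((2, 3))))
print(ndim_of([1, 2, 3]))
```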

pandas/core/internals/blocks.py

pandas/core/series.py

pandas/core/groupby/groupby.py

mroeschke · 2019-01-03T18:14:11Z

xrefing #15750 as I think it's related to the eventual end goal.

jreback

haven't looked in detail

pandas/core/indexes/datetimes.py

codecov · 2019-01-03T19:56:15Z

Codecov Report

Merging #24596 into master will decrease coverage by 49.33%.
The diff coverage is 28.88%.

@@             Coverage Diff             @@
##           master   #24596       +/-   ##
===========================================
- Coverage   92.38%   43.05%   -49.34%     
===========================================
  Files         166      166               
  Lines       52478    52514       +36     
===========================================
- Hits        48483    22609    -25874     
- Misses       3995    29905    +25910

Flag	Coverage Δ
#multiple	`?`
#single	`43.05% <28.88%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/indexing.py	`51.83% <0%> (-42.04%)`	⬇️
pandas/core/reshape/tile.py	`11.36% <0%> (-83.47%)`	⬇️
pandas/core/groupby/groupby.py	`24.28% <0%> (-72.53%)`	⬇️
pandas/core/dtypes/cast.py	`48.59% <0%> (-40.14%)`	⬇️
pandas/core/arrays/datetimes.py	`65.78% <100%> (-32.24%)`	⬇️
pandas/core/internals/construction.py	`62.5% <100%> (-34.17%)`	⬇️
pandas/core/internals/blocks.py	`51.58% <14.28%> (-42.91%)`	⬇️
pandas/core/series.py	`49.33% <40%> (-44.21%)`	⬇️
pandas/core/indexes/datetimes.py	`48.45% <50%> (-47.78%)`	⬇️
pandas/core/dtypes/dtypes.py	`73.52% <66.66%> (-21.82%)`	⬇️
... and 133 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 62506ca...ea44792. Read the comment docs.

codecov · 2019-01-03T19:56:17Z

Codecov Report

Merging #24596 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #24596      +/-   ##
==========================================
- Coverage   92.37%   92.37%   -0.01%     
==========================================
  Files         166      166              
  Lines       52396    52415      +19     
==========================================
+ Hits        48403    48420      +17     
- Misses       3993     3995       +2

Flag	Coverage Δ
#multiple	`90.8% <100%> (-0.01%)`	⬇️
#single	`43.01% <37.14%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/dtypes/dtypes.py	`95.58% <100%> (+0.03%)`	⬆️
pandas/core/indexing.py	`93.87% <100%> (ø)`	⬆️
pandas/core/reshape/tile.py	`94.88% <100%> (+0.05%)`	⬆️
pandas/core/arrays/datetimes.py	`98.01% <100%> (ø)`	⬆️
pandas/core/internals/blocks.py	`94.16% <100%> (-0.06%)`	⬇️
pandas/core/nanops.py	`94.36% <100%> (ø)`	⬆️
pandas/core/groupby/groupby.py	`96.8% <100%> (ø)`	⬆️
pandas/core/dtypes/cast.py	`88.72% <100%> (ø)`	⬆️
pandas/core/indexes/datetimes.py	`96.26% <100%> (+0.03%)`	⬆️
pandas/core/internals/construction.py	`95.93% <100%> (-0.75%)`	⬇️
... and 7 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 19f715c...50f4fbd. Read the comment docs.

doc/source/whatsnew/v0.24.0.rst

pandas/core/internals/blocks.py

pandas/core/series.py

doc/source/whatsnew/v0.24.0.rst

pandas/core/dtypes/dtypes.py

jreback · 2019-01-04T15:41:41Z

pandas/core/dtypes/dtypes.py

@@ -420,6 +421,11 @@ def _hash_categories(categories, ordered=True):
                    # find a better solution
                    hashed = hash((tuple(categories), ordered))
                    return hashed
+
+            if is_datetime64tz_dtype(categories.dtype):


can you always call categories.to_numpy()?

Hmm, possibly. We'll still need the special case for datetime64tz_dtype to pass dtype=_NS_DTYPE, since Index[datetime64[ns, tz]].to_numpy() returns an ndarray of Timestamp objects.
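
To illustrate the point about `to_numpy()`, here is a small sketch. It uses the string `"datetime64[ns]"` where the comment above refers to `_NS_DTYPE`, on the assumption that the two name the same NumPy dtype.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2000", periods=2, tz="CET")

# Default: an object ndarray of tz-aware Timestamps.
objs = idx.to_numpy()

# Explicit dtype: tz-naive datetime64[ns] values in UTC.
naive = idx.to_numpy(dtype="datetime64[ns]")

print(objs.dtype, naive.dtype)
```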

maybe add a TODO here, this is kind of special casing

pandas/core/reshape/tile.py

TomAugspurger · 2019-01-04T15:56:01Z

I can reproduce the failures with older numpys locally. Debugging that now.

TomAugspurger · 2019-01-04T16:20:47Z

Turns out it was bottleneck

@jreback right now bn_ok_dtype excludes datetime & timedelta, but not datetimetz in

pandas/pandas/core/nanops.py

Line 147 in 19f715c

if (not is_object_dtype(dt) and not is_datetime_or_timedelta_dtype(dt)):

I assume we don't want to pass datetimetz to bottleneck, since the actual operation should be done on the same values (i8 or M8[ns]).

jreback · 2019-01-04T16:22:03Z

right should exclude anything that matches needs_i8_conversion
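
A simplified sketch of that exclusion rule. Both functions below are hypothetical stand-ins written for illustration: they are not the `bn_ok_dtype` or `needs_i8_conversion` implementations in pandas, and they only consider the dtypes discussed in this thread.

```python
import numpy as np
import pandas as pd


def needs_i8_conversion_sketch(dtype):
    """Rough stand-in: True for datetime64, timedelta64, tz-aware and period dtypes."""
    if isinstance(dtype, np.dtype):
        return dtype.kind in "mM"
    return isinstance(dtype, (pd.DatetimeTZDtype, pd.PeriodDtype))


def bn_ok_dtype_sketch(dtype):
    """Hypothetical gate: skip bottleneck for object and datetime-like dtypes."""
    is_object = isinstance(dtype, np.dtype) and dtype.kind == "O"
    return not is_object and not needs_i8_conversion_sketch(dtype)


tz_dtype = pd.DatetimeTZDtype(tz="CET")
print(bn_ok_dtype_sketch(tz_dtype))
print(bn_ok_dtype_sketch(np.dtype("float64")))
```

With a rule of this shape, datetimetz is excluded by the same check as datetime and timedelta, rather than through a separate special case.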

TomAugspurger · 2019-01-04T17:42:07Z

349f818 updates with

changes the parameter name from result to dtype to match the array interface
Expands the docstring
Adds to the API docs (along with a few others that were missing)
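
For context on "the array interface": NumPy calls an object's `__array__` method and passes the requested `dtype` through. The toy class below is hypothetical and only illustrates that protocol; the `copy` parameter is accepted for newer NumPy versions and ignored in this sketch.

```python
import numpy as np


class Celsius:
    """Toy container (hypothetical) that exposes its data via __array__."""

    def __init__(self, values):
        self._values = list(values)

    def __array__(self, dtype=None, copy=None):
        # NumPy forwards the dtype requested by the caller, if any.
        return np.array(self._values, dtype=dtype)


temps = Celsius([20, 21])
default = np.asarray(temps)
as_float = np.asarray(temps, dtype="float64")

print(default.dtype, as_float.dtype)
```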

jreback · 2019-01-04T17:48:35Z

pandas/core/internals/blocks.py

@@ -1447,8 +1447,18 @@ def quantile(self, qs, interpolation='linear', axis=0):
        -------
        Block
        """
-        values = self.get_values()
-        values, _ = self._try_coerce_args(values, values)
+        if self.is_datetimetz:


this is getting super messy

Agreed. A proper fix is updating _try_coerce_args / get_values, which I think @jbrockmendel is working on. But this is necessary now to avoid the warning / conversion to object dtype.

None of the branches I have in progress would help here.

Allowing for DatetimeArray to be reshaped to (1, nrows) would.

add a TODO here
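
One way to picture why the tz-aware case is handled separately: a plain `datetime64[ns]` ndarray reshapes freely to `(1, nrows)`, but it carries no timezone, so the tz has to be re-attached afterwards. The sketch below is an assumption-laden illustration of that round trip, not the code added to blocks.py.

```python
import numpy as np
import pandas as pd

ser = pd.Series(pd.date_range("2000", periods=3, tz="CET"))

# Ask for tz-naive UTC values explicitly, so neither the deprecation warning
# nor an object-dtype conversion is involved.
naive_utc = np.asarray(ser, dtype="datetime64[ns]")

# Plain ndarrays reshape to the (1, nrows) layout without trouble.
block_values = naive_utc.reshape(1, -1)

# After the 2D step, restore the original timezone.
restored = pd.Series(
    pd.DatetimeIndex(block_values.ravel()).tz_localize("UTC").tz_convert("CET")
)

print(block_values.shape)
print(restored)
```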

pandas/core/internals/blocks.py

doc/source/whatsnew/v0.24.0.rst

jreback · 2019-01-04T18:40:16Z

pandas/core/indexing.py

-                elif np.array(value).ndim == 2:
+                # hasattr first, to avoid coercing to ndarray without reason.
+                # But we may be relying on the ndarray coercion to check ndim.
+                # Why not just convert to an ndarray earlier on if needed?


can you add a TODO for any section that we should change later

jreback · 2019-01-04T18:40:25Z

pandas/core/internals/blocks.py

@@ -1447,8 +1447,18 @@ def quantile(self, qs, interpolation='linear', axis=0):
        -------
        Block
        """
-        values = self.get_values()
-        values, _ = self._try_coerce_args(values, values)
+        if self.is_datetimetz:


add a TODO here

pandas/core/series.py

jreback · 2019-01-04T18:41:51Z

pandas/tests/dtypes/test_missing.py

-    assert not array_equivalent(
-        DatetimeIndex([0, np.nan], tz='CET'), DatetimeIndex(
-            [0, np.nan], tz='US/Eastern'))
+    with catch_warnings():


what warning are you catching here?

array_equivalent calls __array__, so the new deprecation warning comes through.

We don't care about the warning here (the test doesn't care whether they're objects or datetimes), so we just ignore the warning.

I tightened up the filter.
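
A sketch of what a tightened filter could look like. The exact filter used in the test is not shown in this thread, so the `message` prefix below is an assumption based on the warning text in the PR description.

```python
import warnings

import numpy as np
import pandas as pd

idx = pd.DatetimeIndex(["2000-01-01", "2000-01-02"], tz="CET")

with warnings.catch_warnings():
    # Ignore only the tz-aware conversion FutureWarning; `message` is a regex
    # matched against the start of the warning text.
    warnings.filterwarnings(
        "ignore",
        message="Converting timezone-aware",
        category=FutureWarning,
    )
    arr = np.asarray(idx)

print(arr.shape)
```

Filtering on both category and message leaves any unrelated FutureWarning visible to the test run.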

TomAugspurger · 2019-01-04T20:26:46Z

All green. I think things are decent here.

I didn't add a TODO in #24596 (comment). I couldn't really make sense of what's going on there, or of when the best time to convert a list-like to an array-like would be.

jreback

going to merge, but would like to add a couple of TODOs where we may need follow-ups.

jreback · 2019-01-05T14:15:01Z

doc/source/whatsnew/v0.24.0.rst

+
+   np.asarray(ser, dtype='datetime64[ns]')
+
+*Future Behavior*


we usually call this current

jreback · 2019-01-05T14:15:46Z

pandas/core/dtypes/dtypes.py

@@ -420,6 +421,11 @@ def _hash_categories(categories, ordered=True):
                    # find a better solution
                    hashed = hash((tuple(categories), ordered))
                    return hashed
+
+            if is_datetime64tz_dtype(categories.dtype):


maybe add a TODO here, this is kind of special casing

jreback · 2019-01-05T14:51:16Z

thanks!

jbrockmendel · 2019-12-10T19:27:43Z

This didn't make it onto #6581, should we enforce it for 1.0?

TomAugspurger · 2019-12-30T13:50:17Z

I don't have a strong opinion.

TomAugspurger · 2019-12-30T20:16:47Z

@jbrockmendel I'll put up a PR enforcing this, just so we have #6581 cleared.

TomAugspurger added the Timeseries, Timezones (Timezone data dtype), and Deprecate (Functionality to remove in pandas) labels Jan 3, 2019

TomAugspurger added this to the 0.24.0 milestone Jan 3, 2019

TomAugspurger commented Jan 3, 2019

mroeschke reviewed Jan 3, 2019

pandas/core/groupby/groupby.py (outdated, resolved)

jreback requested changes Jan 3, 2019

pandas/core/indexes/datetimes.py (outdated, resolved)

fixup

ea44792

jorisvandenbossche reviewed Jan 3, 2019

doc/source/whatsnew/v0.24.0.rst (outdated, resolved)

pandas/core/internals/blocks.py (outdated, resolved)

pandas/core/series.py (outdated, resolved)

Merge remote-tracking branch 'upstream/master' into dt-array-5

3c1ffd0

TomAugspurger commented Jan 4, 2019

doc/source/whatsnew/v0.24.0.rst (resolved)

updates

66d1843

jreback reviewed Jan 4, 2019

doc/source/whatsnew/v0.24.0.rst (resolved)

jreback requested changes Jan 4, 2019

exclude datetimetz for bn

328338c

update parameter name and docstring

349f818

jreback reviewed Jan 4, 2019

jreback approved these changes Jan 4, 2019

jbrockmendel reviewed Jan 4, 2019

pandas/core/internals/blocks.py (resolved)

jreback requested changes Jan 4, 2019

updates

50f4fbd

jreback approved these changes Jan 5, 2019

jreback merged commit fe29123 into pandas-dev:master Jan 5, 2019

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

DEPR: __array__ for tz-aware Series/Index (pandas-dev#24596)

492e006

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

DEPR: __array__ for tz-aware Series/Index (pandas-dev#24596)

1a0b845

jsexauer mentioned this pull request Dec 10, 2019

DEPR: Clean up list of deprecations from prior versions #6581

Closed

1 task

TomAugspurger deleted the dt-array-5 branch December 30, 2019 13:49
