Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: __array__ for tz-aware Series/Index #24596

Merged
merged 7 commits into from
Jan 5, 2019

Conversation

TomAugspurger
Copy link
Contributor

@TomAugspurger TomAugspurger commented Jan 3, 2019

This deprecates the current behvior when converting tz-aware Series
or Index to an ndarray. Previously, we converted to M8[ns], throwing
away the timezone information. In the future, we will return an
object-dtype array filled with Timestamps, each of which has the correct
tz.

In [1]: import pandas as pd; import numpy as np

In [2]: ser = pd.Series(pd.date_range('2000', periods=2, tz="CET"))

In [3]: np.asarray(ser)
/bin/ipython:1: FutureWarning: Converting timezone-aware DatetimeArray to timezone-naive ndarray with 'datetime64[ns]' dtype. In the future, this will return an ndarray with 'object' dtype where each element is a 'pandas.Timestamp' with the correct 'tz'.
        To accept the future behavior, pass 'dtype=object'.
        To keep the old behavior, pass 'dtype="datetime64[ns]"'.
  #!/Users/taugspurger/Envs/pandas-dev/bin/python3
Out[3]:
array(['1999-12-31T23:00:00.000000000', '2000-01-01T23:00:00.000000000'],
      dtype='datetime64[ns]')

xref #23569

closes #15750

This deprecates the current behvior when converting tz-aware Series
or Index to an ndarray. Previously, we converted to M8[ns], throwing
away the timezone information. In the future, we will return an
object-dtype array filled with Timestamps, each of which has the correct
tz.

```python
In [1]: import pandas as pd; import numpy as np

In [2]: ser = pd.Series(pd.date_range('2000', periods=2, tz="CET"))

In [3]: np.asarray(ser)
/bin/ipython:1: FutureWarning: Converting timezone-aware DatetimeArray to timezone-naive ndarray with 'datetime64[ns]' dtype. In the future, this will return an ndarray with 'object' dtype where each element is a 'pandas.Timestamp' with the correct 'tz'.
        To accept the future behavior, pass 'dtype=object'.
        To keep the old behavior, pass 'dtype="datetime64[ns]"'.
  #!/Users/taugspurger/Envs/pandas-dev/bin/python3
Out[3]:
array(['1999-12-31T23:00:00.000000000', '2000-01-01T23:00:00.000000000'],
      dtype='datetime64[ns]')
```

xref pandas-dev#23569
@TomAugspurger TomAugspurger added Timeseries Timezones Timezone data dtype Deprecate Functionality to remove in pandas labels Jan 3, 2019
@TomAugspurger TomAugspurger added this to the 0.24.0 milestone Jan 3, 2019
doc/source/whatsnew/v0.24.0.rst Show resolved Hide resolved
pandas/core/arrays/datetimes.py Show resolved Hide resolved
@@ -1020,7 +1020,7 @@ def maybe_cast_to_datetime(value, dtype, errors='raise'):
# datetime64tz is assumed to be naive which should
# be localized to the timezone.
is_dt_string = is_string_dtype(value)
value = to_datetime(value, errors=errors)
value = to_datetime(value, errors=errors).array
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to look at this closer. maybe_cast_to_datetime seems in need of an overhaul (along with all of sanitize_array) but this at least avoids the warning.

pandas/core/groupby/groupby.py Outdated Show resolved Hide resolved
elif np.array(value).ndim == 2:
# hasattr first, to avoid coercing to ndarray without reason.
# But we may be relying on the ndarray coercion to check ndim.
# Why not just convert to an ndarray earlier on if needed?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hoping to clean up the type on value a bit to avoid this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a TODO for any section that we should change later

pandas/core/internals/blocks.py Outdated Show resolved Hide resolved
pandas/core/series.py Outdated Show resolved Hide resolved
@mroeschke
Copy link
Member

xrefing #15750 as I think it's related to the eventual end goal.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haven't looked in detail

pandas/core/indexes/datetimes.py Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Jan 3, 2019

Codecov Report

Merging #24596 into master will decrease coverage by 49.33%.
The diff coverage is 28.88%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #24596       +/-   ##
===========================================
- Coverage   92.38%   43.05%   -49.34%     
===========================================
  Files         166      166               
  Lines       52478    52514       +36     
===========================================
- Hits        48483    22609    -25874     
- Misses       3995    29905    +25910
Flag Coverage Δ
#multiple ?
#single 43.05% <28.88%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/indexing.py 51.83% <0%> (-42.04%) ⬇️
pandas/core/reshape/tile.py 11.36% <0%> (-83.47%) ⬇️
pandas/core/groupby/groupby.py 24.28% <0%> (-72.53%) ⬇️
pandas/core/dtypes/cast.py 48.59% <0%> (-40.14%) ⬇️
pandas/core/arrays/datetimes.py 65.78% <100%> (-32.24%) ⬇️
pandas/core/internals/construction.py 62.5% <100%> (-34.17%) ⬇️
pandas/core/internals/blocks.py 51.58% <14.28%> (-42.91%) ⬇️
pandas/core/series.py 49.33% <40%> (-44.21%) ⬇️
pandas/core/indexes/datetimes.py 48.45% <50%> (-47.78%) ⬇️
pandas/core/dtypes/dtypes.py 73.52% <66.66%> (-21.82%) ⬇️
... and 133 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 62506ca...ea44792. Read the comment docs.

@codecov
Copy link

codecov bot commented Jan 3, 2019

Codecov Report

Merging #24596 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #24596      +/-   ##
==========================================
- Coverage   92.37%   92.37%   -0.01%     
==========================================
  Files         166      166              
  Lines       52396    52415      +19     
==========================================
+ Hits        48403    48420      +17     
- Misses       3993     3995       +2
Flag Coverage Δ
#multiple 90.8% <100%> (-0.01%) ⬇️
#single 43.01% <37.14%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/dtypes/dtypes.py 95.58% <100%> (+0.03%) ⬆️
pandas/core/indexing.py 93.87% <100%> (ø) ⬆️
pandas/core/reshape/tile.py 94.88% <100%> (+0.05%) ⬆️
pandas/core/arrays/datetimes.py 98.01% <100%> (ø) ⬆️
pandas/core/internals/blocks.py 94.16% <100%> (-0.06%) ⬇️
pandas/core/nanops.py 94.36% <100%> (ø) ⬆️
pandas/core/groupby/groupby.py 96.8% <100%> (ø) ⬆️
pandas/core/dtypes/cast.py 88.72% <100%> (ø) ⬆️
pandas/core/indexes/datetimes.py 96.26% <100%> (+0.03%) ⬆️
pandas/core/internals/construction.py 95.93% <100%> (-0.75%) ⬇️
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 19f715c...50f4fbd. Read the comment docs.

doc/source/whatsnew/v0.24.0.rst Outdated Show resolved Hide resolved
pandas/core/internals/blocks.py Outdated Show resolved Hide resolved
pandas/core/series.py Outdated Show resolved Hide resolved
pandas/core/dtypes/dtypes.py Show resolved Hide resolved
@@ -420,6 +421,11 @@ def _hash_categories(categories, ordered=True):
# find a better solution
hashed = hash((tuple(categories), ordered))
return hashed

if is_datetime64tz_dtype(categories.dtype):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you categories.to_numpy() always?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, possibly. We'll still need the special case for datetime64tz_dtype to pass dtype=_NS_DTYPE, since Index[datetime64[ns, tz]].to_numpy() returns an ndarray of Timestamp objects.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add a TODO here, this is kind of special casing

pandas/core/reshape/tile.py Show resolved Hide resolved
@TomAugspurger
Copy link
Contributor Author

I can reproduce the failures with older numpys locally. Debugging that now.

@TomAugspurger
Copy link
Contributor Author

Turns out it was bottleneck

@jreback right now bn_ok_dtype excludes datetime & timedelta, but not datetimetz in

if (not is_object_dtype(dt) and not is_datetime_or_timedelta_dtype(dt)):

I assume we don't want to pass datetimetz to bottleneck, since the actual operation should be done on the same values (i8 or M8[ns]).

@jreback
Copy link
Contributor

jreback commented Jan 4, 2019

right should exclude anything that matches needs_i8_conversion

@TomAugspurger
Copy link
Contributor Author

349f818 updates with

  • changes the parameter name from result to dtype to match the array interface
  • Expands the docstring
  • Adds to the API docs (along with a few others that were missing)

@@ -1447,8 +1447,18 @@ def quantile(self, qs, interpolation='linear', axis=0):
-------
Block
"""
values = self.get_values()
values, _ = self._try_coerce_args(values, values)
if self.is_datetimetz:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is getting super messy

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. A proper fix is updating _try_coerce_args / get_values, which I think @jbrockmendel is working on. But this is necessary now to avoid the warning / conversion to object dtype.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of the branches I have in progress would help here.

Allowing for DatetimeArray to be reshaped to (1, nrows) would.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a TODO here

doc/source/whatsnew/v0.24.0.rst Show resolved Hide resolved
elif np.array(value).ndim == 2:
# hasattr first, to avoid coercing to ndarray without reason.
# But we may be relying on the ndarray coercion to check ndim.
# Why not just convert to an ndarray earlier on if needed?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a TODO for any section that we should change later

@@ -1447,8 +1447,18 @@ def quantile(self, qs, interpolation='linear', axis=0):
-------
Block
"""
values = self.get_values()
values, _ = self._try_coerce_args(values, values)
if self.is_datetimetz:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a TODO here

pandas/core/series.py Show resolved Hide resolved
assert not array_equivalent(
DatetimeIndex([0, np.nan], tz='CET'), DatetimeIndex(
[0, np.nan], tz='US/Eastern'))
with catch_warnings():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what warning are you catching here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

array_equivalent calls __array__, so the new deprecation warning comes through.

We don't care about the warning here (the test doesn't care whether they're objects or datetimes), so we just ignore the warning.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tightened up the filter.

@TomAugspurger
Copy link
Contributor Author

All green. I think things are decent here.

I didn't add a TODO in
#24596 (comment). I could really make sense of what's going on there / when the best time to convert a list-like to an array-like would be.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

going to merge, but would like to add a couple of TODOs where may need followups.


np.asarray(ser, dtype='datetime64[ns]')

*Future Behavior*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we usually call this current

@@ -420,6 +421,11 @@ def _hash_categories(categories, ordered=True):
# find a better solution
hashed = hash((tuple(categories), ordered))
return hashed

if is_datetime64tz_dtype(categories.dtype):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add a TODO here, this is kind of special casing

@jreback jreback merged commit fe29123 into pandas-dev:master Jan 5, 2019
@jreback
Copy link
Contributor

jreback commented Jan 5, 2019

thanks!

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
@jbrockmendel
Copy link
Member

This didn't make it onto #6581, should we enforce it for 1.0?

@TomAugspurger
Copy link
Contributor Author

I don't have a strong opinion.

@TomAugspurger
Copy link
Contributor Author

@jbrockmendel I'll put up a PR enforcing this, just so we have #6581 cleared.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Deprecate Functionality to remove in pandas Timeseries Timezones Timezone data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

API: Series.values with tz-aware should return object array of Timestamps
5 participants