Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: can create IntervalDtype[float32] but not IntervalArray[float32] #45412

Open
jbrockmendel opened this issue Jan 17, 2022 · 16 comments
Open
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions Interval Interval data type Regression Functionality that used to work in a prior pandas version

Comments

@jbrockmendel
Copy link
Member

jbrockmendel commented Jan 17, 2022

dtype = pd.core.dtypes.common.pandas_dtype("interval[float32, right]")

ii = pd.interval_range(1, 10)

>>> ii.astype(dtype).dtype
interval[float64, right]

Best guess is it is because we are going through Index constructor.

@jbrockmendel jbrockmendel added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jan 17, 2022
simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this issue Jan 17, 2022
@simonjayhawkins
Copy link
Member

This is a regression from 1.2.5

first bad commit: [839c1bd] ENH: make "closed" part of IntervalDtype (#38394)

@simonjayhawkins simonjayhawkins added Dtype Conversions Unexpected or buggy dtype conversions Interval Interval data type Regression Functionality that used to work in a prior pandas version and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Jan 17, 2022
@simonjayhawkins
Copy link
Member

This previously raised TypeError: data type 'interval[float32, right]' not understood, but since closed was made part of the dtype the bisect may not be appropriate.

before that commit interval[float32] didn't raise and create a float32 dtype, so may need to go back further.

@jbrockmendel
Copy link
Member Author

I doubt this ever worked since it always went through pd.Index. Starting in 2.0 it should Just Work. We could get it working in 1.5 by dispatching to NumericIndex instead of pd.Index in relevant cases.

@jbrockmendel
Copy link
Member Author

@topper-123 did one of your recent IntervalArray PRs handle this?

@topper-123
Copy link
Contributor

I can take a look at it tonight.

@topper-123
Copy link
Contributor

topper-123 commented Jan 18, 2023

I did not change IntervalDtype in my PR, but clearly this is slipping through, even though it shouldn't.

The question is if InterValDtype(np.float32) should be allowed? I see:

>>> iv_arr = pd.arrays.IntervalArray([pd.Interval(0, 1), pd.Interval(1, 5)])
>>> dtype = pd.IntervalDtype(np.float32)
>>> iv_arr.astype(dtype).dtype
interval[float64, right]  # 1.5.2
interval[float32, right]  # main

so something that shouldn't change, has changed. I think the IntervalIndex/IntervalArray constructor tests are insufficient and this should have been caught in the tests. So IMO, we should get those constructor tests up and/or improved.

Also, does IntervalDtype(np.float32) make sense? Maybe doing pd.IntervalDtype(np.float32) should emit pd.IntervalDtype(np.float64), i.e. do an automatic up-conversion to 64-bit?

@jbrockmendel
Copy link
Member Author

Maybe doing pd.IntervalDtype(np.float32) should emit pd.IntervalDtype(np.float64), i.e. do an automatic up-conversion to 64-bit?

We've gotten rid of most of the places that do this. I'd like to get rid of them all.

If an Interval[foo32] array cant be created, the dtype should raise too.

@topper-123
Copy link
Contributor

That would be backwards incompatible (not against it and it's probably ok given taht this is a 2.0 release).

@jbrockmendel
Copy link
Member Author

yah might as well go for it in 2.0

@jbrockmendel jbrockmendel added this to the 2.0 milestone Jan 23, 2023
@mroeschke
Copy link
Member

@topper-123 IIRC you fixed this already?

In [3]: >>> iv_arr = pd.arrays.IntervalArray([pd.Interval(0, 1), pd.Interval(1, 5)])
   ...: >>> dtype = pd.IntervalDtype(np.float32)
   ...: >>> iv_arr.astype(dtype).dtype
Out[3]: interval[float32, right]

@topper-123
Copy link
Contributor

I’ll look into it tonight.

@MarcoGorelli
Copy link
Member

MarcoGorelli commented Mar 27, 2023

this was fixed by #49560

https://www.kaggle.com/code/marcogorelli/pandas-regression-example?scriptVersionId=123565724

OK to remove the 2.0 milestone?

removing it for now, don't think it's a blocker, but please do correct me if I'm wrong

@MarcoGorelli MarcoGorelli removed this from the 2.0 milestone Mar 27, 2023
@topper-123
Copy link
Contributor

topper-123 commented Mar 27, 2023

This isn't fixed yet (the 32bit dtype can still be created), but +1 to moving this from 2.0. This isn't urgent in my view.

@jbrockmendel
Copy link
Member Author

It now is possible to create a Interval[float32] array, so this may be OK?

@MarcoGorelli
Copy link
Member

I may be misunderstanding, but this looks like fine now? @topper-123 could you please clarify what's outstanding?

@jbrockmendel
Copy link
Member Author

I checked and we don't have any tests that instantiate a 32bit dtype. (only relevant if we decide to keep allowing it)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions Interval Interval data type Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants