DEPR: Remove int64 uint64 float64 index part 1 #49560

topper-123 · 2022-11-06T22:32:20Z

This PR makes progress towards removing Int64Index, UInt64Index & Float64Index from the code base, see #42717.

In this PR I make instantiation of Index with numpy numeric dtypes return a NumericIndex with the given dtype rather than converting to the old 64 bit only numeric index types, making all numeric dtypes available for indexes.
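For illustration only (not taken from the PR), a minimal sketch of the behaviour described above. It assumes a pandas version in which this series of changes has landed (2.0 or later); on older versions the same calls are expected to upcast to a 64-bit index. The helper name `index_dtype_for` is hypothetical.

```python
import numpy as np
import pandas as pd


def index_dtype_for(np_dtype):
    """Return the dtype of an Index built from a small array of `np_dtype`."""
    values = np.array([1, 2, 3], dtype=np_dtype)
    return pd.Index(values).dtype


# Collect the resulting index dtype for several non-64-bit numpy dtypes.
# With the change described above these are expected to be preserved;
# before it, they were converted to int64/uint64/float64.
observed = {
    name: index_dtype_for(name)
    for name in ("int8", "int32", "uint16", "float32")
}
for name, dtype in observed.items():
    print(f"{name:>8} -> {dtype}")
```

The printed dtypes are deliberately not hard-coded here, since they depend on the installed pandas version.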

This is just the first part of the changes planned. In follow-up PRs I will:

Remove Int64Index, UInt64Index & Float64Index from the top namespace (done, DEPR: remove Int64Index, UInt64Index, Float64Index from public namespace #49670)
Remove Int64Index, UInt64Index & Float64Index from the tests (partially done in DEPR: remove Int64Index, UInt64Index, Float64Index from tests.indexes.numeric #49682 & DEPR: remove Int|Uint|Float64Index from conftest & _testing #49678, still some work needed)
Remove Int64Index, UInt64Index & Float64Index from the code base entirely (also internally)
Move the functionality of NumericIndex into the base Index and remove NumericIndex
Update docs to reflect the above changes

related PRs

Some of the problems I had in #49494 were caused by a few corner-case bugs. They've been pulled out into separate PRs (#49536 & #49540). The fixes are included in this PR for now, but will be removed after those separate PRs have been merged.

Notable changes from this PR:

As mentioned in #49494, one notable change from the PR is that because we now have all the numeric dtypes available for indexes, indexes created from DatetimeIndex attributes (day, year etc.) will now be int32, where previously they were forced to int64 (because that was the only integer index dtype available).

Using int32 is the more correct way, because the underlying DatetimeArray returns 32-bit arrays. An example of this:

>>> import pandas as pd
>>> 
>>> x = pd.date_range(start='1/1/2018', end='1/08/2018')
>>> x.array.day
array([1, 2, 3, 4, 5, 6, 7, 8], dtype=int32)  # 32bit, current and will not be changed in this series of PRs
>>> x.day
Int64Index([1, 2, 3, 4, 5, 6, 7, 8], dtype='int64')  # before this PR
NumericIndex([1, 2, 3, 4, 5, 6, 7, 8], dtype='int32')  # after this PR
Index([1, 2, 3, 4, 5, 6, 7, 8], dtype='int32')  # after follow-up PRs

The above will be included in the doc changes, when I do the docs

Progress towards #42717. See also #41272

jbrockmendel · 2022-11-09T23:57:09Z

Out of curiosity: the approach I would have taken would have been to start with changing Index.__new__ to return NumericIndex instead of subclasses. Would that be counter-productive?

topper-123 · 2022-11-10T08:54:40Z

> Out of curiosity: the approach I would have taken would have been to start with changing Index.__new__ to return NumericIndex instead of subclasses. Would that be counter-productive?

Yes, possibly. My main problem is that on my computer intp means int64, while on some machines intp means int32. Also there have been some smaller bugs that I've found working on this that slowed me down.

I'll try again tonight or tomorrow, and I will try the Index.__new__ approach if I'm still having problems. Thanks for the suggestion.
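As a rough illustration of the platform dependence mentioned above (not part of the PR), the width of numpy's indexing integer type can be checked directly. The comparison with the pointer width from the standard library is an assumption that holds on common platforms.

```python
import struct

import numpy as np

# np.intp is the integer type numpy uses for indexing; its width
# depends on the platform (commonly 64-bit, 32-bit on some machines).
intp_bits = np.dtype(np.intp).itemsize * 8

# Width of a C pointer for this interpreter, via the standard library.
pointer_bits = struct.calcsize("P") * 8

print(f"np.intp is {intp_bits}-bit, pointers are {pointer_bits}-bit")
```

Code that derives an index dtype from np.intp therefore produces int64 on one machine and int32 on another, which is why tests written against a fixed 64-bit expectation can pass locally and fail in CI.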

topper-123 · 2022-12-01T20:57:39Z

Alright, it's all green finally! (The failure in Python Dev / actions-311-dev (macOS-latest) (pull_request) says ERROR: Could not find a version that satisfies the requirement setuptools==59.2.0, so I assume it is unrelated to the PR.)

This is quite a big PR, with a lot of dtype changes stemming from the fact that in this PR Index is able to return indexes with dtypes int8/int16/int32/uint8/uint16/uint32/float16/float32. There are some follow-ups after this as explained in the OP, but those should be much easier than this PR, because all the tricky dtype stuff is handled in this PR.

topper-123 · 2023-01-11T17:40:33Z

Updated after #50195 was merged + rebased. The remaining failure looks unrelated.

jbrockmendel · 2023-01-12T01:29:26Z

pandas/core/indexes/base.py

-        if self._is_backward_compat_public_numeric_index:
-            # this block is needed so e.g. NumericIndex[int8].astype("int32") returns
-            # NumericIndex[int32] and not Int64Index with dtype int64.
+        if not self._is_backward_compat_public_numeric_index and not isinstance(


so this chunk of code is going to be removed in a follow-up right?

Yes, it will be removed in the follow-up #50052.

jbrockmendel · 2023-01-12T01:31:16Z

pandas/tests/arrays/sparse/test_accessor.py

@@ -212,7 +217,15 @@ def test_series_from_coo(self, dtype, dense_index):

        A = scipy.sparse.eye(3, format="coo", dtype=dtype)
        result = pd.Series.sparse.from_coo(A, dense_index=dense_index)
-        index = pd.MultiIndex.from_tuples([(0, 0), (1, 1), (2, 2)])
+
+        index_dtype = np.int64 if dense_index else np.int32


is it clear we want these to be different? if not, can you add a TODO comment to that effect

I agree, I think they should be the same, but we need to decide which dtype should be used.

We have for both 32-bit and 64-bit systems:

>>> import scipy.sparse
>>> A = scipy.sparse.eye(3, format="coo", dtype=dtype)
>>> A.row, A.col
(array([0, 1, 2], dtype=int32), array([0, 1, 2], dtype=int32))

I.e. scipy.sparse always uses int32 for its index arrays, even on 64-bit systems. The pandas way, on the other hand, is to default to 64-bit dtypes if an index dtype hasn't been specified, even on 32-bit systems.

So the question is whether we should use the pandas or the scipy convention for the index dtype when calling pd.Series.sparse.from_coo.

My intuition is to follow the scipy way in this instance (i.e. use 32-bit), as pd.Series.sparse.from_coo is about converting an existing scipy data structure, so we should follow that data structure's conventions where possible. What do you think?

i'd add a TODO comment saying roughly what you said here and then punt on it

jbrockmendel · 2023-01-12T01:32:23Z

pandas/tests/extension/decimal/test_decimal.py

@@ -445,7 +445,8 @@ def DecimalArray__my_sum(self):
    result = df.groupby("id")["decimals"].agg(lambda x: x.values.my_sum())
    tm.assert_series_equal(result, expected, check_names=False)
    s = pd.Series(DecimalArray(data))
-    result = s.groupby(np.array([0, 0, 0, 1, 1])).agg(lambda x: x.values.my_sum())
+    grouper = np.array([0, 0, 0, 1, 1], dtype=np.int64)


i think i asked elsewhere: specifying this is benign, just affects result.index.dtype or something like that?

Yes, this just affects result.index.dtype.
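A small sketch of the point made in this exchange, for illustration only: the dtype of the grouping array does not change the aggregated values, only (potentially) the dtype of the resulting index. The helper name `grouped_index_dtype` is hypothetical, and the int32 result depends on the pandas version, so it is printed rather than asserted.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])


def grouped_index_dtype(key_dtype):
    """Sum `s` by a fixed grouping array of `key_dtype`; return the result's index dtype."""
    grouper = np.array([0, 0, 0, 1, 1], dtype=key_dtype)
    return s.groupby(grouper).sum().index.dtype


# The aggregated values are the same for either key dtype.
totals = s.groupby(np.array([0, 0, 0, 1, 1], dtype=np.int64)).sum().tolist()
print(totals)

# Only the dtype of the result's index may differ between the two.
print(grouped_index_dtype(np.int64))
print(grouped_index_dtype(np.int32))
```

Pinning the grouper to np.int64 in a test makes the expected index dtype independent of the platform's default integer width.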

jbrockmendel · 2023-01-12T01:37:22Z

pandas/tests/frame/methods/test_to_csv.py

@@ -388,7 +388,7 @@ def test_to_csv_dup_cols(self, nrows):

    @pytest.mark.slow
    def test_to_csv_empty(self):
-        df = DataFrame(index=np.arange(10))
+        df = DataFrame(index=np.arange(10, dtype=np.int64))


this is bc when we do to_csv followed by read_csv it doesnt roundtrip int32?

Yes, read_csv can't know the dtypes of its read-in columns, so it gives dtype int64 to integer columns and indexes.
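The round-trip behaviour described here can be sketched as follows (an illustration, not code from the PR). The column name `a` and the sample values are made up for the example.

```python
import io

import numpy as np
import pandas as pd

# Index explicitly built with a 32-bit integer dtype.
df = pd.DataFrame({"a": [1.5, 2.5, 3.5]}, index=np.arange(3, dtype=np.int32))

# CSV is plain text, so the integer width is not recorded anywhere.
buffer = io.StringIO(df.to_csv())
roundtripped = pd.read_csv(buffer, index_col=0)

print(df.index.dtype)            # dtype of the original index
print(roundtripped.index.dtype)  # dtype inferred by read_csv
```

Because the index dtype is inferred again on read, a test that compares the original frame with the round-tripped one needs the original index to be int64 already.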

jbrockmendel · 2023-01-12T01:39:05Z

pandas/tests/groupby/aggregate/test_cython.py

    # add / sum
    result = df.groupby(pd.cut(df["a"], grps), observed=observed)._cython_agg_general(
        "sum", alt=None, numeric_only=True
    )
-    intervals = pd.interval_range(0, 20, freq=5)
+    intervals = pd.IntervalIndex.from_breaks(np.arange(0, 21, 5))


i thought IntervalArray converted 32bit ints to 64?

Yes, that is true. This changed line is from before IntervalIndex was ensured to be 64-bit, so this change is not needed anymore; the old and the new line are equivalent now. I'll revert this.
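The equivalence of the old and new test lines can be checked with a short sketch like the one below (illustrative only). It assumes a pandas version in which integer interval endpoints are stored as 64-bit, as discussed above.

```python
import numpy as np
import pandas as pd

# The two constructions from the diff above.
from_range = pd.interval_range(0, 20, freq=5)
from_breaks = pd.IntervalIndex.from_breaks(np.arange(0, 21, 5))

# Compare the resulting intervals and show both dtypes.
same_intervals = from_range.equals(from_breaks)
print(from_range.dtype, from_breaks.dtype, same_intervals)
```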

jbrockmendel · 2023-01-12T03:17:12Z

pandas/tests/window/test_groupby.py

        # https://github.com/twosigma/pandas/issues/53
+        dtype = np.dtype(any_int_numpy_dtype).type


can we call this something other than dtype, since it isnt one

Ok, I can call it typ.
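To illustrate the naming concern (this sketch is not from the PR): the `.type` attribute of a numpy dtype is the scalar class, not a dtype instance, so a name like `typ` describes it more accurately.

```python
import numpy as np

dt = np.dtype("int32")   # a dtype instance
typ = dt.type            # the corresponding scalar class, np.int32

print(isinstance(dt, np.dtype))   # True
print(isinstance(typ, np.dtype))  # False: typ is a class, not a dtype
print(typ is np.int32)            # True
print(repr(typ(7)))               # calling the class builds a numpy scalar
```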

jbrockmendel · 2023-01-12T03:20:25Z

pandas/tests/groupby/test_nunique.py

@@ -172,7 +172,7 @@ def test_nunique_preserves_column_level_names():
    # GH 23222
    test = DataFrame([1, 2, 2], columns=pd.Index(["A"], name="level_0"))
    result = test.groupby([0, 0, 0]).nunique()
-    expected = DataFrame([2], columns=test.columns)
+    expected = DataFrame([2], index=np.array([0], dtype=np.int_), columns=test.columns)


is the np.int_ here needed? ive been assuming that the index=np.array([0]) ive been seeing has been there to ensure the indexes get np.int_ dtype.

No, not needed anymore. This is from before #49815 was merged. I'll update it.

topper-123 · 2023-01-12T09:06:50Z

I'll wait for your response to the pd.Series.sparse.from_coo question before I push an update to the PR.

jbrockmendel · 2023-01-12T17:32:11Z

pandas/tests/reshape/test_cut.py

@@ -87,7 +87,9 @@ def test_bins_from_interval_index_doc_example():
    # Make sure we preserve the bins.
    ages = np.array([10, 15, 13, 12, 23, 25, 28, 59, 60])
    c = cut(ages, bins=[0, 18, 35, 70])
-    expected = IntervalIndex.from_tuples([(0, 18), (18, 35), (35, 70)])
+    expected = IntervalIndex.from_arrays(
+        left=np.array([0, 18, 35]), right=np.array([18, 35, 70])


not a big deal but why is this necessary?

Yeah, not needed after #50195. I'll revert.

jbrockmendel · 2023-01-12T17:33:45Z

pandas/tests/groupby/test_function.py

+    [
+        ([0, 1, 2, 3], [0, 0, 1, 1]),
+        ([0], [0]),
+        *[


i find this pattern difficult to read. is there a viable alternative?

maybe add a new param @pytest.mark.parametrize("dtype", [None] + tm.ALL_INT_NUMPY_DTYPES) and then

if dtype is not None:
    data = np.array(data, dtype=dtype)
    groups = np.array(groups, dtype=dtype)

Yes, I can do that.

jbrockmendel · 2023-01-12T17:37:19Z

pandas/tests/indexes/interval/test_constructors.py

+            (timedelta_range("1 day", periods=10), "<m8[ns]"),
+        ]
+    )
+    def breaks_and_expected_subtype(self, request):


is this the same as the one above?

This can be removed entirely now. I'll update.

jbrockmendel · 2023-01-12T17:37:44Z

pandas/tests/indexes/interval/test_interval_range.py

@@ -29,7 +29,8 @@ class TestIntervalRange:
    @pytest.mark.parametrize("freq, periods", [(1, 100), (2.5, 40), (5, 20), (25, 4)])
    def test_constructor_numeric(self, closed, name, freq, periods):
        start, end = 0, 100
-        breaks = np.arange(101, step=freq)
+        dtype = np.int64 if is_integer(freq) else np.float64


same as elsewhere, isnt this done automatically now?

Yes, after #50195... I'll revert.

jbrockmendel

Some questions, no blockers. LGTM cc @mroeschke

pandas/tests/apply/test_series_apply.py

topper-123 · 2023-01-13T10:34:47Z

Updated. All comments have been addressed AFAICT.

mroeschke · 2023-01-13T18:13:07Z

Awesome, thanks @topper-123

jbrockmendel · 2023-01-13T18:22:04Z

this is huge, thanks @topper-123

topper-123 changed the title ~~Remove int64 uint64 float64 index part 1~~ DEPR: Remove int64 uint64 float64 index part 1 Nov 6, 2022

topper-123 force-pushed the remove_int64_uint64_float64_index_part_1 branch from 844ab1b to 5ff5584 Compare November 6, 2022 23:42

topper-123 mentioned this pull request Nov 6, 2022

DEPR: move NumericIndexes into base Index, part 1 #49494

Closed

topper-123 force-pushed the remove_int64_uint64_float64_index_part_1 branch from 0dc2660 to b332214 Compare November 7, 2022 05:26

mroeschke added the Deprecate Functionality to remove in pandas label Nov 7, 2022

topper-123 mentioned this pull request Nov 8, 2022

BUG/API: Indexes on empty frames/series should be RangeIndex, are Index[object] #49572

Closed

3 tasks

MikaelUmaN mentioned this pull request Nov 8, 2022

BUG: pd.Index(dtype=np.int64) cannot be used in ops with pd.Index(dtype=Int64Dtype()) #49576

Closed

3 tasks

topper-123 force-pushed the remove_int64_uint64_float64_index_part_1 branch from 527be72 to df7afae Compare November 8, 2022 12:42

topper-123 force-pushed the remove_int64_uint64_float64_index_part_1 branch 7 times, most recently from 73bde44 to 99c82ae Compare November 16, 2022 21:53

This was referenced Nov 16, 2022

DEPR: remove Int64Index, UInt64Index, Float64Index from public namespace #49670

Merged

BUG: NumericIndex should not support float16 dtype #49536

Merged

API: read_stata with index_col=None should return RangeIndex #49745

Merged

topper-123 force-pushed the remove_int64_uint64_float64_index_part_1 branch from b421767 to 7e11517 Compare November 17, 2022 13:49

This was referenced Nov 19, 2022

TST: test UInt64Index in tests/indexes/interval/test_constructors.py #49785

Closed

TST: test UInt64Index in tests/indexes/interval/test_constructors.py #49786

Closed

TST: add tests/arrays/interval/test_constructors.py #49788

Closed

topper-123 force-pushed the remove_int64_uint64_float64_index_part_1 branch 2 times, most recently from f81f0df to 25c71c7 Compare November 20, 2022 18:55

topper-123 force-pushed the remove_int64_uint64_float64_index_part_1 branch 2 times, most recently from 9a64c7f to 8020af4 Compare December 1, 2022 03:27

jbrockmendel reviewed Dec 1, 2022

pandas/core/indexes/base.py (outdated, resolved)

fix groupby value_counts

4c337f6

topper-123 force-pushed the remove_int64_uint64_float64_index_part_1 branch from 6655dfe to 4c337f6 Compare January 11, 2023 15:01

jbrockmendel reviewed Jan 12, 2023

jbrockmendel approved these changes Jan 12, 2023

mroeschke reviewed Jan 12, 2023

pandas/tests/apply/test_series_apply.py (outdated, resolved)

mroeschke reviewed Jan 12, 2023

pandas/tests/apply/test_series_apply.py (outdated, resolved)

topper-123 added 4 commits January 12, 2023 18:39

fix comments

2608e9a

fix more comments

9d67e9a

fix stuff

65b1f61

fix

5c64113

mroeschke approved these changes Jan 13, 2023

mroeschke merged commit d010c4a into pandas-dev:main Jan 13, 2023

topper-123 deleted the remove_int64_uint64_float64_index_part_1 branch January 15, 2023 01:15

topper-123 mentioned this pull request Jan 21, 2023

API: Harmonize dtype for index levels for Series.sparse.from_coo #50926

Merged

MarcoGorelli mentioned this pull request Mar 27, 2023

BUG: can create IntervalDtype[float32] but not IntervalArray[float32] #45412

Open
