DEPR: move NumericIndexes into base Index, part 1 #49494

Closed


topper-123
Contributor

Part of the changes in pandas 2.0 is that the current numeric indexes (Int64Index, UInt64Index & Float64Index) will be removed and their functionality added to the base Index. In addition, the base Index should be able to handle all NumPy numeric dtypes (int8, int16, etc.), not just the current int64, uint64 & float64.

This PR makes progress towards that. In this PR I:

  • Remove Int64Index, UInt64Index & Float64Index from the top-level namespace
  • Make instantiation of Index with NumPy numeric dtypes return a NumericIndex with the given dtype, rather than converting to the old 64-bit-only numeric index types, making all numeric dtypes available for indexes.
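The second bullet can be illustrated with a minimal sketch. It assumes pandas 2.0 or later, where, after the planned follow-ups, the dtype-preserving behavior lives in the base `Index`, so the class returned is `Index` rather than `NumericIndex`.

```python
import numpy as np
import pandas as pd

# Assumes pandas >= 2.0, where the base Index keeps any NumPy numeric dtype
# (this PR series first routed such data through NumericIndex instead).
dtypes = [np.int8, np.int16, np.int32, np.uint16, np.uint32, np.float32]

indexes = {}
for dt in dtypes:
    # Before these changes, the data was coerced to one of the 64-bit types
    # (int64, uint64 or float64), yielding Int64Index/UInt64Index/Float64Index.
    indexes[np.dtype(dt)] = pd.Index(np.array([1, 2, 3], dtype=dt))

for dtype, idx in indexes.items():
    print(type(idx).__name__, idx.dtype)
```

Keeping the narrow dtype avoids a silent upcast (and the extra memory it costs) whenever an index is built from data that is already, say, int8 or float32.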

This is just the first part of the complete changes planned. In follow-up PRs I will:

  • Remove Int64Index, UInt64Index & Float64Index from the code base entirely (also internally)
  • Move the functionality of NumericIndex into the base Index and remove NumericIndex
  • Update docs to reflect the above changes
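Once those follow-ups land, code that instantiated the removed classes directly can pass an explicit `dtype` to `Index` instead. A sketch, assuming pandas 2.0 or later; the helper `make_int64_index` is hypothetical and only for illustration.

```python
import pandas as pd

# Old (pandas < 2.0):  pd.Int64Index([1, 2, 3])
# Old (pandas < 2.0):  pd.Float64Index([1.0, 2.0])
# Assumes pandas >= 2.0, where those classes no longer exist in the namespace.

def make_int64_index(values):
    """Hypothetical helper: build what Int64Index(values) used to return."""
    return pd.Index(values, dtype="int64")

ints = make_int64_index([1, 2, 3])
floats = pd.Index([1.0, 2.0], dtype="float64")  # replaces Float64Index
uints = pd.Index([1, 2, 3], dtype="uint64")     # replaces UInt64Index

print(ints)
print(floats)
print(uints)
```

Passing `dtype` explicitly keeps the old 64-bit behavior where it is actually wanted, instead of relying on a dedicated subclass.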

Notable changes from this PR:

One notable change in this PR: because all numeric dtypes are now available for indexes, the indexes returned by DatetimeIndex field accessors (day, year, etc.) will now have dtype int32, where previously they were forced to int64 (the only signed integer dtype an index could hold).

Using int32 is more correct, because the underlying DatetimeArray already returns 32-bit arrays for these fields. For example:

```python
>>> import pandas as pd
>>>
>>> x = pd.date_range(start='1/1/2018', end='1/08/2018')
>>> x.array.day
array([1, 2, 3, 4, 5, 6, 7, 8], dtype=int32)  # 32-bit; current behavior, not changed in this series of PRs
>>> x.day
Int64Index([1, 2, 3, 4, 5, 6, 7, 8], dtype='int64')  # before this PR
NumericIndex([1, 2, 3, 4, 5, 6, 7, 8], dtype='int32')  # after this PR
Index([1, 2, 3, 4, 5, 6, 7, 8], dtype='int32')  # after follow-up PRs
```
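Downstream code that relied on the old int64 result can cast explicitly. A minimal sketch, assuming pandas 2.0 or later, where the field accessors on a DatetimeIndex return int32 as described above:

```python
import numpy as np
import pandas as pd

x = pd.date_range(start="1/1/2018", end="1/08/2018")

days = x.day                     # int32, now matching x.array.day
days64 = days.astype(np.int64)   # explicit cast if downstream code expects int64

print(days.dtype, days64.dtype)
```

Casting at the point of use keeps the index dtype consistent with the underlying array while still letting older code opt back into int64.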

xref #42717, #41272

@jbrockmendel
Member

looks like a couple of sparse tests are failing

@topper-123
Contributor Author

topper-123 commented Nov 3, 2022

Yeah, the test suite passed locally; I'm trying to figure out what the difference is. Probably I need to install some package locally, since those tests are being skipped locally.

UPDATE: everything still passes locally, so I'm a bit stumped why this fails on GitHub. I'll look again tomorrow.

mroeschke added the Refactor (Internal refactoring of code) and Deprecate (Functionality to remove in pandas) labels on Nov 3, 2022

```diff
@@ -1582,7 +1582,7 @@ def _equal_values(self: BlockManager, other: BlockManager) -> bool:
     def quantile(
         self: T,
         *,
-        qs: Float64Index,
+        qs: NumericIndex,
```
Member


comment about dtype?
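The question is presumably about which dtype `qs` carries now that the annotation no longer says `Float64Index`. Quantile levels are fractions, so float64 is the natural expectation. The sketch below, assuming pandas 2.0 or later, checks this only through the public `DataFrame.quantile` API, not through the internal `BlockManager.quantile` shown in the diff.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10.0, 20.0, 30.0, 40.0]})

# Passing a list of quantile levels returns a frame indexed by those levels.
result = df.quantile([0.25, 0.5, 0.75])

# The levels form a float64 index, the dtype Float64Index used to guarantee.
print(result.index.dtype)
print(result)
```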

@topper-123
Contributor Author

Closing in favor of #49560.

topper-123 closed this on Nov 6, 2022