[ENH] Partially replace `pd.Int64Index` with `pd.Index` #2339

khrapovs · 2022-03-29T10:13:37Z

Reference Issues/PRs

Partially addresses #2332.

What does this implement/fix? Explain your changes.

I have replaced pd.Int64Index with pd.Index only in obvious and harmless places. More complicated like VALID_INDEX_TYPES in sktime.utils.validation.series are left for another ambitious PR.

Does your contribution introduce a new dependency? If yes, which one?

No

PR checklist

For all contributions

I've added myself to the list of contributors.
Optionally, I've updated sktime's CODEOWNERS to receive notifications about future changes to these files.
I've added unit tests and made sure they pass locally.
The PR title starts with either [ENH], [DOC] or [BUG] indicating wether the PR topic is related to enhancement, documentation or bug

fkiraly · 2022-03-29T11:30:33Z

not entirely harmless, is it? 😄

…_INDEX_TYPES

khrapovs · 2022-03-29T11:36:52Z

not entirely harmless, is it? smile

OMG, what did I got myself into! now I see why others abandoned it :D

khrapovs · 2022-03-29T12:41:32Z

ha! how about that, @fkiraly?

# Conflicts: # sktime/utils/validation/forecasting.py

fkiraly

Take that, Int64Index!

In-principle, this look ok, but - we need to deprecate!

Why: suppose some poor user's highly valuable deployed pipelines are relying entirely on Int64Index for some reason. This PR would completely break it, wouldn't it?

Breakage occurs wherever we do input checks and the index is no longer allowed.
In those cases, we need to raise a warning (deprecation since 0.12, removal in 0.13), while being defensive in the sense of making sktime robust against the pandas deprecation...

In the many cases where we create an index, there is no deprecation necessary.

khrapovs · 2022-03-30T06:48:56Z

@fkiraly Yes, you are right. But, right now there is no way to say whether a user created the index using pd.Index or pd.Int64Index:

type(pd.Index([1, -1]))
<class 'pandas.core.indexes.numeric.Int64Index'>
type(pd.Int64Index([1, -1]))
<input>:1: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
<class 'pandas.core.indexes.numeric.Int64Index'>

Additionally, from https://pandas.pydata.org/docs/reference/api/pandas.Int64Index.html:

Deprecated since version 1.4.0: In pandas v2.0 Int64Index will be removed and NumericIndex used instead. Int64Index will remain fully functional for the duration of pandas 1.x.

But NumericIndex does not even exist yet!

fkiraly · 2022-03-30T10:41:42Z

@khrapovs, I see. That makes things difficult.
Is there any way to not make us break our interface contracts?
How about replacing checks for index types by uses of the pandas method Index.is_integer?

khrapovs · 2022-03-30T11:35:40Z

very good suggestion! I completely overlooked that such a method exists. replaced all usages of is_numeric with is_integer. thanks!

fkiraly

Nice!
I think this could almost go in, but I'm concerned about the maintainability of the type check, where we now repeatedly do sth like isinstance(obj, VALID_TYPES) or (isinstance(obj, Index) and obj.is_integer()).

"Why" is hard to understand to someone who hasn't followed our discussion, and it's also repeated in multiple places across the codebase now (which you probably found via "breakage", but will be difficult to collect again for any future maintainer).

I would hence recommend to bundle this in one function, perhaps
is_valid_fh_index(obj, type="any"), where type can be "relative" or "absolute" - what do you think?

khrapovs · 2022-04-01T07:43:46Z

@fkiraly

Very good! Added two new functions:

sktime.utils.validation.series.is_in_valid_index_types
sktime.utils.validation.series.is_integer_index

The check for RELATIVE_TYPES or ABSOLUTE_TYPES happens only a couple of times in the single file _fh.py. A combined check for either valid index or integer index only happens just two times. I do not think it makes sense to over-engineer for such rare use cases. But if you insist of a function with type argument, I would be happy to write a couple more lines of code.

# Conflicts: # sktime/forecasting/base/_fh.py

fkiraly · 2022-04-01T16:00:32Z

😱

what is this now - seems unrelated?

Maybe let's record this as a sporadic failure?

khrapovs · 2022-04-02T08:14:29Z

scream what is this now - seems unrelated?

Maybe let's record this as a sporadic failure?

Reported here: #2368

sktime/forecasting/base/_fh.py

sktime/forecasting/base/tests/test_fh.py

sktime/utils/validation/series.py

fkiraly

Ok, one last change request:

change is_in_valid_index_types as suggested to include the is_integer_index check
move it to _fh.py, next to the two other functions
then use it in the two other places where you check that manually

fkiraly · 2022-04-02T10:55:00Z

(I'm not insistent about my idea is_valid_fh_index(obj, type="any"), where type can be "relative" or "absolute" - also happy with three loose functions - but just wanted to bring it up again in case it seems more natural now)

khrapovs · 2022-04-03T07:33:42Z

change is_in_valid_index_types as suggested to include the is_integer_index check

done

move it to _fh.py, next to the two other functions

is_in_valid_index_types is very generic and does not have much connection to ForecastingHorizon other than it is used once to check fh values. I would prefer to keep it next to where VALID_INDEX_TYPES are defined. This function is used much more frequently by other modules.

then use it in the two other places where you check that manually

done

(I'm not insistent about my idea is_valid_fh_index(obj, type="any"), where type can be "relative" or "absolute" - also happy with three loose functions - but just wanted to bring it up again in case it seems more natural now)

Yes, I remember about this proposal and even now would prefer to keep the functions as they are now.

fkiraly

I'd be happy to approve now, although I leave the merge decision to you after considering my non-blocking comment below.

is_in_valid_index_types is very generic and does not have much connection to ForecastingHorizon other than it is used once to check fh values. I would prefer to keep it next to where VALID_INDEX_TYPES are defined. This function is used much more frequently by other modules.

Ih that case, I would move the other way round, i.e., rename is_relative_fh_type to is_in_valid_relative_index_types or is_in_valid_index_types_relative, and put the three functions together into the validation module because they are so very similar.

…ute_index_types to utils/validation/series.py

khrapovs · 2022-04-03T12:24:29Z

Ih that case, I would move the other way round, i.e., rename is_relative_fh_type to is_in_valid_relative_index_types or is_in_valid_index_types_relative, and put the three functions together into the validation module because they are so very similar.

Very good. I moved not only is_in_valid_relative_index_types and is_in_valid_absolute_index_types to utils/validation/series.py, but also RELATIVE_INDEX_TYPES and ABSOLUTE_INDEX_TYPES next to VALID_INDEX_TYPES in the same module. I like that!

fkiraly

Great!

I see that, in this newest version, you changed INDEX_TYPE_LOOKUP too, but did not address the consequences of that change.

I think we should address those ramifications, before we merge.

There are a number of places in which a check in INDEX_TYPE_LOOKUP.get(index_type) occurs, this should be using is_in_valid_index_types, no?

khrapovs · 2022-04-03T16:06:19Z

I see that, in this newest version, you changed INDEX_TYPE_LOOKUP too, but did not address the consequences of that change.

I think we should address those ramifications, before we merge.

There are a number of places in which a check in INDEX_TYPE_LOOKUP.get(index_type) occurs, this should be using is_in_valid_index_types, no?

Correct. Thanks for noticing. I have used is_integer_index in two places of test_fh where assert isinstance was testing index type. I have double checked other usages of INDEX_TYPE_LOOKUP and do not see anything I should change there.

sktime/forecasting/base/tests/test_fh.py

fkiraly

See above. Should it not be much shorter than the if/else?
Happy to approve if there is a good reason not to.

fkiraly

Discussion revealed a good reason to leave as is, which resolves the last change request.

* upstream/main: [DOC] Added docstring examples to load data functions (sktime#2393) [ENH] Capability inference for transformer and classifier pipelines (sktime#2367) [ENH] Proba metric grid search integration (sktime#2234) [ENH] Faster classifier example parameters (sktime#2378) [ENH] Get rid of `pd.Int64Index` (sktime#2390) [ENH] Allow `pd.Timedelta` values in `ForecastingHorizon` (sktime#2333) [ENH] Partially replace `pd.Int64Index` with `pd.Index` (sktime#2339) relax name rules for multiindex (sktime#2384)

Stanislav Khrapov added 5 commits March 29, 2022 11:48

replace Int64Index in sktime/forecasting/base/_fh.py

38323bd

replace Int64Index in sktime/utils/datetime.py

3fba403

replace Int64Index in sktime/forecasting/tests/_config.py

36e31f2

replace Int64Index in sktime/forecasting/base/adapters/_statsmodels.py

909781e

replace Int64Index in sktime/utils/_testing/series.py

480b2de

khrapovs requested review from fkiraly, mloning, aiwalter and TonyBagnall as code owners March 29, 2022 10:13

khrapovs mentioned this pull request Mar 29, 2022

[ENH] Address deprecated references in datatypes module #2082

Closed

2 tasks

remove pd.Int64Index from RELATIVE_TYPES and ABSOLUTE_TYPES and VALID…

e8d0cd0

…_INDEX_TYPES

Stanislav Khrapov added 2 commits March 29, 2022 13:47

clean up

cf3c3b0

clean up

cd6a63c

Stanislav Khrapov added 2 commits March 29, 2022 18:08

Merge branch 'upstream-main' into 2338-khrapovs-replace-int64index

1f47de6

# Conflicts: # sktime/utils/validation/forecasting.py

Merge branch 'main' into 2338-khrapovs-replace-int64index

b08bc5b

fkiraly requested changes Mar 29, 2022

View reviewed changes

Merge branch 'main' into 2338-khrapovs-replace-int64index

61681b8

use is_integer instead of is_numeric

385d5a0

Merge branch 'main' into 2338-khrapovs-replace-int64index

dcd6c1c

fkiraly requested changes Apr 1, 2022

View reviewed changes

Stanislav Khrapov added 3 commits April 1, 2022 09:04

Merge branch 'main' into 2338-khrapovs-replace-int64index

7d95e1e

new function is_integer_index

041e7b1

new function is_in_valid_index_types

c4ce5c2

Merge branch 'upstream-main' into 2338-khrapovs-replace-int64index

c4344fc

# Conflicts: # sktime/forecasting/base/_fh.py

Merge branch 'main' into 2338-khrapovs-replace-int64index

2e8b30f

fkiraly reviewed Apr 2, 2022

View reviewed changes

sktime/forecasting/base/_fh.py Outdated Show resolved Hide resolved

fkiraly reviewed Apr 2, 2022

View reviewed changes

sktime/forecasting/base/tests/test_fh.py Outdated Show resolved Hide resolved

fkiraly reviewed Apr 2, 2022

View reviewed changes

sktime/utils/validation/series.py Outdated Show resolved Hide resolved

fkiraly requested changes Apr 2, 2022

View reviewed changes

Stanislav Khrapov added 3 commits April 3, 2022 09:15

Merge branch 'main' into 2338-khrapovs-replace-int64index

caf9bc2

isinstance(x, VALID_INDEX_TYPES) or is_integer_index(x)

2d4836c

use is_in_valid_index_types everywhere

193d4de

fkiraly previously approved these changes Apr 3, 2022

View reviewed changes

Stanislav Khrapov added 2 commits April 3, 2022 14:09

Merge branch 'main' into 2338-khrapovs-replace-int64index

5780916

move functions is_in_valid_relative_index_types and is_in_valid_absol…

e853c22

…ute_index_types to utils/validation/series.py

khrapovs dismissed fkiraly’s stale review via e853c22 April 3, 2022 12:22

Merge branch 'main' into 2338-khrapovs-replace-int64index

d6e631e

fkiraly requested changes Apr 3, 2022

View reviewed changes

use is_integer_index in test_fh

6f285bc

fkiraly reviewed Apr 3, 2022

View reviewed changes

sktime/forecasting/base/tests/test_fh.py Show resolved Hide resolved

fkiraly requested changes Apr 3, 2022

View reviewed changes

Merge branch 'main' into 2338-khrapovs-replace-int64index

ad24fcb

fkiraly approved these changes Apr 4, 2022

View reviewed changes

fkiraly merged commit 4b9003c into sktime:main Apr 4, 2022

khrapovs deleted the 2338-khrapovs-replace-int64index branch April 4, 2022 14:09

lmmentel added the maintenance Continuous integration, unit testing & package distribution label Apr 9, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ENH] Partially replace `pd.Int64Index` with `pd.Index` #2339

[ENH] Partially replace `pd.Int64Index` with `pd.Index` #2339

khrapovs commented Mar 29, 2022 •

edited

fkiraly commented Mar 29, 2022

khrapovs commented Mar 29, 2022

khrapovs commented Mar 29, 2022

fkiraly left a comment

khrapovs commented Mar 30, 2022

fkiraly commented Mar 30, 2022 •

edited

khrapovs commented Mar 30, 2022

fkiraly left a comment •

edited

khrapovs commented Apr 1, 2022

fkiraly commented Apr 1, 2022

khrapovs commented Apr 2, 2022

fkiraly left a comment

fkiraly commented Apr 2, 2022

khrapovs commented Apr 3, 2022

fkiraly left a comment

khrapovs commented Apr 3, 2022

fkiraly left a comment

khrapovs commented Apr 3, 2022

fkiraly left a comment

fkiraly left a comment

[ENH] Partially replace pd.Int64Index with pd.Index #2339

[ENH] Partially replace pd.Int64Index with pd.Index #2339

Conversation

khrapovs commented Mar 29, 2022 • edited

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Does your contribution introduce a new dependency? If yes, which one?

PR checklist

For all contributions

fkiraly commented Mar 29, 2022

khrapovs commented Mar 29, 2022

khrapovs commented Mar 29, 2022

fkiraly left a comment

Choose a reason for hiding this comment

khrapovs commented Mar 30, 2022

fkiraly commented Mar 30, 2022 • edited

khrapovs commented Mar 30, 2022

fkiraly left a comment • edited

Choose a reason for hiding this comment

khrapovs commented Apr 1, 2022

fkiraly commented Apr 1, 2022

khrapovs commented Apr 2, 2022

fkiraly left a comment

Choose a reason for hiding this comment

fkiraly commented Apr 2, 2022

khrapovs commented Apr 3, 2022

fkiraly left a comment

Choose a reason for hiding this comment

khrapovs commented Apr 3, 2022

fkiraly left a comment

Choose a reason for hiding this comment

khrapovs commented Apr 3, 2022

fkiraly left a comment

Choose a reason for hiding this comment

fkiraly left a comment

Choose a reason for hiding this comment

[ENH] Partially replace `pd.Int64Index` with `pd.Index` #2339

[ENH] Partially replace `pd.Int64Index` with `pd.Index` #2339

khrapovs commented Mar 29, 2022 •

edited

fkiraly commented Mar 30, 2022 •

edited

fkiraly left a comment •

edited