[ENH] Partially replace `pd.Int64Index` with `pd.Index` #2339
ha! how about that, @fkiraly? |
# Conflicts: # sktime/utils/validation/forecasting.py
Take that, `Int64Index`!
In principle, this looks ok, but we need to deprecate!
Why: suppose some poor user's highly valuable deployed pipelines are relying entirely on `Int64Index` for some reason. This PR would completely break them, wouldn't it?
Breakage occurs wherever we do input checks and the index is no longer allowed.
In those cases, we need to raise a warning (deprecation since 0.12, removal in 0.13), while being defensive in the sense of making sktime robust against the `pandas` deprecation...
In the many cases where we create an index, there is no deprecation necessary.
@fkiraly Yes, you are right. But, right now there is no way to say whether a user created the index using
Additionally, from https://pandas.pydata.org/docs/reference/api/pandas.Int64Index.html:
But |
@khrapovs, I see. That makes things difficult. |
very good suggestion! I completely overlooked that such a method exists. replaced all usages of |
Nice!
I think this could almost go in, but I'm concerned about the maintainability of the type check, where we now repeatedly do something like `isinstance(obj, VALID_TYPES) or (isinstance(obj, Index) and obj.is_integer())`.
The "why" is hard to understand for someone who hasn't followed our discussion, and it's also repeated in multiple places across the codebase now (which you probably found via "breakage", but which will be difficult to collect again for any future maintainer).
I would hence recommend bundling this in one function, perhaps `is_valid_fh_index(obj, type="any")`, where `type` can be `"relative"` or `"absolute"`. What do you think?
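A minimal sketch of what such a bundled check could look like. The type tuples below are illustrative assumptions and not sktime's actual definitions, and `is_integer_index` is a hypothetical helper. It uses `pandas.api.types.is_integer_dtype` in place of `obj.is_integer()`, since the latter is deprecated in newer pandas versions.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype

# Illustrative type tuples only; assumptions, not sktime's definitions.
RELATIVE_TYPES = (pd.RangeIndex, pd.TimedeltaIndex)
ABSOLUTE_TYPES = (pd.RangeIndex, pd.PeriodIndex, pd.DatetimeIndex)


def is_integer_index(obj):
    """Return True if obj is a pandas index with an integer dtype."""
    return isinstance(obj, pd.Index) and is_integer_dtype(obj.dtype)


def is_valid_fh_index(obj, type="any"):
    """Check obj against the allowed index types in one place.

    The parameter is named `type` to match the proposal, even though
    it shadows the builtin inside this function.
    """
    if type == "relative":
        valid = RELATIVE_TYPES
    elif type == "absolute":
        valid = ABSOLUTE_TYPES
    elif type == "any":
        valid = RELATIVE_TYPES + ABSOLUTE_TYPES
    else:
        raise ValueError(f"unknown type: {type!r}")
    # Integer indices are accepted by dtype, so no reference to
    # pd.Int64Index is needed.
    return isinstance(obj, valid) or is_integer_index(obj)
```

With this in place, call sites would only need `is_valid_fh_index(obj)` or `is_valid_fh_index(obj, type="relative")`, and the reasoning behind the integer check lives in a single docstring.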
Very good! Added two new functions:
The check for |
# Conflicts: # sktime/forecasting/base/_fh.py
Reported here: #2368 |
Ok, one last change request:
- change `is_in_valid_index_types` as suggested to include the `is_integer_index` check
- move it to `_fh.py`, next to the two other functions
- then use it in the two other places where you check that manually
(I'm not insistent about my idea |
done
done
Yes, I remember about this proposal and even now would prefer to keep the functions as they are now. |
I'd be happy to approve now, although I leave the merge decision to you after considering my non-blocking comment below.
`is_in_valid_index_types` is very generic and does not have much connection to `ForecastingHorizon` other than it is used once to check fh values. I would prefer to keep it next to where `VALID_INDEX_TYPES` are defined. This function is used much more frequently by other modules.
In that case, I would move the other way round, i.e., rename `is_relative_fh_type` to `is_in_valid_relative_index_types` or `is_in_valid_index_types_relative`, and put the three functions together into the `validation` module because they are so very similar.
…ute_index_types to utils/validation/series.py
Very good. I moved not only |
Great!
I see that, in this newest version, you changed `INDEX_TYPE_LOOKUP` too, but did not address the consequences of that change.
I think we should address those ramifications before we merge.
There are a number of places in which a check in `INDEX_TYPE_LOOKUP.get(index_type)` occurs; this should be using `is_in_valid_index_types`, no?
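To illustrate the kind of ramification meant here, a small sketch under stated assumptions: the shape of `INDEX_TYPE_LOOKUP` below (string keys mapping to index classes) is a guess for illustration, and `matches_index_type` is a hypothetical helper, not a function from the PR. Once the integer entry points at `pd.Index`, a plain `isinstance` check against it accepts every pandas index.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype

# Assumed shape of the lookup, for illustration only.
INDEX_TYPE_LOOKUP = {
    "int": pd.Index,  # previously the integer-specific index class
    "range": pd.RangeIndex,
    "datetime": pd.DatetimeIndex,
    "period": pd.PeriodIndex,
}

dates = pd.date_range("2020-01-01", periods=3)

# Every pandas index subclasses pd.Index, so this class check no longer
# distinguishes integer indices from datetime indices.
too_permissive = isinstance(dates, INDEX_TYPE_LOOKUP.get("int"))


def matches_index_type(index, index_type):
    """Hypothetical helper: dtype check for "int", class check otherwise."""
    if index_type == "int":
        return isinstance(index, pd.Index) and is_integer_dtype(index.dtype)
    return isinstance(index, INDEX_TYPE_LOOKUP[index_type])
```

Note that in this sketch a `pd.RangeIndex` also passes the `"int"` branch, because its dtype is integer; whether that is desired depends on the call site.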
Correct. Thanks for noticing. I have used |
See above. Should it not be much shorter than the if/else?
Happy to approve if there is a good reason not to.
Discussion revealed a good reason to leave as is, which resolves the last change request.
* upstream/main:
  [DOC] Added docstring examples to load data functions (sktime#2393)
  [ENH] Capability inference for transformer and classifier pipelines (sktime#2367)
  [ENH] Proba metric grid search integration (sktime#2234)
  [ENH] Faster classifier example parameters (sktime#2378)
  [ENH] Get rid of `pd.Int64Index` (sktime#2390)
  [ENH] Allow `pd.Timedelta` values in `ForecastingHorizon` (sktime#2333)
  [ENH] Partially replace `pd.Int64Index` with `pd.Index` (sktime#2339)
  relax name rules for multiindex (sktime#2384)
Reference Issues/PRs
Partially addresses #2332.
What does this implement/fix? Explain your changes.
I have replaced `pd.Int64Index` with `pd.Index` only in obvious and harmless places. More complicated cases like `VALID_INDEX_TYPES` in `sktime.utils.validation.series` are left for another ambitious PR.
Does your contribution introduce a new dependency? If yes, which one?
No