[ENH] Implement ForecastingHorizon.to_indexer method for non-integer forecasting horizons #2376

Open
khrapovs opened this issue Apr 3, 2022 · 2 comments
Labels
enhancement Adding new functionality

Comments

@khrapovs
Contributor

khrapovs commented Apr 3, 2022

Is your feature request related to a problem? Please describe.

In #2333 I generalized ForecastingHorizon to accept non-integer forecasting horizons, e.g. pd.Timedelta. The only method that was not adjusted is ForecastingHorizon.to_indexer; in the referenced PR it raises NotImplementedError instead. The main reason I left it at that is that I personally do not understand what this method means for non-integer indices. Let's say we have a time series with the following date index: [Jan 1, Jan 2, Jan 2, Jan 5]. The integer index here is [0, 1, 2, 3]. But if I start thinking in relative time-based terms, say fh = 1 day and cutoff = Jan 2, then indexing into this time series is nothing but defining a time threshold, here cutoff + fh = Jan 3, and filtering out the values either before or after it, e.g. y[y.index < cutoff + fh]. In such arithmetic, an integer index is just not very helpful.
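
The threshold filtering described above can be sketched in plain pandas. This is only an illustration under assumptions: the year 2022 and the series values are made up, and nothing here is sktime API.

```python
import pandas as pd

# Date index from the example above; the year and the values are assumptions.
index = pd.DatetimeIndex(["2022-01-01", "2022-01-02", "2022-01-02", "2022-01-05"])
y = pd.Series([10.0, 11.0, 12.0, 13.0], index=index)

cutoff = pd.Timestamp("2022-01-02")
fh = pd.Timedelta(days=1)

# With a time-based horizon, "indexing" reduces to comparing against a threshold.
threshold = cutoff + fh            # Jan 3
before = y[y.index < threshold]    # observations strictly before the threshold
after = y[y.index >= threshold]    # observations on or after the threshold
```

No integer position is needed at any point; the comparison works directly on the time stamps.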

Describe the solution you'd like

If I saw the solution, I would have implemented it straight away. Unfortunately, I do not see one without digging deep into how this method is used in other places of the library.

@khrapovs khrapovs added the enhancement Adding new functionality label Apr 3, 2022
@fkiraly
Collaborator

fkiraly commented Apr 3, 2022

I looked into this, and I think I know the answer.

to_indexer returns the loc indices of the prediction object.

Since fh pre-Stanislav was iloc indexing, there are three things the internal interfaces often ask for:

  • absolute integer indices, these are iloc references -> to_absolute
  • relative integer indices, these are iloc differences -> to_relative
  • absolute time stamp indices, these are loc references -> to_indexer

In the case where we use loc indexing (time stamps), these are analogous to:

  • absolute time stamps, loc references
  • relative time stamps, loc differences
  • absolute time stamps, i.e., the same as the first!
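
The correspondence between the two lists can be sketched with plain pandas, without calling any ForecastingHorizon method. The daily index, the cutoff position, and the variable names are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical daily series; the dates and values are made up.
index = pd.date_range("2022-01-01", periods=5, freq="D")
y = pd.Series(range(5), index=index)

# iloc case: integer positions.
cutoff_pos = 1                                               # position of the cutoff (Jan 2)
relative_pos = [1, 2]                                        # iloc differences
absolute_pos = [cutoff_pos + step for step in relative_pos]  # iloc references
timestamps = y.index[absolute_pos]                           # loc references

# loc case: time stamps.
cutoff = y.index[cutoff_pos]
relative_time = pd.to_timedelta([1, 2], unit="D")            # loc differences
absolute_time = cutoff + relative_time                       # loc references
```

In the loc case the absolute view already consists of time stamps, so the third view adds nothing new: `absolute_time` holds the same values as `timestamps`.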

@fkiraly
Collaborator

fkiraly commented Apr 3, 2022

I also note that integer-based indices can be ambiguous. Because integers can be iloc references as well as loc references, the two differ whenever the data frame index is integer but not a range.
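
A small pandas illustration of the ambiguity, with a made-up series whose index is integer but not a range:

```python
import pandas as pd

# Integer index that is not 0, 1, 2, ...; the labels and values are made up.
y = pd.Series(["a", "b", "c", "d"], index=[2, 3, 5, 7])

by_position = y.iloc[2]  # third element by position
by_label = y.loc[2]      # element whose label is 2

# With a default range index, both readings of the integer 2 coincide.
y_range = pd.Series(["a", "b", "c", "d"])
same = y_range.iloc[2] == y_range.loc[2]
```

Here the integer 2 selects `"c"` by position but `"a"` by label, while on the range-indexed series both select the same element.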

I hence wonder: should the fh get a constructor argument and parameter indexer_mode, which can be "iloc" or "loc"? For non-integer arguments it would always be "loc"; for integer arguments it would default to "iloc" unless the user sets the flag to "loc".
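
A minimal sketch of how such a rule could be resolved. The helper `resolve_indexer_mode` is hypothetical and not part of sktime; only the name `indexer_mode` and its two values come from the proposal above. Silently falling back to "loc" for non-integer values, even if "iloc" was requested, is an assumption.

```python
import pandas as pd


def resolve_indexer_mode(values, indexer_mode=None):
    """Hypothetical helper (not sktime API): pick "iloc" or "loc" for fh values."""
    if indexer_mode not in (None, "iloc", "loc"):
        raise ValueError(f"indexer_mode must be 'iloc' or 'loc', got {indexer_mode!r}")
    index = pd.Index(values)
    if not pd.api.types.is_integer_dtype(index):
        # Non-integer horizons (e.g. timedeltas) can only be label-based.
        return "loc"
    # Integer horizons are positional unless the user asks for labels.
    return indexer_mode or "iloc"


mode_int_default = resolve_indexer_mode([1, 2, 3])
mode_int_loc = resolve_indexer_mode([1, 2, 3], indexer_mode="loc")
mode_timedelta = resolve_indexer_mode(pd.to_timedelta([1, 2], unit="D"))
```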
