This is a recently introduced model, so the API hasn't been tested extensively. There may be some bugs or slight breaking changes to fix in the future. If you see something strange, file a GitHub issue.
The Time Series Transformer model is a vanilla encoder-decoder Transformer for time series forecasting.
Tips:
- Similar to other models in the library, [`TimeSeriesTransformerModel`] is the raw Transformer without any head on top, and [`TimeSeriesTransformerForPrediction`] adds a distribution head on top of the former, which can be used for time-series forecasting. Note that this is a so-called probabilistic forecasting model, not a point forecasting model. This means that the model learns a distribution, from which one can sample. The model doesn't directly output values.
- [`TimeSeriesTransformerForPrediction`] consists of 2 blocks: an encoder, which takes a `context_length` of time series values as input (called `past_values`), and a decoder, which predicts a `prediction_length` of time series values into the future (called `future_values`). During training, one needs to provide pairs of (`past_values` and `future_values`) to the model.
- In addition to the raw (`past_values` and `future_values`), one typically provides additional features to the model. These can be the following:
    - `past_time_features`: temporal features which the model will add to `past_values`. These serve as "positional encodings" for the Transformer encoder. Examples are "day of the month", "month of the year", etc. as scalar values (and then stacked together as a vector). For example, if a given time-series value was obtained on the 11th of August, then one could have [11, 8] as time feature vector (11 being "day of the month", 8 being "month of the year").
    - `future_time_features`: temporal features which the model will add to `future_values`. These serve as "positional encodings" for the Transformer decoder. Examples are "day of the month", "month of the year", etc. as scalar values (and then stacked together as a vector). For example, if a given time-series value was obtained on the 11th of August, then one could have [11, 8] as time feature vector (11 being "day of the month", 8 being "month of the year").
    - `static_categorical_features`: categorical features which are static over time (i.e., have the same value for all `past_values` and `future_values`). An example here is the store ID or region ID that identifies a given time-series. Note that these features need to be known for ALL data points (also those in the future).
    - `static_real_features`: real-valued features which are static over time (i.e., have the same value for all `past_values` and `future_values`). An example here is the image representation of the product for which you have the time-series values (like the ResNet embedding of a "shoe" picture, if your time-series is about the sales of shoes). Note that these features need to be known for ALL data points (also those in the future).
- The model is trained using "teacher-forcing", similar to how a Transformer is trained for machine translation. This means that, during training, one shifts the `future_values` one position to the right as input to the decoder, prepended by the last value of `past_values`. At each time step, the model needs to predict the next target. So the set-up of training is similar to a GPT model for language, except that there's no notion of `decoder_start_token_id` (we just use the last value of the context as initial input for the decoder).
- At inference time, we give the final value of the `past_values` as input to the decoder. Next, we can sample from the model to make a prediction at the next time step, which is then fed to the decoder in order to make the next prediction (also called autoregressive generation).
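
The tips above can be illustrated with a small, library-free sketch in plain Python. It is not the implementation used by the model. The function names (`time_features`, `decoder_inputs`, `generate`, `toy_sampler`) are hypothetical, the year in the example date is arbitrary, and `toy_sampler` is only a stand-in for sampling from the distribution head. The sketch shows how a "day of the month" / "month of the year" feature vector could be built, how the decoder input is formed under teacher forcing, and how sampled values are fed back during autoregressive generation.

```python
import random
from datetime import date


def time_features(day):
    """Return [day of the month, month of the year] for one time step."""
    return [day.day, day.month]


def decoder_inputs(past_values, future_values):
    """Teacher forcing: shift future_values one position to the right,
    prepending the last value of past_values (there is no start token)."""
    return [past_values[-1]] + list(future_values[:-1])


def generate(past_values, prediction_length, sample_next):
    """Autoregressive generation: start from the last context value and
    feed every sampled value back in as the next decoder input."""
    decoder_in = [past_values[-1]]
    predictions = []
    for _ in range(prediction_length):
        next_value = sample_next(past_values, decoder_in)
        predictions.append(next_value)
        decoder_in.append(next_value)
    return predictions


# Hypothetical stand-in for the model: draws the next value from a Gaussian
# centred on the most recent decoder input. A real model would instead sample
# from the distribution predicted by its distribution head.
_rng = random.Random(0)


def toy_sampler(past_values, decoder_in):
    return _rng.gauss(decoder_in[-1], 1.0)


if __name__ == "__main__":
    # A value observed on the 11th of August (the year is arbitrary).
    print(time_features(date(2023, 8, 11)))

    past = [1.0, 2.0, 3.0]    # context_length = 3
    future = [4.0, 5.0, 6.0]  # prediction_length = 3
    # During training, the decoder sees this and must predict `future`.
    print(decoder_inputs(past, future))

    # At inference time, one sampled trajectory of length 3.
    print(generate(past, 3, toy_sampler))
```

Because the model is probabilistic, calling `generate` repeatedly with a stochastic sampler yields different trajectories. Several of them can be summarised (for example by their mean or median) if a single point forecast is needed.
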
This model was contributed by [kashif](https://huggingface.co/kashif).
[[autodoc]] TimeSeriesTransformerConfig
[[autodoc]] TimeSeriesTransformerModel - forward
[[autodoc]] TimeSeriesTransformerForPrediction - forward