
Scalable machine 🤖 learning for time series forecasting.


tiefenthaler/mlforecast


Machine Learning 🤖 Forecast

Scalable machine learning for time series forecasting


mlforecast is a framework to perform time series forecasting using machine learning models, with the option to scale to massive amounts of data using remote clusters.

Install

PyPI

pip install mlforecast

If you want to perform distributed training, you can instead use pip install "mlforecast[distributed]", which will also install dask. Note that you’ll also need to install either LightGBM or XGBoost.

conda-forge

conda install -c conda-forge mlforecast

Note that this installation comes with the required dependencies for the local interface. If you want to perform distributed training, you must install dask (conda install -c conda-forge dask) and either LightGBM or XGBoost.

Quick Start

Minimal Example

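import lightgbm as lgb
from sklearn.linear_model import LinearRegression

from mlforecast import MLForecast

# df: long-format pandas DataFrame with columns unique_id, ds and y (monthly data)
mlf = MLForecast(
    models=[LinearRegression(), lgb.LGBMRegressor()],
    lags=[1, 12],  # values from 1 and 12 months earlier
    freq='M',  # month-end frequency; pandas >= 2.2 prefers the alias 'ME'
)
mlf.fit(df)
mlf.predict(12)  # forecast the next 12 months for every series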

Get Started with this quick guide.

Follow this end-to-end walkthrough for best practices.

Why?

Current Python alternatives for training machine learning models on time series are slow, inaccurate and don’t scale well. So we created a library that can be used to forecast in production environments. MLForecast includes efficient feature engineering that lets you train any machine learning model exposing fit and predict methods (such as scikit-learn estimators) on millions of time series.

Features

  • Fastest implementations of feature engineering for time series forecasting in Python.
  • Out-of-the-box compatibility with Spark, Dask, and Ray.
  • Probabilistic Forecasting with Conformal Prediction.
  • Support for exogenous variables and static covariates.
  • Familiar sklearn syntax: .fit and .predict.

Missing something? Please open an issue or write to us on Slack.
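The conformal-prediction idea behind the probabilistic forecasts can be illustrated with a generic split-conformal recipe. First, score a model by its absolute errors on held-out data. Then widen new point forecasts by a high quantile of those scores. This NumPy sketch is an illustration under that assumption, not mlforecast's implementation. The helper conformal_interval and the toy numbers are hypothetical.

```python
import math

import numpy as np


def conformal_interval(calib_true, calib_pred, new_pred, level=0.9):
    """Symmetric split-conformal interval around point forecasts (hypothetical helper)."""
    # Nonconformity scores: absolute errors on a held-out calibration window, sorted
    scores = np.sort(np.abs(np.asarray(calib_true, float) - np.asarray(calib_pred, float)))
    n = len(scores)
    # Finite-sample corrected rank; with too few scores the interval is unbounded
    k = math.ceil((n + 1) * level)
    q = np.inf if k > n else scores[k - 1]
    new_pred = np.asarray(new_pred, float)
    return new_pred - q, new_pred + q


# Toy calibration data whose absolute errors are 1, 2, ..., 10
calib_true = np.arange(10.0)
calib_pred = calib_true + np.array([1, -2, 3, -4, 5, -6, 7, -8, 9, -10])
lower, upper = conformal_interval(calib_true, calib_pred, [100.0], level=0.8)
```

With ten scores and level=0.8, the rank is ceil(11 × 0.8) = 9. The ninth-smallest error (9) is therefore subtracted from and added to the forecast, giving the interval [91, 109].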

Examples and Guides

📚 End-to-End Walkthrough: model training, evaluation and selection for multiple time series.

🔎 Probabilistic Forecasting: use Conformal Prediction to produce prediction intervals.

👩‍🔬 Cross Validation: robust evaluation of model performance.

🔌 Predict Demand Peaks: electricity load forecasting for detecting daily peaks and reducing electric bills.

📈 Transfer Learning: pretrain a model using a set of time series and then predict another one using that pretrained model.

🌑️ Distributed Training: use a Dask cluster to train models at scale.

How to use

The following is a very basic overview. For a more detailed description, see the documentation.

Data setup

Store your time series in a pandas DataFrame in long format, where each row is the observation of one series at one timestamp. By default the columns are unique_id (series identifier), ds (timestamp) and y (target), as in the output below.

from mlforecast.utils import generate_daily_series

series = generate_daily_series(
    n_series=20,
    max_length=100,
    n_static_features=1,
    static_as_categorical=False,
    with_trend=True
)
series.head()
   unique_id          ds          y  static_0
0      id_00  2000-01-01   1.751917        72
1      id_00  2000-01-02   9.196715        72
2      id_00  2000-01-03  18.577788        72
3      id_00  2000-01-04  24.520646        72
4      id_00  2000-01-05  33.418028        72
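If your data is stored in wide format instead (one column per series), pandas melt can reshape it into the long layout shown above. This is a minimal sketch. The column names store_a and store_b and the values are hypothetical.

```python
import pandas as pd

# Hypothetical wide table: one row per date, one column per series
wide = pd.DataFrame({
    'ds': pd.date_range('2000-01-01', periods=3, freq='D'),
    'store_a': [1.0, 2.0, 3.0],
    'store_b': [10.0, 20.0, 30.0],
})

# Stack the series columns into (unique_id, y) pairs, keeping ds as the timestamp
long_df = (
    wide.melt(id_vars='ds', var_name='unique_id', value_name='y')
    .loc[:, ['unique_id', 'ds', 'y']]
    .sort_values(['unique_id', 'ds'])
    .reset_index(drop=True)
)
```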

Models

Next, define your models. With the local interface, this can be any regressor that follows the scikit-learn API. For distributed training, use LGBMForecast or XGBForecast.

import lightgbm as lgb
import xgboost as xgb
from sklearn.ensemble import RandomForestRegressor

models = [
    lgb.LGBMRegressor(),
    xgb.XGBRegressor(),
    RandomForestRegressor(random_state=0),
]
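The sketch below shows a hypothetical custom model that follows the scikit-learn API: a LeastSquaresRegressor built on scikit-learn's BaseEstimator and RegressorMixin. Inheriting from BaseEstimator supplies get_params and set_params, which sklearn.base.clone relies on to copy an estimator. This sketch does not confirm whether a given mlforecast version copies models this way, so inheriting is simply the safe choice.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone


class LeastSquaresRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical example model: ordinary least squares solved with NumPy."""

    def __init__(self, fit_intercept=True):
        # scikit-learn convention: __init__ only stores hyperparameters
        self.fit_intercept = fit_intercept

    def _design(self, X):
        # Convert features (e.g. a DataFrame) to floats and optionally prepend a bias column
        X = np.asarray(X, dtype=float)
        if self.fit_intercept:
            X = np.column_stack([np.ones(len(X)), X])
        return X

    def fit(self, X, y):
        # Solve min ||A w - y||^2; fitted attributes end with an underscore
        A = self._design(X)
        self.coef_, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
        return self  # returning self allows clone(model).fit(X, y).predict(...)

    def predict(self, X):
        return self._design(X) @ self.coef_


# Usage: fit on y = 2x + 1 after cloning, as a library might do
X = np.arange(10.0).reshape(-1, 1)
y = 2 * X.ravel() + 1
model = clone(LeastSquaresRegressor()).fit(X, y)
```

An instance such as LeastSquaresRegressor() could then sit in the models list next to the other regressors.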

Forecast object

Now instantiate an MLForecast object with the models and the features you want to use. The features can be lags, transformations of the lags, and date features. Lag transformations are defined as numba-jitted functions that take an array and return a transformed array. If a transformation needs additional arguments, you can either supply a tuple (transform_func, arg1, arg2, …) or define a new function that fixes those arguments, as rolling_mean_28 does below. You can also specify differences to apply to the series before fitting; they are reverted when predicting.

from mlforecast import MLForecast
from mlforecast.target_transforms import Differences
from numba import njit
from window_ops.expanding import expanding_mean
from window_ops.rolling import rolling_mean


@njit
def rolling_mean_28(x):
    return rolling_mean(x, window_size=28)


fcst = MLForecast(
    models=models,
    freq='D',
    lags=[7, 14],
    lag_transforms={
        1: [expanding_mean],
        7: [rolling_mean_28]
    },
    date_features=['dayofweek'],
    target_transforms=[Differences([1])],
)
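Here is a NumPy sketch of what the configured lag7, lag14, expanding_mean_lag1 and rolling_mean_28_lag7 features would hold for a single series. Each lag transform is applied to the series, and the result is shifted by its lag. The sketch makes some assumptions:

- Rolling windows with fewer than window_size observations yield NaN (the window_ops convention).
- The Differences target transform is ignored.
- The helpers np_lag, np_expanding_mean and np_rolling_mean are hypothetical re-implementations, not mlforecast's or window_ops' code.

```python
import numpy as np


def np_lag(x, k):
    """Value from k steps earlier; the first k positions have no history (NaN)."""
    out = np.full(len(x), np.nan)
    out[k:] = x[:-k]  # assumes k >= 1
    return out


def np_expanding_mean(x):
    """Mean of all observations up to and including each position."""
    return np.cumsum(x) / np.arange(1, len(x) + 1)


def np_rolling_mean(x, window_size):
    """Mean of the last window_size observations; NaN until the window is full."""
    out = np.full(len(x), np.nan)
    if len(x) >= window_size:
        csum = np.cumsum(np.insert(x, 0, 0.0))
        out[window_size - 1:] = (csum[window_size:] - csum[:-window_size]) / window_size
    return out


y = np.arange(1.0, 51.0)  # toy series: 1, 2, ..., 50

# A transform registered under lag k is applied to the series, then shifted by k
features = {
    'lag7': np_lag(y, 7),
    'lag14': np_lag(y, 14),
    'expanding_mean_lag1': np_lag(np_expanding_mean(y), 1),
    'rolling_mean_28_lag7': np_lag(np_rolling_mean(y, 28), 7),
}
```

For example, at position 40, rolling_mean_28_lag7 is the mean of y[6:34], i.e. of the values 7 to 34, which is 20.5. Positions before 34 are NaN because the 28-value window is not yet full.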

Training

To compute the features and train the models, call fit on your MLForecast object.

fcst.fit(series)
MLForecast(models=[LGBMRegressor, XGBRegressor, RandomForestRegressor], freq=<Day>, lag_features=['lag7', 'lag14', 'expanding_mean_lag1', 'rolling_mean_28_lag7'], date_features=['dayofweek'], num_threads=1)
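With target_transforms=[Differences([1])], the models are fitted on first differences of each series, and the forecasts are converted back to the original scale. Below is a NumPy sketch of that round trip for one toy series. The model outputs z_hat are hypothetical, and mlforecast's own implementation may differ in details.

```python
import numpy as np

y = np.array([10.0, 12.0, 15.0, 14.0, 18.0])  # toy observed series

# Forward transform: z[t] = y[t] - y[t-1] (one value shorter than y)
z = np.diff(y)

# Undoing it on the training range needs the first observation as an anchor
y_restored = np.concatenate([[y[0]], y[0] + np.cumsum(z)])

# Hypothetical model output: forecasts of the next three differences
z_hat = np.array([1.0, 2.0, -0.5])

# Inverse for the forecast horizon: accumulate the differences from the last observed value
y_hat = y[-1] + np.cumsum(z_hat)
```

Here z is [2, 3, -1, 4] and y_hat is [19, 21, 20.5]. Each forecast level builds on the previous one, starting from the last observation, 18.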

Predicting

To get forecasts for the next n days, call predict(n) on the forecast object. The features are updated automatically using a recursive strategy: each step's predictions are fed back as inputs for computing the next step's features.

predictions = fcst.predict(14)
predictions
     unique_id          ds  LGBMRegressor  XGBRegressor  RandomForestRegressor
0        id_00  2000-04-04      69.082830     67.761337              68.226556
1        id_00  2000-04-05      75.706024     74.588699              75.484774
2        id_00  2000-04-06      82.222473     81.058289              82.853684
3        id_00  2000-04-07      89.577638     88.735947              90.351212
4        id_00  2000-04-08      44.149095     44.981384              46.291173
...        ...         ...            ...           ...                    ...
275      id_19  2000-03-23      30.151270     31.814825              32.592799
276      id_19  2000-03-24      31.418104     32.653374              33.563294
277      id_19  2000-03-25      32.843567     33.586033              34.530912
278      id_19  2000-03-26      34.127210     34.541473              35.507559
279      id_19  2000-03-27      34.329202     35.450943              36.425001

280 rows × 5 columns
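The recursive strategy used by predict can be sketched in plain Python/NumPy in four steps:

1. Predict one step.
2. Append the prediction to the history.
3. Recompute the lag features.
4. Repeat.

The function recursive_forecast and the toy linear_extrapolation model are hypothetical. The sketch handles lag features only and illustrates the idea rather than mlforecast's internals.

```python
import numpy as np


def recursive_forecast(history, predict_one, lags, horizon):
    """Forecast `horizon` steps by repeatedly predicting one step ahead.

    predict_one: hypothetical callable mapping a 1-D array of lag features
    (ordered like `lags`) to a single prediction.
    """
    values = list(history)
    preds = []
    for _ in range(horizon):
        # Lag features for the next timestamp come from the (extended) history
        features = np.array([values[-k] for k in lags], dtype=float)
        y_next = float(predict_one(features))
        preds.append(y_next)
        values.append(y_next)  # feed the prediction back for later steps
    return np.array(preds)


def linear_extrapolation(features):
    """Toy one-step model: next value = lag1 + (lag1 - lag2)."""
    lag1, lag2 = features
    return lag1 + (lag1 - lag2)


preds = recursive_forecast([1.0, 2.0, 3.0], linear_extrapolation, lags=[1, 2], horizon=3)
```

Later steps consume earlier predictions, so errors can compound over the horizon. That is the main trade-off of the recursive approach compared with training a separate model for each horizon step.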

Visualize results

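import os

import matplotlib.pyplot as plt
import pandas as pd

# Plot history and forecasts for the first four series in a 2x2 grid
fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(12, 6), gridspec_kw=dict(hspace=0.3))
for i, (uid, axi) in enumerate(zip(series['unique_id'].unique(), ax.flat)):
    fltr = lambda df: df['unique_id'].eq(uid)  # rows belonging to the current series
    history = series.loc[fltr, ['ds', 'y']]
    forecast = predictions.loc[fltr].drop(columns='unique_id')
    pd.concat([history, forecast]).set_index('ds').plot(ax=axi)
    axi.set(title=uid, xlabel=None)
    if i % 2 == 0:
        axi.legend().remove()  # show the legend only in the right-hand column
    else:
        axi.legend(bbox_to_anchor=(1.01, 1.0))
os.makedirs('figs', exist_ok=True)  # savefig fails if the directory does not exist
fig.savefig('figs/index.png', bbox_inches='tight')
plt.close()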

Sample notebooks

How to contribute

See CONTRIBUTING.md.
