Nixtla

Machine Learning 🤖 Forecast

Scalable machine learning for time series forecasting

mlforecast is a framework to perform time series forecasting using machine learning models, with the option to scale to massive amounts of data using remote clusters.

Install

PyPI

pip install mlforecast

If you want to perform distributed training, you can instead use pip install "mlforecast[distributed]", which will also install dask. Note that you’ll also need to install either LightGBM or XGBoost.

conda-forge

conda install -c conda-forge mlforecast

Note that this installation comes with the required dependencies for the local interface. If you want to perform distributed training, you must install dask (conda install -c conda-forge dask) and either LightGBM or XGBoost.

Quick Start

Minimal Example

import lightgbm as lgb

from mlforecast import MLForecast
from sklearn.linear_model import LinearRegression

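# df is assumed to be a long-format pandas DataFrame with the columns
# unique_id, ds and y (see "Data setup" below).
mlf = MLForecast(
    models=[LinearRegression(), lgb.LGBMRegressor()],
    lags=[1, 12],
    freq='M',
)
mlf.fit(df)
mlf.predict(12)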

Get Started with this quick guide.

Follow this end-to-end walkthrough for best practices.

Why?

Existing Python alternatives for forecasting with machine learning models are slow, inaccurate and don’t scale well, so we created a library that can be used to forecast in production environments. MLForecast includes efficient feature engineering to train any machine learning model that exposes fit and predict methods, as scikit-learn estimators do, on millions of time series.

Features

Fastest implementations of feature engineering for time series forecasting in Python.
Out-of-the-box compatibility with Spark, Dask, and Ray.
Probabilistic Forecasting with Conformal Prediction.
Support for exogenous variables and static covariates.
Familiar sklearn syntax: .fit and .predict.

Missing something? Please open an issue or write us in

Examples and Guides

📚 End to End Walkthrough: model training, evaluation and selection for multiple time series.

🔎 Probabilistic Forecasting: use Conformal Prediction to produce prediction intervals.

👩‍🔬 Cross Validation: robust evaluation of model performance.

🔌 Predict Demand Peaks: electricity load forecasting for detecting daily peaks and reducing electric bills.

📈 Transfer Learning: pretrain a model using a set of time series and then predict another one using that pretrained model.

🌡️ Distributed Training: use a Dask cluster to train models at scale.

How to use

The following provides a very basic overview. For a more detailed description, see the documentation.

Data setup

Store your time series in a pandas DataFrame in long format, that is, each row represents one observation for a specific series and timestamp.

from mlforecast.utils import generate_daily_series

series = generate_daily_series(
    n_series=20,
    max_length=100,
    n_static_features=1,
    static_as_categorical=False,
    with_trend=True
)
series.head()

	unique_id	ds	y	static_0
0	id_00	2000-01-01	1.751917	72
1	id_00	2000-01-02	9.196715	72
2	id_00	2000-01-03	18.577788	72
3	id_00	2000-01-04	24.520646	72
4	id_00	2000-01-05	33.418028	72
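If your data is in wide form (one sequence of values per series), it has to be reshaped so that every observation gets its own row. The following is a minimal standard-library sketch of that reshaping, not part of mlforecast; the `wide` dictionary, the shared start date, and the `wide_to_long` helper are all hypothetical. The resulting rows use the same column names as the table above.

```python
from datetime import date, timedelta

# Hypothetical wide-format data: one list of values per series.
# Assumption: every series is daily and starts on the same date.
wide = {
    "id_00": [1.0, 2.0, 3.0],
    "id_01": [10.0, 20.0],
}
start = date(2000, 1, 1)


def wide_to_long(wide, start):
    """Return one row per (series, timestamp) observation."""
    rows = []
    for unique_id, values in wide.items():
        for offset, y in enumerate(values):
            rows.append(
                {"unique_id": unique_id, "ds": start + timedelta(days=offset), "y": y}
            )
    return rows


long_rows = wide_to_long(wide, start)
for row in long_rows:
    print(row["unique_id"], row["ds"].isoformat(), row["y"])
```

A list of dictionaries like `long_rows` can be passed to the pandas DataFrame constructor to obtain the long-format frame described above. Series of different lengths simply contribute a different number of rows.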

Models

Next, define your models. For the local interface, each model can be any regressor that follows the scikit-learn API. For distributed training, LGBMForecast and XGBForecast are available.

import lightgbm as lgb
import xgboost as xgb
from sklearn.ensemble import RandomForestRegressor

models = [
    lgb.LGBMRegressor(),
    xgb.XGBRegressor(),
    RandomForestRegressor(random_state=0),
]

Forecast object

Now instantiate an MLForecast object with the models and the features that you want to use. The features can be lags, transformations on the lags, and date features. The lag transformations are defined as numba-jitted functions that transform an array. If they take additional arguments, you can either supply a tuple (transform_func, arg1, arg2, …) or define new functions that fix the arguments. You can also define differences to apply to the series before fitting; these are restored when predicting.

from mlforecast import MLForecast
from mlforecast.target_transforms import Differences
from numba import njit
from window_ops.expanding import expanding_mean
from window_ops.rolling import rolling_mean


@njit
def rolling_mean_28(x):
    return rolling_mean(x, window_size=28)


fcst = MLForecast(
    models=models,
    freq='D',
    lags=[7, 14],
    lag_transforms={
        1: [expanding_mean],
        7: [rolling_mean_28]
    },
    date_features=['dayofweek'],
    target_transforms=[Differences([1])],
)
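To make the idea behind Differences([1]) concrete, here is a pure-Python sketch of first differencing and of how forecasts made on the differenced scale can be mapped back to the original scale. It only illustrates the concept and is not mlforecast's implementation; the `difference` and `restore` helpers and the sample numbers are hypothetical.

```python
def difference(values, lag=1):
    """Subtract the value `lag` steps earlier; the first `lag` entries are dropped."""
    return [values[i] - values[i - lag] for i in range(lag, len(values))]


def restore(last_values, predicted_diffs, lag=1):
    """Undo the differencing for forecasts, starting from the last observed values."""
    history = list(last_values[-lag:])
    restored = []
    for d in predicted_diffs:
        # The value `lag` steps before the one being predicted.
        value = history[-lag] + d
        restored.append(value)
        history.append(value)
    return restored


y = [2.0, 5.0, 9.0, 14.0]
diffs = difference(y)  # the scale a model would be trained on

# Pretend a model predicted these differences for the next two steps.
predicted_diffs = [6.0, 7.0]
forecast = restore(y, predicted_diffs)
print(diffs, forecast)
```

Each restored value is added to the history so that later steps build on earlier forecasts, which is why the transformation can be inverted for any horizon.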

Training

To compute the features and train the models call fit on your Forecast object.

fcst.fit(series)

MLForecast(models=[LGBMRegressor, XGBRegressor, RandomForestRegressor], freq=<Day>, lag_features=['lag7', 'lag14', 'expanding_mean_lag1', 'rolling_mean_28_lag7'], date_features=['dayofweek'], num_threads=1)
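The feature names in the output above combine a transformation with the lag it is applied to. The sketch below shows, in plain Python and for a single short series, what such features could look like. It is an illustration under assumptions, not the library's code: the helpers are hypothetical, missing values are represented as None rather than NaN, and a rolling window is assumed to produce a value only once it is full. Smaller lags and windows than in the example above are used so the result is easy to check by hand.

```python
def lag(values, k):
    """Value observed k steps earlier; None where no such observation exists."""
    return [values[t - k] if t >= k else None for t in range(len(values))]


def expanding_mean(values):
    """Mean of all non-missing values seen up to and including each position."""
    out, total, count = [], 0.0, 0
    for v in values:
        if v is not None:
            total += v
            count += 1
        out.append(total / count if count else None)
    return out


def rolling_mean(values, window_size):
    """Mean of the last `window_size` values; None until a full window exists."""
    out = []
    for t in range(len(values)):
        window = values[max(0, t - window_size + 1): t + 1]
        if len(window) < window_size or any(v is None for v in window):
            out.append(None)
        else:
            out.append(sum(window) / window_size)
    return out


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
features = {
    "lag2": lag(y, 2),
    "expanding_mean_lag1": expanding_mean(lag(y, 1)),
    "rolling_mean_3_lag2": rolling_mean(lag(y, 2), 3),
}
for name, column in features.items():
    print(name, column)
```

Because the series is lagged before the transformation is applied, the feature at a given row only uses values that were already known at least that many steps earlier, so the target of the row never leaks into its own features.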

Predicting

To get the forecasts for the next n days call predict(n) on the forecast object. This will automatically handle the updates required by the features using a recursive strategy.

predictions = fcst.predict(14)
predictions

	unique_id	ds	LGBMRegressor	XGBRegressor	RandomForestRegressor
0	id_00	2000-04-04	69.082830	67.761337	68.226556
1	id_00	2000-04-05	75.706024	74.588699	75.484774
2	id_00	2000-04-06	82.222473	81.058289	82.853684
3	id_00	2000-04-07	89.577638	88.735947	90.351212
4	id_00	2000-04-08	44.149095	44.981384	46.291173
...	...	...	...	...	...
275	id_19	2000-03-23	30.151270	31.814825	32.592799
276	id_19	2000-03-24	31.418104	32.653374	33.563294
277	id_19	2000-03-25	32.843567	33.586033	34.530912
278	id_19	2000-03-26	34.127210	34.541473	35.507559
279	id_19	2000-03-27	34.329202	35.450943	36.425001

280 rows × 5 columns
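The recursive strategy mentioned above can be sketched in a few lines: predict one step, append the prediction to the history, rebuild the features, and repeat. The code below is a simplified illustration, not mlforecast's implementation; `make_features`, `toy_model` and `recursive_forecast` are hypothetical names, and the "model" merely averages its inputs.

```python
def make_features(history, lags):
    """Build the lag features for the next, not yet observed, time step."""
    return [history[-k] for k in lags]


def toy_model(features):
    """Stand-in for a fitted regressor: averages its inputs (purely illustrative)."""
    return sum(features) / len(features)


def recursive_forecast(history, lags, horizon, model):
    """Predict one step at a time, feeding each prediction back as history."""
    history = list(history)  # copy so the caller's data is left untouched
    predictions = []
    for _ in range(horizon):
        y_hat = model(make_features(history, lags))
        predictions.append(y_hat)
        history.append(y_hat)  # the prediction becomes the newest "observation"
    return predictions


observed = [10.0, 12.0, 14.0, 16.0]
preds = recursive_forecast(observed, lags=[1, 2], horizon=3, model=toy_model)
print(preds)
```

From the second step onward the lag features are built partly from earlier predictions rather than observed values, so errors can accumulate as the horizon grows.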

Visualize results

import matplotlib.pyplot as plt
import pandas as pd

fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(12, 6), gridspec_kw=dict(hspace=0.3))
for i, (uid, axi) in enumerate(zip(series['unique_id'].unique(), ax.flat)):
    fltr = lambda df: df['unique_id'].eq(uid)
    pd.concat([series.loc[fltr, ['ds', 'y']], predictions.loc[fltr]]).set_index('ds').plot(ax=axi)
    axi.set(title=uid, xlabel=None)
    if i % 2 == 0:
        axi.legend().remove()
    else:
        axi.legend(bbox_to_anchor=(1.01, 1.0))
fig.savefig('figs/index.png', bbox_inches='tight')
plt.close()

Sample notebooks

How to contribute

See CONTRIBUTING.md.
