BUG: ETS multiplicative error models log-likelihood calculation #9137

Open · wants to merge 9 commits into base: main
Conversation

@ltsaprounis (Author) commented:

Related to #7331

tl;dr: This PR fixes an issue in the likelihood calculation of ETS models with multiplicative errors that caused them to have artificially low information criteria and unreasonable forecasts.


It appears that the problem raised in #7331 (ETS(MAM) giving unreasonable forecasts) is still present. The fix for that issue had the desired effect for the example series in the issue, but the exploding forecasts can still appear for other time series, like the one below:

Code to reproduce the plots
```python
from statsmodels.tsa.exponential_smoothing.ets import ETSModel
import numpy as np
from matplotlib import pyplot as plt

data = np.array([
    214307.1, 331110.3, 324617.4, 230818.8, 275551.5, 232633.5,
    281989.2, 245336.4, 296976.0, 207572.7, 252719.4, 210795.0,
    142705.8, 146742.3, 261613.5, 276959.1, 365010.0,  48734.7,
    308919.9, 319401.0, 150199.2, 103010.1, 109861.8,  89486.1,
    393645.0, 421279.5, 347387.4, 269928.0, 171278.7, 158568.9,
    280526.4, 195104.4, 230998.2, 226602.9, 267526.8, 258632.7
])

plt.plot(data)
plt.title("Original time series")
plt.show()

res = ETSModel(
    endog=data,
    seasonal_periods=12,
    error="mul",
    seasonal="mul",
    trend="add",
).fit()
pred = res.forecast(steps=12)
pred_fill = np.full_like(data, np.nan)  # NaN padding to align the forecast
plt.plot(data, label="endog")
plt.plot(np.concatenate([pred_fill, pred]), label="forecast")
plt.legend()
plt.title("ETS(MAM) forecast - statsmodels (main)")
plt.show()
```

[figure: original time series]
[figure: exploding ETS(MAM) forecast from statsmodels main]

Moreover, the log-likelihood of these unreasonable fits is unreasonably high, so AutoETS implementations (e.g. sktime, aeon) that perform ETS model selection via information criteria (e.g. AIC) end up selecting the unreasonable models.
The problem comes from the log-likelihood calculation for ETS models with multiplicative errors, and more specifically the fail-safe mechanism for negative or zero yhat values here:

```python
if self.error == "mul":
    # GH-7331: in some cases, yhat can become negative, so that a
    # multiplicative model is no longer well-defined. To avoid these
    # parameterizations, we clip negative values to very small positive
    # values so that the log-transformation yields very large negative
    # values.
    yhat[yhat <= 0] = 1 / (1e-8 * (1 + np.abs(yhat[yhat <= 0])))
    logL -= np.sum(np.log(yhat))
```
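To see why this fail-safe distorts the likelihood, here is a quick standalone look at the replacement values it produces (the yhat values are illustrative, not model output). The subtracted log term shrinks as |yhat| grows and flips sign around |yhat| ≈ 1e8, so wildly negative predictions end up rewarded rather than penalized, whereas log|yhat| would grow without bound:

```python
import numpy as np

# Illustrative negative yhat values: what does the fail-safe replace
# them with, and what log term gets subtracted from logL?
yhat = np.array([-1.0, -1e4, -1e8, -1e12])
replacement = 1 / (1e-8 * (1 + np.abs(yhat)))
term = np.log(replacement)  # logL -= term for each of these points

for y, r, t in zip(yhat, replacement, term):
    print(f"yhat={y:.0e} -> replaced by {r:.3g}, log term = {t:+.1f}")
```

For yhat = -1e12 the log term is negative, so subtracting it *increases* the log-likelihood of the degenerate parameterization.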

 
This approach is not consistent with the literature (see screenshot below):

[figure: screenshot of the multiplicative-error log-likelihood, whose last term is the sum of log|mu_{y,t}|]

Screenshot from the online version of Svetunkov, I. (2023). Forecasting and Analytics with the Augmented Dynamic Adaptive Model (ADAM) (1st ed.). Chapman and Hall/CRC. https://doi.org/10.1201/9781003452652
 

In addition, it's not consistent with the other ETS implementations, which all follow the formula above for the last term of the log-likelihood, namely log(|yhat|). Below is the relevant code from the other OSS implementations I checked.

All the implementations listed above give similar log-likelihood values and AICs, as well as reasonable forecasts. For this series the best AIC among ETS models is achieved by ETS(ANN), so the automatic model selection methods in the packages above all select that model. Below are the forecasts from these packages for ETS(MAM) on the same "problematic" series:

(R) forecast
```r
library(forecast)
library(ggplot2)

data = ts(
  c(
    214307.1, 331110.3, 324617.4, 230818.8, 275551.5, 232633.5,
    281989.2, 245336.4, 296976.0, 207572.7, 252719.4, 210795.0,
    142705.8, 146742.3, 261613.5, 276959.1, 365010.0,  48734.7,
    308919.9, 319401.0, 150199.2, 103010.1, 109861.8,  89486.1,
    393645.0, 421279.5, 347387.4, 269928.0, 171278.7, 158568.9,
    280526.4, 195104.4, 230998.2, 226602.9, 267526.8, 258632.7
  ),
  frequency=12
)

fcast_ets_mam = forecast::ets(data, model="MAM", ic="aic") %>% forecast(h=12)
autoplot(fcast_ets_mam)
summary(fcast_ets_mam)
```

[figure: (R) forecast ETS(MAM) forecast]

(R) smooth
```r
library(smooth)
library(ggplot2)

data = ts(
  c(
    214307.1, 331110.3, 324617.4, 230818.8, 275551.5, 232633.5,
    281989.2, 245336.4, 296976.0, 207572.7, 252719.4, 210795.0,
    142705.8, 146742.3, 261613.5, 276959.1, 365010.0,  48734.7,
    308919.9, 319401.0, 150199.2, 103010.1, 109861.8,  89486.1,
    393645.0, 421279.5, 347387.4, 269928.0, 171278.7, 158568.9,
    280526.4, 195104.4, 230998.2, 226602.9, 267526.8, 258632.7
  ),
  frequency=12
)

smooth_es_fit = smooth::es(data, model="MAM")
summary(smooth_es_fit)
smooth_es_fit |> forecast(h=12) |> plot(main="Smooth ETS(MAM)")
```

[figure: (R) smooth ETS(MAM) forecast]

(Python) nixtla - statsforecast
```python
from statsforecast.models import AutoETS as StatsforecastETS
import numpy as np
from matplotlib import pyplot as plt

data = np.array([
    214307.1, 331110.3, 324617.4, 230818.8, 275551.5, 232633.5,
    281989.2, 245336.4, 296976.0, 207572.7, 252719.4, 210795.0,
    142705.8, 146742.3, 261613.5, 276959.1, 365010.0,  48734.7,
    308919.9, 319401.0, 150199.2, 103010.1, 109861.8,  89486.1,
    393645.0, 421279.5, 347387.4, 269928.0, 171278.7, 158568.9,
    280526.4, 195104.4, 230998.2, 226602.9, 267526.8, 258632.7
])
s_fcaster = StatsforecastETS(model="MAM", season_length=12)
s_fcaster.fit(data)
pred = s_fcaster.predict(h=12)["mean"]
pred_fill = np.full_like(data, np.nan)  # NaN padding to align the forecast
pred = np.concatenate([pred_fill, pred])
plt.plot(data, label="endog")
plt.plot(pred, label="forecast")
plt.legend()
plt.title("Nixtla/statsforecast - ETS(MAM)")
plt.show()
print(s_fcaster.__dict__)
```

[figure: Nixtla/statsforecast ETS(MAM) forecast]

This PR changes the log-likelihood code for multiplicative error models so that it is consistent with the literature and with the other ETS implementations. This results in forecasts similar to those above:

[figure: ETS(MAM) forecast from statsmodels with this PR]
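For reference, the corrected last term can be sketched as a small standalone function (the helper name is mine; the actual change lives inside the ETS log-likelihood code, as shown in the review snippets further down):

```python
import numpy as np

def mul_error_loglike_term(yhat):
    """Last term of the multiplicative-error ETS log-likelihood,
    sum_t log|yhat_t|. Exact zeros are replaced by a tiny positive
    number so the log stays finite; negative values enter through
    the absolute value, as in the likelihood derivation."""
    yhat = np.where(yhat == 0, 1e-32, yhat)
    return np.sum(np.log(np.abs(yhat)))
```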

Here are the log-likelihood and AIC values for ETS(MAM) on this series for each implementation, including the statsmodels change proposed in this PR:

| ETS Implementation | Log-Likelihood | AIC |
| --- | --- | --- |
| statsmodels - main | -79.417 | 194.835 |
| (R) forecast | -470.094 | 974.188 |
| (R) smooth | -453.326 | 936.652 |
| nixtla - statsforecast | -467.193 | 968.386 |
| statsmodels - this PR | -453.251 | 942.503 |
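As a sanity check on the table, AIC = 2k - 2·logL. Implementations count the number of estimated parameters k slightly differently (e.g. whether sigma^2 or a normalized seasonal state is included), which is why similar log-likelihoods can yield somewhat different AICs. Assuming k = 18 for ETS(M,A,M) with m = 12 (3 smoothing parameters, initial level and trend, 12 initial seasonal states, and sigma^2; this parameter count is my assumption), the "statsmodels - this PR" row is reproduced up to rounding:

```python
logL = -453.251  # "statsmodels - this PR" row of the table
k = 18           # assumed parameter count for ETS(M,A,M), m = 12
aic = 2 * k - 2 * logL
print(round(aic, 1))  # 942.5, matching the table up to rounding
```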

PS: Many many many thanks to @config-i1 for helping me to figure this one out! 🙏

@ltsaprounis ltsaprounis changed the title Ets bug BUG: ETS multiplicative error models log-likelihood calculation Feb 2, 2024
@bashtage (Member) left a comment:

Good idea but can simplify a bit.

```python
# are replaced with 10^-32 (a very small number) and we take
# the absolute of yhat to avoid computing the log of negative
# numbers.
yhat[yhat == 0] = 1e-32
```
Reviewer comment (Member):

Really should leave the `<=`. `== 0` is likely too strict, and if it really is `== 0`, then `<= 0` will catch all cases.

Reviewer comment (Member):

Fun corner cases. FWIW, I don't think you need the clipping at all anymore. It's almost certainly not being triggered here, and if you do switch it to clip the negatives, then it becomes unstable again.

```python
# the absolute of yhat to avoid computing the log of negative
# numbers.
yhat[yhat == 0] = 1e-32
logL -= np.sum(np.log(np.abs(yhat)))
```
Reviewer comment (Member):

I don't think reflecting negatives is a good idea. Better to replace with small numbers.

@config-i1 replied:
Please note that the absolute value is not there to avoid negative numbers, but instead arises mathematically:

If $\epsilon_t \sim \mathcal{N}(0, \sigma^2)$ then $y_t = \mu_{y,t}(1+\epsilon_t) \sim \mathcal{N}(\mu_{y,t}, \mu_{y,t}^2 \sigma^2)$. We then end up with the following log-likelihood:

$\ell(\boldsymbol{\theta}, {\sigma}^2 | \mathbf{y}) = -\frac{1}{2} \sum_{t=1}^T \log(2 \pi \mu_{y,t}^2 \sigma^2) -\frac{1}{2} \sum_{t=1}^T \frac{\epsilon_t^2}{\sigma^2}$ . As you see, we have $\mu_{y,t}^2$ in the formula above, which after simplifications can be written as: $-\frac{T}{2} \log(2 \pi \sigma^2) -\frac{1}{2} \sum_{t=1}^T \frac{\epsilon_t^2}{\sigma^2} - \sum_{t=1}^T \log |\mu_{y,t}|$.

So, it is necessary to have it there for the correct calculation of the log-likelihood. The derivations are based on the Section 11.1 here: https://openforecast.org/adam/ADAMETSEstimationLikelihood.html
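Translating the simplified expression above into code, a minimal sketch of the concentrated log-likelihood (with $\sigma^2$ replaced by its MLE; the function name and array interface are mine) looks like this:

```python
import numpy as np

def ets_mul_concentrated_loglike(y, yhat):
    """Concentrated log-likelihood of an ETS model with multiplicative
    Gaussian errors: -T/2 log(2 pi sigma2) - T/2 - sum(log|yhat|),
    where sigma2 is the MLE mean(eps**2)."""
    eps = (y - yhat) / yhat      # multiplicative residuals e_t
    T = y.shape[0]
    sigma2 = np.mean(eps ** 2)   # MLE of sigma^2; makes the middle term -T/2
    return (-0.5 * T * np.log(2 * np.pi * sigma2)
            - 0.5 * T
            - np.sum(np.log(np.abs(yhat))))
```

This equals the sum of Gaussian log-densities of $y_t$ with mean $\mu_{y,t}$ and variance $\mu_{y,t}^2 \sigma^2$, evaluated at the variance MLE.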

@ltsaprounis (Author) replied:
Thanks for reviewing the PR @bashtage and thanks for the additional explanation on the likelihood derivation @config-i1.

Based on @config-i1's explanation and the other implementations, I think it makes sense to keep the `np.abs(yhat)`. Do you agree, @bashtage?

@pep8speaks commented Mar 19, 2024:

Hello @ltsaprounis! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues and found none in this pull request. Cheers! 🍻

@ltsaprounis (Author) commented:
> Please note that the absolute value is not there to avoid negative numbers, but instead arises mathematically:
>
> If $\epsilon_t \sim \mathcal{N}(0, \sigma^2)$ then $y_t = \mu_{y,t}(1+\epsilon_t) \sim \mathcal{N}(\mu_{y,t}, \mu_{y,t}^2 \sigma^2)$. We then end up with the following log-likelihood:
>
> $\ell(\boldsymbol{\theta}, {\sigma}^2 | \mathbf{y}) = -\frac{1}{2} \sum_{t=1}^T \log(2 \pi \mu_{y,t}^2 \sigma^2) -\frac{1}{2} \sum_{t=1}^T \frac{\epsilon_t^2}{\sigma^2}$. As you see, we have $\mu_{y,t}^2$ in the formula above, which after simplifications can be written as: $-\frac{T}{2} \log(2 \pi \sigma^2) -\frac{1}{2} \sum_{t=1}^T \frac{\epsilon_t^2}{\sigma^2} - \sum_{t=1}^T \log |\mu_{y,t}|$.
>
> So, it is necessary to have it there for the correct calculation of the log-likelihood. The derivations are based on Section 11.1 here: https://openforecast.org/adam/ADAMETSEstimationLikelihood.html

Made some changes to the comment for the log-likelihood calculation based on Ivan's comments above. Let me know what you think @bashtage.
