BUG: ETS multiplicative error models log-likelihood calculation #9137

Open · wants to merge 9 commits into base: main
Conversation

@ltsaprounis (Author) commented:

Related to #7331

tl;dr: This PR fixes an issue in the likelihood calculation of ETS models with multiplicative errors that caused them to have artificially low information criteria and unreasonable forecasts.


It appears that the problem raised in #7331 (ETS(MAM) giving unreasonable forecasts) is still present. The fix for that issue had the desired effect for the example series in the issue, but the exploding forecasts can still appear for other time series, like the one below:

Code to reproduce the plots
```python
from statsmodels.tsa.exponential_smoothing.ets import ETSModel
import numpy as np
from matplotlib import pyplot as plt

data = np.array([
    214307.1, 331110.3, 324617.4, 230818.8, 275551.5, 232633.5,
    281989.2, 245336.4, 296976.0, 207572.7, 252719.4, 210795.0,
    142705.8, 146742.3, 261613.5, 276959.1, 365010.0,  48734.7,
    308919.9, 319401.0, 150199.2, 103010.1, 109861.8,  89486.1,
    393645.0, 421279.5, 347387.4, 269928.0, 171278.7, 158568.9,
    280526.4, 195104.4, 230998.2, 226602.9, 267526.8, 258632.7
])

plt.plot(data)
plt.title("Original time series")
plt.show()

res = ETSModel(
    endog=data,
    seasonal_periods=12,
    error="mul",
    seasonal="mul",
    trend="add",
).fit()
pred = res.forecast(steps=12)
pred_fill = np.full_like(data, np.nan)  # NaN padding to align the forecast
plt.plot(data, label="endog")
plt.plot(np.concatenate([pred_fill, pred]), label="forecast")
plt.legend()
plt.title("ETS(MAM) forecast - statsmodels (main)")
plt.show()
```

[figure: original time series]
[figure: exploding ETS(MAM) forecast from statsmodels main]

Moreover, the log-likelihood of these unreasonable fits is unreasonably high, so AutoETS implementations (e.g. sktime, aeon) that perform ETS model selection via information criteria (e.g. AIC) end up selecting the unreasonable models.
The problem comes from the log-likelihood calculation for ETS models with multiplicative errors, and more specifically the fail-safe mechanism for negative or zero yhat values here:

```python
if self.error == "mul":
    # GH-7331: in some cases, yhat can become negative, so that a
    # multiplicative model is no longer well-defined. To avoid these
    # parameterizations, we clip negative values to very small positive
    # values so that the log-transformation yields very large negative
    # values.
    yhat[yhat <= 0] = 1 / (1e-8 * (1 + np.abs(yhat[yhat <= 0])))
    logL -= np.sum(np.log(yhat))
```
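To see why this fail-safe distorts the likelihood, here is a quick standalone look at the replacement values it produces (the yhat values are illustrative, not model output). The subtracted log term shrinks as |yhat| grows and flips sign around |yhat| ≈ 1e8, so wildly negative predictions end up rewarded rather than penalized, whereas log|yhat| would grow without bound:

```python
import numpy as np

# Illustrative negative yhat values: what does the fail-safe replace
# them with, and what log term gets subtracted from logL?
yhat = np.array([-1.0, -1e4, -1e8, -1e12])
replacement = 1 / (1e-8 * (1 + np.abs(yhat)))
term = np.log(replacement)  # logL -= term for each of these points

for y, r, t in zip(yhat, replacement, term):
    print(f"yhat={y:.0e} -> replaced by {r:.3g}, log term = {t:+.1f}")
```

For yhat = -1e12 the log term is negative, so subtracting it *increases* the log-likelihood of the degenerate parameterization.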

 
This approach is not consistent with the literature (see screenshot below):

[figure: screenshot of the multiplicative-error log-likelihood, whose last term is the sum of log|mu_{y,t}|]

Screenshot from the online version of Svetunkov, I. (2023). Forecasting and Analytics with the Augmented Dynamic Adaptive Model (ADAM) (1st ed.). Chapman and Hall/CRC. https://doi.org/10.1201/9781003452652
 

In addition, it's not consistent with the other ETS implementations, which all follow the formula above for the last term of the log-likelihood, namely log(|yhat|). Below is the relevant code from the other OSS implementations I checked.

All the implementations listed above give similar log-likelihood values and AICs, as well as reasonable forecasts. For this series the best AIC among ETS models is achieved by ETS(ANN), so the automatic model selection methods in the packages above all select that model. Below are the forecasts from these packages for ETS(MAM) on the same "problematic" series:

(R) forecast
```r
library(forecast)
library(ggplot2)

data = ts(
  c(
    214307.1, 331110.3, 324617.4, 230818.8, 275551.5, 232633.5,
    281989.2, 245336.4, 296976.0, 207572.7, 252719.4, 210795.0,
    142705.8, 146742.3, 261613.5, 276959.1, 365010.0,  48734.7,
    308919.9, 319401.0, 150199.2, 103010.1, 109861.8,  89486.1,
    393645.0, 421279.5, 347387.4, 269928.0, 171278.7, 158568.9,
    280526.4, 195104.4, 230998.2, 226602.9, 267526.8, 258632.7
  ),
  frequency=12
)

fcast_ets_mam = forecast::ets(data, model="MAM", ic="aic") %>% forecast(h=12)
autoplot(fcast_ets_mam)
summary(fcast_ets_mam)
```

[figure: (R) forecast ETS(MAM) forecast]

(R) smooth
```r
library(smooth)
library(ggplot2)

data = ts(
  c(
    214307.1, 331110.3, 324617.4, 230818.8, 275551.5, 232633.5,
    281989.2, 245336.4, 296976.0, 207572.7, 252719.4, 210795.0,
    142705.8, 146742.3, 261613.5, 276959.1, 365010.0,  48734.7,
    308919.9, 319401.0, 150199.2, 103010.1, 109861.8,  89486.1,
    393645.0, 421279.5, 347387.4, 269928.0, 171278.7, 158568.9,
    280526.4, 195104.4, 230998.2, 226602.9, 267526.8, 258632.7
  ),
  frequency=12
)

smooth_es_fit = smooth::es(data, model="MAM")
summary(smooth_es_fit)
smooth_es_fit |> forecast(h=12) |> plot(main="Smooth ETS(MAM)")
```

[figure: (R) smooth ETS(MAM) forecast]

(Python) nixtla - statsforecast
```python
from statsforecast.models import AutoETS as StatsforecastETS
import numpy as np
from matplotlib import pyplot as plt

data = np.array([
    214307.1, 331110.3, 324617.4, 230818.8, 275551.5, 232633.5,
    281989.2, 245336.4, 296976.0, 207572.7, 252719.4, 210795.0,
    142705.8, 146742.3, 261613.5, 276959.1, 365010.0,  48734.7,
    308919.9, 319401.0, 150199.2, 103010.1, 109861.8,  89486.1,
    393645.0, 421279.5, 347387.4, 269928.0, 171278.7, 158568.9,
    280526.4, 195104.4, 230998.2, 226602.9, 267526.8, 258632.7
])
s_fcaster = StatsforecastETS(model="MAM", season_length=12)
s_fcaster.fit(data)
pred = s_fcaster.predict(h=12)["mean"]
pred_fill = np.full_like(data, np.nan)  # NaN padding to align the forecast
pred = np.concatenate([pred_fill, pred])
plt.plot(data, label="endog")
plt.plot(pred, label="forecast")
plt.legend()
plt.title("Nixtla/statsforecast - ETS(MAM)")
plt.show()
print(s_fcaster.__dict__)
```

[figure: Nixtla/statsforecast ETS(MAM) forecast]

This PR changes the log-likelihood code for multiplicative error models so that it is consistent with the literature and with the other ETS implementations. This results in forecasts similar to those above:

[figure: ETS(MAM) forecast from statsmodels with this PR]
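For reference, the corrected last term can be sketched as a small standalone function (the helper name is mine; the actual change lives inside the ETS log-likelihood code, as shown in the review snippets further down):

```python
import numpy as np

def mul_error_loglike_term(yhat):
    """Last term of the multiplicative-error ETS log-likelihood,
    sum_t log|yhat_t|. Exact zeros are replaced by a tiny positive
    number so the log stays finite; negative values enter through
    the absolute value, as in the likelihood derivation."""
    yhat = np.where(yhat == 0, 1e-32, yhat)
    return np.sum(np.log(np.abs(yhat)))
```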

Here are the log-likelihood and AIC values for ETS(MAM) on this series for each implementation, including the statsmodels change proposed in this PR:

| ETS Implementation | Log-Likelihood | AIC |
| --- | --- | --- |
| statsmodels - main | -79.417 | 194.835 |
| (R) forecast | -470.094 | 974.188 |
| (R) smooth | -453.326 | 936.652 |
| nixtla - statsforecast | -467.193 | 968.386 |
| statsmodels - this PR | -453.251 | 942.503 |
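As a sanity check on the table, AIC = 2k - 2·logL. Implementations count the number of estimated parameters k slightly differently (e.g. whether sigma^2 or a normalized seasonal state is included), which is why similar log-likelihoods can yield somewhat different AICs. Assuming k = 18 for ETS(M,A,M) with m = 12 (3 smoothing parameters, initial level and trend, 12 initial seasonal states, and sigma^2; this parameter count is my assumption), the "statsmodels - this PR" row is reproduced up to rounding:

```python
logL = -453.251  # "statsmodels - this PR" row of the table
k = 18           # assumed parameter count for ETS(M,A,M), m = 12
aic = 2 * k - 2 * logL
print(round(aic, 1))  # 942.5, matching the table up to rounding
```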

PS: Many many many thanks to @config-i1 for helping me to figure this one out! 🙏

@ltsaprounis ltsaprounis changed the title Ets bug BUG: ETS multiplicative error models log-likelihood calculation Feb 2, 2024
@bashtage (Member) left a comment:

Good idea but can simplify a bit.

```python
# are replaced with 10^-32 (a very small number) and we take
# the absolute of yhat to avoid computing the log of negative
# numbers.
yhat[yhat == 0] = 1e-32
```
Reviewer comment (Member):

Really should leave the `<=`. `== 0` is likely too strict, and if it really is `== 0`, then `<= 0` will catch all cases.

Reviewer comment (Member):

Fun corner cases. FWIW, I don't think you need the clipping at all anymore. It's almost certainly not being triggered here, and if you do switch it to clip the negatives, then it becomes unstable again.

```python
# the absolute of yhat to avoid computing the log of negative
# numbers.
yhat[yhat == 0] = 1e-32
logL -= np.sum(np.log(np.abs(yhat)))
```
Reviewer comment (Member):

I don't think reflecting negatives is a good idea. Better to replace with small numbers.

@config-i1 replied:
Please note that the absolute value is not there to avoid negative numbers, but instead arises mathematically:

If $\epsilon_t \sim \mathcal{N}(0, \sigma^2)$ then $y_t = \mu_{y,t}(1+\epsilon_t) \sim \mathcal{N}(\mu_{y,t}, \mu_{y,t}^2 \sigma^2)$. We then end up with the following log-likelihood:

$\ell(\boldsymbol{\theta}, {\sigma}^2 | \mathbf{y}) = -\frac{1}{2} \sum_{t=1}^T \log(2 \pi \mu_{y,t}^2 \sigma^2) -\frac{1}{2} \sum_{t=1}^T \frac{\epsilon_t^2}{\sigma^2}$ . As you see, we have $\mu_{y,t}^2$ in the formula above, which after simplifications can be written as: $-\frac{T}{2} \log(2 \pi \sigma^2) -\frac{1}{2} \sum_{t=1}^T \frac{\epsilon_t^2}{\sigma^2} - \sum_{t=1}^T \log |\mu_{y,t}|$.

So, it is necessary to have it there for the correct calculation of the log-likelihood. The derivations are based on the Section 11.1 here: https://openforecast.org/adam/ADAMETSEstimationLikelihood.html
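Translating the simplified expression above into code, a minimal sketch of the concentrated log-likelihood (with $\sigma^2$ replaced by its MLE; the function name and array interface are mine) looks like this:

```python
import numpy as np

def ets_mul_concentrated_loglike(y, yhat):
    """Concentrated log-likelihood of an ETS model with multiplicative
    Gaussian errors: -T/2 log(2 pi sigma2) - T/2 - sum(log|yhat|),
    where sigma2 is the MLE mean(eps**2)."""
    eps = (y - yhat) / yhat      # multiplicative residuals e_t
    T = y.shape[0]
    sigma2 = np.mean(eps ** 2)   # MLE of sigma^2; makes the middle term -T/2
    return (-0.5 * T * np.log(2 * np.pi * sigma2)
            - 0.5 * T
            - np.sum(np.log(np.abs(yhat))))
```

This equals the sum of Gaussian log-densities of $y_t$ with mean $\mu_{y,t}$ and variance $\mu_{y,t}^2 \sigma^2$, evaluated at the variance MLE.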

@ltsaprounis (Author) replied:
Thanks for reviewing the PR @bashtage and thanks for the additional explanation on the likelihood derivation @config-i1.

Based on @config-i1's explanation and the other implementations, I think it makes sense to keep the `np.abs(yhat)`. Do you agree, @bashtage?

@pep8speaks commented Mar 19, 2024:

Hello @ltsaprounis! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues and found none in this pull request. Cheers! 🍻

@ltsaprounis (Author) commented:
> Please note that the absolute value is not there to avoid negative numbers, but instead arises mathematically:
>
> If $\epsilon_t \sim \mathcal{N}(0, \sigma^2)$ then $y_t = \mu_{y,t}(1+\epsilon_t) \sim \mathcal{N}(\mu_{y,t}, \mu_{y,t}^2 \sigma^2)$. We then end up with the following log-likelihood:
>
> $\ell(\boldsymbol{\theta}, {\sigma}^2 | \mathbf{y}) = -\frac{1}{2} \sum_{t=1}^T \log(2 \pi \mu_{y,t}^2 \sigma^2) -\frac{1}{2} \sum_{t=1}^T \frac{\epsilon_t^2}{\sigma^2}$. As you see, we have $\mu_{y,t}^2$ in the formula above, which after simplifications can be written as: $-\frac{T}{2} \log(2 \pi \sigma^2) -\frac{1}{2} \sum_{t=1}^T \frac{\epsilon_t^2}{\sigma^2} - \sum_{t=1}^T \log |\mu_{y,t}|$.
>
> So, it is necessary to have it there for the correct calculation of the log-likelihood. The derivations are based on Section 11.1 here: https://openforecast.org/adam/ADAMETSEstimationLikelihood.html

Made some changes to the comment for the log-likelihood calculation based on Ivan's comments above. Let me know what you think @bashtage.
