
ENH: fix vonmises fit for bad guess of location parameter #18190

Merged (3 commits) on Mar 25, 2023

Conversation

@dschmitz89 (Contributor) commented Mar 22, 2023

Reference issue

Follow up of #18128

What does this implement/fix?

For a very bad guess of the location parameter, the maximum likelihood equation for the concentration parameter $\kappa$ does not have a solution. In that case, the maximum likelihood estimate is $\kappa=0$: there is no formal proof (see below), but intuitively, if the location does not fit at all, it is better to assume a uniform distribution on the circle. This PR catches the root-finding error and returns the tiniest positive floating point value.
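
As a rough illustration (not the actual SciPy code; the helper name `solve_for_kappa`, the bracket, and the use of `root_scalar` are illustrative, following the discussion in gh-18128), the fallback amounts to something like:

```python
import numpy as np
from scipy import optimize
from scipy import special as sc

def fit_kappa(data, loc):
    """Sketch: MLE of the von Mises concentration kappa for a fixed location."""
    data = np.asarray(data)
    # Mean cosine of the deviations from the assumed location.
    r = np.sum(np.cos(loc - data)) / len(data)

    # Likelihood equation: I1(kappa)/I0(kappa) = r.
    def solve_for_kappa(kappa):
        return sc.i1e(kappa) / sc.i0e(kappa) - r

    try:
        res = optimize.root_scalar(solve_for_kappa, method="brentq",
                                   bracket=(np.finfo(float).tiny, 1e16))
        return res.root
    except ValueError:
        # brentq raises ValueError when there is no sign change in the bracket,
        # i.e. the likelihood equation has no root; fall back to the smallest
        # positive kappa (essentially a uniform distribution on the circle).
        return np.finfo(float).tiny
```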

Additional information

With the data of the test case and the fixed location used below ($-\pi/2$), the $\kappa$ equation does not have a root. See below.

[Figure: $I_1(\kappa)/I_0(\kappa) - r$ versus $\kappa$ for the fixed and the free location; the fixed-loc curve stays positive and never crosses zero.]

Code to generate the plot:

```python
import numpy as np
from scipy import stats
from scipy import special as sc
import matplotlib.pyplot as plt

# Data of the test case
rvs = [-0.92923506, -0.32498224,  0.13054989, -0.97252014,  2.79658071,
       -0.89110948,  1.22520295,  1.44398065,  2.49163859,  1.50315096,
        3.05437696, -2.73126329, -3.06272048,  1.64647173,  1.94509247,
       -1.14328023,  0.8499056,   2.36714682, -1.6823179,  -0.88359996]

data = np.asarray(rvs)

# r for the (bad) fixed location vs. the free (circular-mean) location
fixed = -0.5 * np.pi
r_fixed = np.sum(np.cos(fixed - data)) / len(data)
r_free = 1 - stats.circvar(data, low=-np.pi, high=np.pi)

# Plot I1(kappa)/I0(kappa) - r: a root exists only for the free location
kappas = np.logspace(-20, 20, 1000)
plt.semilogx(kappas, sc.i1e(kappas)/sc.i0e(kappas) - r_fixed, label="fixed loc")
plt.semilogx(kappas, sc.i1e(kappas)/sc.i0e(kappas) - r_free, label="free loc")
plt.legend()
plt.show()
```

Proof

In #18128 it was mentioned that the partial derivative of the log likelihood will tell whether the MLE is at the left end of the parameter range (0) or at the right end ($\infty$). This result is not unambiguous though, as the derivative still depends on $\kappa$. Or am I making a mistake here?

$$
\begin{equation}
\log p(x_1, \ldots, x_N \mid \mu, \kappa) = \sum_{i=1}^N \left[ \kappa\cos(x_i-\mu) - \ln(2\pi) - \ln(I_0(\kappa)) \right]
\end{equation}
$$

The partial derivative:

$$
\begin{equation}
\frac{\partial \log p}{\partial\kappa}(x_1, \ldots, x_N \mid \mu, \kappa) = \sum_{i=1}^N \left[ \underbrace{\cos(x_i-\mu)}_{\in [-1, 1]} - \underbrace{\frac{I_0'(\kappa)}{I_0(\kappa)}}_{\in [0, 1]} \right]
\end{equation}
$$

@dschmitz89 requested a review from mdhaber March 22, 2023 21:00
@mdhaber (Contributor) commented Mar 22, 2023

It doesn't matter much here because terms with $\kappa$ can be isolated, but please remember the summation symbols and subscript on $x_i$. Without them, the equations look simpler than they are.

> In #18128 it was mentioned that the partial derivative of the log likelihood will tell whether the MLE is at the left end of the parameter range (0) or at the right end ($\infty$). This result is not unambiguous though, as the derivative still depends on $\kappa$.

- If you can show that the partial derivative of the LL w.r.t. $\kappa$ is positive for all valid $\kappa$, it is appropriate to give $\kappa=\infty$ (or finfo().max, possibly with a warning that this is a "feature" of MLE) because the likelihood can be increased without bound by increasing $\kappa$.
- If you can show that the partial derivative of the LL w.r.t. $\kappa$ is negative for all valid $\kappa$, it is appropriate to give $\kappa=0$ (or finfo().tiny, possibly with a warning) because the likelihood can be increased by decreasing $\kappa$ until the $\kappa > 0$ constraint becomes active. (This is a constrained optimization problem.)
- The fact that the math leads you to $0$ or $\infty$, which are not valid values of $\kappa$, just means that MLE itself is not well behaved here. MLE runs into these sorts of problems all the time. You could raise an error or return NaN, but I think returning a value accompanied by a warning is the most informative.
- (If you're not sure of the sign of the partial derivative either way, then yes, it would be ambiguous - but read on.)

If the root finding fails, we have good reason to believe that it doesn't cross zero. So you could just evaluate the partial derivative at any $\kappa$ and use the sign to determine which case it is.

If we're uncomfortable with using the root finding failure to mean that it definitely doesn't cross zero, you still might be able to make a decision. You also know that $\sum_{i=1}^N \cos(x_i-\mu)$ is bounded below by $-N$ and above by $N$. If you can figure out bounds on $I_0'(\cdot)/I_0(\cdot)$, it might be possible to determine whether this partial derivative is positive or negative.

Update: $I_0'(\kappa)/I_0(\kappa)$ is always between 0 and 1 for $\kappa > 0$. $\mu$ and $x_i$ are given, so you just need to evaluate $\sum_{i=1}^N \cos(x_i-\mu)$ to determine whether the equation has a solution or not, and if not, what the sign is.

@mdhaber (Contributor) commented Mar 23, 2023

For instance, in the test problem, r = np.sum(np.cos(loc - data))/len(data) is negative. So sc.i1e(kappa)/sc.i0e(kappa) - r is positive, regardless of kappa. According to https://github.com/scipy/scipy/pull/18128#issuecomment-1474462330, solve_for_kappa is the negative of the partial derivative of the LL w.r.t. $\kappa$, so the partial derivative of the LL w.r.t. $\kappa$ is always negative. Therefore, by reducing $\kappa$ as far as possible (to zero), you maximize the LL.

We can confirm this:

```python
import numpy as np
from scipy import stats

data = [-0.92923506, -0.32498224, 0.13054989, -0.97252014, 2.79658071,
        -0.89110948, 1.22520295, 1.44398065, 2.49163859, 1.50315096,
        3.05437696, -2.73126329, -3.06272048, 1.64647173, 1.94509247,
        -1.14328023, 0.8499056, 2.36714682, -1.6823179, -0.88359996]
loc = -0.5 * np.pi
stats.vonmises.nnlf((1, loc, 1), data)       # 43.06642193588772
stats.vonmises.nnlf((.1, loc, 1), data)      # 36.96656945662016
stats.vonmises.nnlf((1e-300, loc, 1), data)  # 36.75754132818691
```

Indeed, the negative log-likelihood (NLLF, erroneously denoted nnlf) is minimized by setting kappa as small as possible.

You can apply this reasoning whenever r is negative.

It looks like r can't be greater than 1, because np.sum(np.cos(loc - data)) must be <= len(data), so np.sum(np.cos(loc - data))/len(data) <= 1. I guess we don't need to consider the other case after all.

So rather than a try-except, you can decide whether or not to call root_scalar based on the sign of r. If it's negative, just return 0 (or tiny, if you prefer). Otherwise, there is definitely a root of the likelihood equation, and you can return that.
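
A minimal self-contained sketch of that decision (the helper name and the bracket are illustrative, not the actual implementation):

```python
import numpy as np
from scipy import optimize
from scipy import special as sc

def fit_kappa_sign_based(data, loc):
    """Sketch: choose the kappa branch from the sign of r instead of try/except."""
    data = np.asarray(data)
    r = np.sum(np.cos(loc - data)) / len(data)
    if r <= 0:
        # The partial derivative of the LL w.r.t. kappa is negative for all
        # kappa > 0, so the constrained MLE is at the boundary kappa -> 0.
        return np.finfo(float).tiny
    # 0 < r < 1: I1(kappa)/I0(kappa) = r has exactly one root.
    def solve_for_kappa(kappa):
        return sc.i1e(kappa) / sc.i0e(kappa) - r
    return optimize.root_scalar(solve_for_kappa, method="brentq",
                                bracket=(np.finfo(float).tiny, 1e16)).root
```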

Please check me on all this! Too many sign changes.

@mdhaber merged commit 979102e into scipy:main Mar 25, 2023
doronbehar added a commit to doronbehar/scipy that referenced this pull request Mar 27, 2023
* main: (52 commits)
  ENH: stats.vonmises.fit: treat case in which location likelihood equation has no solution (scipy#18190)
  ...
@j-bowhay added this to the 1.11.0 milestone Mar 31, 2023
@dschmitz89 deleted the vonmises_fit_edgecases branch July 18, 2023 21:48
@fancidev (Contributor) commented Feb 17, 2024

@dschmitz89 As mentioned in another ticket, a few small notes about vonmises.fit:

1/ The MLE tries to solve $r_0(\kappa):=I_1(\kappa)/I_0(\kappa)=r$ for $0 \le \kappa < +\infty$. It can be shown that the function $r_0(\kappa)$ is monotonically increasing from 0 (inclusive) to 1 (exclusive), so there is a unique root if and only if $0 \le r < 1$. (The code should probably check for $r=1$.) Simple and useful bounds on $r_0(x)$ are given by Amos (1974, Equation 11). From those bounds one can bound the solution $\kappa^\ast$ by (see also the sketch after this list)

$$ \frac{2r}{\sqrt{1-r^2}}\le \kappa^\ast \le \frac{2r}{1-r^2} $$

2/ One can show that $\hat{\kappa}=0$ is indeed the MLE if $r \le 0$, because the partial derivative of the log likelihood function with respect to $\kappa$ is always negative.

3/ The code avoids $\kappa=0$ and returns a tiny positive number instead. But it seems that for this distribution the PDF etc. work fine at $\kappa=0$, so it might as well be allowed, to simplify some corner cases.
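
A sketch of how the bounds from 1/ above could seed the root-finding bracket (assuming $0 < r < 1$; the function name and the safety margins are illustrative, not the actual SciPy code):

```python
import numpy as np
from scipy import optimize
from scipy import special as sc

def solve_kappa_with_amos_bounds(r):
    """Sketch: bracket kappa* using 2r/sqrt(1-r^2) <= kappa* <= 2r/(1-r^2)."""
    assert 0 < r < 1
    lower = 2 * r / np.sqrt(1 - r**2)
    upper = 2 * r / (1 - r**2)
    f = lambda kappa: sc.i1e(kappa) / sc.i0e(kappa) - r
    # Widen the bracket a little so rounding at the endpoints cannot spoil the
    # sign change; for r very close to 1 the upper end may also need to be
    # capped where i1e/i0e numerically saturates at 1 (see the reply below).
    return optimize.root_scalar(f, method="brentq",
                                bracket=(0.5 * lower, 2 * upper)).root
```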

@dschmitz89 (Contributor, Author) commented Feb 17, 2024

@fancidev: The bounds for the root-finding problem are a nice find. A PR is very much welcome :). Just be aware that for arguments larger than the current right end of the bracket, i1e/i0e always returns 1, so that should remain the maximum in case the computed bound is larger. Implementing a special function for $\frac{I_{v+1}(x)}{I_v(x)}$ following the Amos paper would be the best solution in the long run, but there are more important things happening in the scipy.special world at the moment.
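
A quick way to see that saturation (illustrative; the exact argument at which the computed ratio first rounds to 1.0 depends on the platform):

```python
import numpy as np
from scipy import special as sc

# I1(x)/I0(x) ~ 1 - 1/(2x) for large x, so once 1/(2x) drops below machine
# precision the computed ratio rounds to exactly 1.0.
for x in np.logspace(10, 20, 6):
    print(f"{x:.1e}  {sc.i1e(x) / sc.i0e(x)!r}")
```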

About the $\kappa=0$ case: this edge-case handling has proven to be tricky. See for example this issue regarding the von Mises distribution: https://github.com/scipy/scipy/issues/18166#issuecomment-1474230620

As the distribution infrastructure is being overhauled at the moment, I would like to leave that as is for now. The new infrastructure might make it easier to dispatch to another distribution when one arises as a special case.

@fancidev (Contributor) commented:

@dschmitz89 Let me make a PR for the bounds. I’ll leave the $r=1$ case as is but add a comment. The situation $r=1$ happens when all observations are equal (to the mean), in which case the “precision” parameter $\kappa$ is infinite. This is analogous to fitting a normal distribution to data where all observations are equal; the current SciPy implementation sets the scale to 0 in that case.

I’ll attempt a separate PR to close gh-18166, as it seems possible to resolve with a simple fix.

Labels: enhancement (A new feature or improvement), scipy.stats
4 participants