
ENH: fix vonmises fit for bad guess of location parameter #18190

Merged (3 commits) on Mar 25, 2023

Conversation

@dschmitz89 (Contributor) commented Mar 22, 2023

Reference issue

Follow up of #18128

What does this implement/fix?

For a very bad guess of the location parameter, the maximum likelihood equation for the concentration parameter $\kappa$ does not have a solution. In that case, the maximum likelihood estimate is $\kappa=0$: there is no formal proof (see below), but intuitively, if the location does not fit at all, it is better to assume a uniform distribution on the circle. This PR catches the root-finding error and returns the tiniest positive floating point value.
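
As a rough illustration (not the actual SciPy code; the helper name `solve_for_kappa`, the bracket, and the use of `root_scalar` are illustrative, following the discussion in gh-18128), the fallback amounts to something like:

```python
import numpy as np
from scipy import optimize
from scipy import special as sc

def fit_kappa(data, loc):
    """Sketch: MLE of the von Mises concentration kappa for a fixed location."""
    data = np.asarray(data)
    # Mean cosine of the deviations from the assumed location.
    r = np.sum(np.cos(loc - data)) / len(data)

    # Likelihood equation: I1(kappa)/I0(kappa) = r.
    def solve_for_kappa(kappa):
        return sc.i1e(kappa) / sc.i0e(kappa) - r

    try:
        res = optimize.root_scalar(solve_for_kappa, method="brentq",
                                   bracket=(np.finfo(float).tiny, 1e16))
        return res.root
    except ValueError:
        # brentq raises ValueError when there is no sign change in the bracket,
        # i.e. the likelihood equation has no root; fall back to the smallest
        # positive kappa (essentially a uniform distribution on the circle).
        return np.finfo(float).tiny
```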

Additional information

With the data of the test case and the fixed location used below ($-\pi/2$), the $\kappa$ equation does not have a root. See below.

[Figure: $I_1(\kappa)/I_0(\kappa) - r$ versus $\kappa$ for the fixed and the free location; the fixed-loc curve stays positive and never crosses zero.]

Code to generate the plot:

```python
import numpy as np
from scipy import stats
from scipy import special as sc
import matplotlib.pyplot as plt

# Data of the test case
rvs = [-0.92923506, -0.32498224,  0.13054989, -0.97252014,  2.79658071,
       -0.89110948,  1.22520295,  1.44398065,  2.49163859,  1.50315096,
        3.05437696, -2.73126329, -3.06272048,  1.64647173,  1.94509247,
       -1.14328023,  0.8499056,   2.36714682, -1.6823179,  -0.88359996]

data = np.asarray(rvs)

# r for the (bad) fixed location vs. the free (circular-mean) location
fixed = -0.5 * np.pi
r_fixed = np.sum(np.cos(fixed - data)) / len(data)
r_free = 1 - stats.circvar(data, low=-np.pi, high=np.pi)

# Plot I1(kappa)/I0(kappa) - r: a root exists only for the free location
kappas = np.logspace(-20, 20, 1000)
plt.semilogx(kappas, sc.i1e(kappas)/sc.i0e(kappas) - r_fixed, label="fixed loc")
plt.semilogx(kappas, sc.i1e(kappas)/sc.i0e(kappas) - r_free, label="free loc")
plt.legend()
plt.show()
```

Proof

In #18128 it was mentioned that the partial derivative of the log likelihood will tell whether the MLE is at the left end of the parameter range (0) or at the right end ($\infty$). This result is not unambiguous though, as the derivative still depends on $\kappa$. Or am I making a mistake here?

$$
\begin{equation}
\log p(x_1, \ldots, x_N \mid \mu, \kappa) = \sum_{i=1}^N \left[ \kappa\cos(x_i-\mu) - \ln(2\pi) - \ln(I_0(\kappa)) \right]
\end{equation}
$$

The partial derivative:

$$
\begin{equation}
\frac{\partial \log p}{\partial\kappa}(x_1, \ldots, x_N \mid \mu, \kappa) = \sum_{i=1}^N \left[ \underbrace{\cos(x_i-\mu)}_{\in [-1, 1]} - \underbrace{\frac{I_0'(\kappa)}{I_0(\kappa)}}_{\in [0, 1]} \right]
\end{equation}
$$

@dschmitz89 requested a review from mdhaber March 22, 2023 21:00
@mdhaber (Contributor) commented Mar 22, 2023

It doesn't matter much here because terms with $\kappa$ can be isolated, but please remember the summation symbols and subscript on $x_i$. Without them, the equations look simpler than they are.

> In #18128 it was mentioned that the partial derivative of the log likelihood will tell whether the MLE is at the left end of the parameter range (0) or at the right end ($\infty$). This result is not unambiguous though, as the derivative still depends on $\kappa$.

- If you can show that the partial derivative of the LL w.r.t. $\kappa$ is positive for all valid $\kappa$, it is appropriate to give $\kappa=\infty$ (or finfo().max, possibly with a warning that this is a "feature" of MLE) because the likelihood can be increased without bound by increasing $\kappa$.
- If you can show that the partial derivative of the LL w.r.t. $\kappa$ is negative for all valid $\kappa$, it is appropriate to give $\kappa=0$ (or finfo().tiny, possibly with a warning) because the likelihood can be increased by decreasing $\kappa$ until the $\kappa > 0$ constraint becomes active. (This is a constrained optimization problem.)
- The fact that the math leads you to $0$ or $\infty$, which are not valid values of $\kappa$, just means that MLE itself is not well behaved here. MLE runs into these sorts of problems all the time. You could raise an error or return NaN, but I think returning a value accompanied by a warning is the most informative.
- (If you're not sure of the sign of the partial derivative either way, then yes, it would be ambiguous - but read on.)

If the root finding fails, we have good reason to believe that it doesn't cross zero. So you could just evaluate the partial derivative at any $\kappa$ and use the sign to determine which case it is.

If we're uncomfortable with using the root finding failure to mean that it definitely doesn't cross zero, you still might be able to make a decision. You also know that $\sum_{i=1}^N \cos(x_i-\mu)$ is bounded below by $-N$ and above by $N$. If you can figure out bounds on $I_0'(\cdot)/I_0(\cdot)$, it might be possible to determine whether this partial derivative is positive or negative.

Update: $I_0'(\kappa)/I_0(\kappa)$ is always between 0 and 1 for $\kappa > 0$. $\mu$ and $x_i$ are given, so you just need to evaluate $\sum_{i=1}^N \cos(x_i-\mu)$ to determine whether the equation has a solution or not, and if not, what the sign is.

@mdhaber (Contributor) commented Mar 23, 2023

For instance, in the test problem, r = np.sum(np.cos(loc - data))/len(data) is negative. So sc.i1e(kappa)/sc.i0e(kappa) - r is positive, regardless of kappa. According to https://github.com/scipy/scipy/pull/18128#issuecomment-1474462330, solve_for_kappa is the negative of the partial derivative of the LL w.r.t. $\kappa$, so the partial derivative of the LL w.r.t. $\kappa$ is always negative. Therefore, by reducing $\kappa$ as far as possible (to zero), you maximize the LL.

We can confirm this:

```python
import numpy as np
from scipy import stats

data = [-0.92923506, -0.32498224, 0.13054989, -0.97252014, 2.79658071,
        -0.89110948, 1.22520295, 1.44398065, 2.49163859, 1.50315096,
        3.05437696, -2.73126329, -3.06272048, 1.64647173, 1.94509247,
        -1.14328023, 0.8499056, 2.36714682, -1.6823179, -0.88359996]
loc = -0.5 * np.pi
stats.vonmises.nnlf((1, loc, 1), data)       # 43.06642193588772
stats.vonmises.nnlf((.1, loc, 1), data)      # 36.96656945662016
stats.vonmises.nnlf((1e-300, loc, 1), data)  # 36.75754132818691
```

Indeed, the negative log-likelihood (NLLF, erroneously denoted nnlf) is minimized by setting kappa as small as possible.

You can apply this reasoning whenever r is negative.

It looks like r can't be greater than 1, because np.sum(np.cos(loc - data)) must be <= len(data), so np.sum(np.cos(loc - data))/len(data) <= 1. I guess we don't need to consider the other case after all.

So rather than a try-except, you can decide whether or not to call root_scalar based on the sign of r. If it's negative, just return 0 (or tiny, if you prefer). Otherwise, there is definitely a root of the likelihood equation, and you can return that.
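
A minimal self-contained sketch of that decision (the helper name and the bracket are illustrative, not the actual implementation):

```python
import numpy as np
from scipy import optimize
from scipy import special as sc

def fit_kappa_sign_based(data, loc):
    """Sketch: choose the kappa branch from the sign of r instead of try/except."""
    data = np.asarray(data)
    r = np.sum(np.cos(loc - data)) / len(data)
    if r <= 0:
        # The partial derivative of the LL w.r.t. kappa is negative for all
        # kappa > 0, so the constrained MLE is at the boundary kappa -> 0.
        return np.finfo(float).tiny
    # 0 < r < 1: I1(kappa)/I0(kappa) = r has exactly one root.
    def solve_for_kappa(kappa):
        return sc.i1e(kappa) / sc.i0e(kappa) - r
    return optimize.root_scalar(solve_for_kappa, method="brentq",
                                bracket=(np.finfo(float).tiny, 1e16)).root
```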

Please check me on all this! Too many sign changes.

@mdhaber merged commit 979102e into scipy:main Mar 25, 2023
doronbehar added a commit to doronbehar/scipy that referenced this pull request Mar 27, 2023
* main: (52 commits)
  ENH: stats.vonmises.fit: treat case in which location likelihood equation has no solution (scipy#18190)
  ...
@j-bowhay added this to the 1.11.0 milestone Mar 31, 2023
@dschmitz89 deleted the vonmises_fit_edgecases branch July 18, 2023 21:48
@fancidev (Contributor) commented Feb 17, 2024

@dschmitz89 As mentioned in another ticket, a few small notes about vonmises.fit:

1/ The MLE tries to solve $r_0(\kappa):=I_1(\kappa)/I_0(\kappa)=r$ for $0 \le \kappa < +\infty$. It can be shown that the function $r_0(\kappa)$ is monotonically increasing from 0 (inclusive) to 1 (exclusive), so there is a unique root if and only if $0 \le r < 1$. (The code should probably check for $r=1$.) Simple and useful bounds on $r_0(x)$ are given by Amos (1974, Equation 11). From those bounds one can bound the solution $\kappa^\ast$ by (see also the sketch after this list)

$$ \frac{2r}{\sqrt{1-r^2}}\le \kappa^\ast \le \frac{2r}{1-r^2} $$

2/ One can show that $\hat{\kappa}=0$ is indeed the MLE if $r \le 0$, because the partial derivative of the log likelihood function with respect to $\kappa$ is always negative.

3/ The code avoids $\kappa=0$ and returns a tiny positive number instead. But it seems that for this distribution the PDF etc. work fine at $\kappa=0$, so it might as well be allowed, to simplify some corner cases.
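
A sketch of how the bounds from 1/ above could seed the root-finding bracket (assuming $0 < r < 1$; the function name and the safety margins are illustrative, not the actual SciPy code):

```python
import numpy as np
from scipy import optimize
from scipy import special as sc

def solve_kappa_with_amos_bounds(r):
    """Sketch: bracket kappa* using 2r/sqrt(1-r^2) <= kappa* <= 2r/(1-r^2)."""
    assert 0 < r < 1
    lower = 2 * r / np.sqrt(1 - r**2)
    upper = 2 * r / (1 - r**2)
    f = lambda kappa: sc.i1e(kappa) / sc.i0e(kappa) - r
    # Widen the bracket a little so rounding at the endpoints cannot spoil the
    # sign change; for r very close to 1 the upper end may also need to be
    # capped where i1e/i0e numerically saturates at 1 (see the reply below).
    return optimize.root_scalar(f, method="brentq",
                                bracket=(0.5 * lower, 2 * upper)).root
```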

@dschmitz89 (Contributor, Author) commented Feb 17, 2024

@fancidev: The bounds for the root-finding problem are a nice find. A PR is very much welcome :). Just be aware that for arguments larger than the current right end of the bracket, i1e/i0e always returns 1, so that should remain the maximum in case the computed bound is larger. Implementing a special function for $\frac{I_{v+1}(x)}{I_v(x)}$ following the Amos paper would be the best solution in the long run, but there are more important things happening in the scipy.special world at the moment.
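
A quick way to see that saturation (illustrative; the exact argument at which the computed ratio first rounds to 1.0 depends on the platform):

```python
import numpy as np
from scipy import special as sc

# I1(x)/I0(x) ~ 1 - 1/(2x) for large x, so once 1/(2x) drops below machine
# precision the computed ratio rounds to exactly 1.0.
for x in np.logspace(10, 20, 6):
    print(f"{x:.1e}  {sc.i1e(x) / sc.i0e(x)!r}")
```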

About the $\kappa=0$ case: this edge-case handling has proven to be tricky. See for example this issue regarding the von Mises distribution: https://github.com/scipy/scipy/issues/18166#issuecomment-1474230620

As the distribution infrastructure is being overhauled at the moment, I would like to leave that as is for now. The new infrastructure might make it easier to dispatch to another distribution when one arises as a special case.

@fancidev (Contributor) commented:

@dschmitz89 Let me make a PR for the bounds. I’ll leave the $r=1$ case as is but add a comment. The situation $r=1$ happens when all observations are equal (to the mean), in which case the “precision” parameter $\kappa$ is infinite. This is analogous to fitting a normal distribution to data where all observations are equal; the current SciPy implementation sets the scale to 0 in that case.

I’ll attempt a separate PR to close gh-18166, as it seems possible to resolve with a simple fix.

Labels: enhancement (A new feature or improvement), scipy.stats
4 participants