
ENH: add von Mises-Fisher distribution #17624

Merged · 78 commits · Apr 17, 2023

Conversation

dschmitz89 (Contributor) commented Dec 17, 2022

Reference issue

Closes gh-17463

What does this implement/fix?

Adds the von Mises-Fisher distribution to scipy.stats. This distribution is the most common analogue of the normal distribution on the unit sphere, likely the most important directional distribution. It is for example available in geomstats and tensorflow.

The random variate generation is basically a numpyfied version of the geomstats implementation.

Additional information

This still has some sharp edges:

  • for some reason the doc formatting via doccer does not work; that's why it is commented out at the moment. This would be the most urgent thing to look at first.

I implemented a fit method for the distribution. So far, no other multivariate distribution has such a method so this should be reviewed carefully.

For a rough idea what this distribution looks like: these are plots of the PDF (top row) and 20 random variates (bottom) for three different concentrations.

VMF_Plots

CC @nguigs: since you authored this distribution for geomstats and I heavily reused parts of it, I thought I would ask if you would be willing to help out with review here. It's quite a big PR, so every bit would help. Thanks in advance.

@dschmitz89 dschmitz89 changed the title Vonmisesfisher ENH: add von Mises-Fisher distribution Dec 17, 2022
@j-bowhay j-bowhay added scipy.stats enhancement A new feature or improvement labels Dec 18, 2022
tupui (Member) commented Mar 22, 2023

Thanks for linking the discussion. I don't see any replies for the API discussion, but it's ok per our process.

@tupui (Member) left a comment

@dschmitz89 Sorry for the delay here. I had another look pulling the changes and playing with it. I think it's in good shape and did not notice strange things. I took the liberty to push some cleanup.

In terms of tests, I think we are missing the 2D case against a reference. Assuming mpmath is ok, then I would suggest adding a test for 2D here.

Something else to consider are testing extreme cases of kappa to have either a concentration around a single point or a perfectly uniform sample. Here we still need to give some guidelines for at least the max value. e.g. I can easily use 1e10 but 1e20 hangs (consider erroring out as well).

On that note, kappa=0 returns NaN with some warnings. I don't think we want that. I would either error out with a nice message, or fallback to a perfectly uniform sample on the hypersphere.

Last point for me would be to document the maths. The 3D and rejection sampling code are a bit obscure as they are. Adding some reference and explanation would help maintain this in the future.

tupui (Member) commented Apr 6, 2023

@chrisb83 I think you worked on vonmises_gen, can I interest you in having a look at this PR?

I think it's in good shape and could be merged once my points above have been addressed. So I am not asking you to review line by line, more check that we did not miss anything obvious with the infra, API or the maths.

dschmitz89 (Contributor, Author) commented:

> @dschmitz89 Sorry for the delay here. I had another look pulling the changes and playing with it. I think it's in good shape and did not notice strange things. I took the liberty to push some cleanup.

Thanks!

> In terms of tests, I think we are missing the 2D case against a reference. Assuming mpmath is ok, then I would suggest adding a test for 2D here.

This should be covered by the test against the 2D von Mises distribution: see here.

> Something else to consider are testing extreme cases of kappa to have either a concentration around a single point or a perfectly uniform sample. Here we still need to give some guidelines for at least the max value. e.g. I can easily use 1e10 but 1e20 hangs (consider erroring out as well).

Thanks for the hint. I was able to squeeze out a higher range of possible values for $\kappa$ when sampling with $d>3$. For even higher values, what is needed is an accurate method to evaluate $\frac{1-x}{1+x}$ at $x\approx 0$ (occurrence in the code here). If anyone has an idea, I would gladly implement it.

> On that note, kappa=0 returns NaN with some warnings. I don't think we want that. I would either error out with a nice message, or fallback to a perfectly uniform sample on the hypersphere.

The case $\kappa=0$ is now handled separately: it errors out with a custom message.

> Last point for me would be to document the maths. The 3D and rejection sampling code are a bit obscure as they are. Adding some reference and explanation would help maintain this in the future.

I added some more comments. The rejection sampler is rather involved, though, and I did not fully dig through the reference.

WarrenWeckesser (Member) commented:

> For even higher values, what is needed is an accurate method to evaluate $\frac{1-x}{1+x}$ at $x\approx 0$ (occurrence in the code here)

I suspect that particular ratio is not the problem. But in the line that follows, you compute np.log(1. - node ** 2). That subtraction will lose precision when node is near 1, which corresponds to envelop_param (i.e. $x$ in the comment above) being small. You can rewrite that expression to maintain precision:

$$\log\left(1 - \left(\frac{1-x}{1+x}\right)^2\right) = \log\left(\frac{4x}{(1 + x)^2}\right) = \log(4) + \log(x) - 2\textrm{log1p}(x)$$
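A quick numerical check of that rewrite (variable names from the discussion above; the value of x is illustrative):

```python
import numpy as np

x = 1e-12                      # envelop_param near 0, i.e. extreme kappa
node = (1.0 - x) / (1.0 + x)   # very close to 1

naive = np.log(1.0 - node**2)                         # loses digits: 1 - node**2 cancels
stable = np.log(4.0) + np.log(x) - 2.0 * np.log1p(x)  # reformulated, no cancellation

print(naive, stable)  # both near log(4e-12) ~ -26.25, but only `stable` keeps full precision
```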

dschmitz89 (Contributor, Author) commented:

> > For even higher values, what is needed is an accurate method to evaluate $\frac{1-x}{1+x}$ at $x\approx 0$ (occurrence in the code here)
>
> I suspect that particular ratio is not the problem. But in the line that follows, you compute np.log(1. - node ** 2). That subtraction will lose precision when node is near 1, which corresponds to envelop_param (i.e. $x$ in the comment above) being small. You can rewrite that expression to maintain precision:
>
> $$\log\left(1 - \left(\frac{1-x}{1+x}\right)^2\right) = \log\left(\frac{4x}{(1 + x)^2}\right) = \log(4) + \log(x) - 2\,\textrm{log1p}(x)$$

Thanks, this helped a lot. I also reformulated the other logarithmic term containing $\frac{1-x}{1+x}$, now sampling is easily possible for $\kappa=10^{30}$. The fitting method does not have such a high range due to the issues with special.ive from #18088 though.

@chrisb83 (Member) left a comment

@tupui I did not work with this distribution before, so I cannot help much. I just did a quick review of the documentation, which looks very nice. I have left a few minor comments.

This is definitely a nice addition to SciPy.

----------
mu : array_like
    Mean direction of the distribution. Must be a one-dimensional unit
    vector of norm 1.

chrisb83 (Member): Euclidean norm?

dschmitz89 (Contributor, Author): Yes. Do you think this should be added to the docstrings?

vector of norm 1.
kappa : float
    Concentration parameter. Must be positive.
seed : {None, int, np.random.RandomState, np.random.Generator}, optional

chrisb83 (Member): do we use the keyword seed for multivariate dists? rvs has random_state.

dschmitz89 (Contributor, Author): Indeed, throughout multivariate, it is called seed, unlike for univariate distributions.

samples = np.squeeze(samples)
return samples

def _rejection_sampling(self, dim, kappa, size, random_state):
chrisb83 (Member): Out of curiosity: do you know how well rejection sampling works for this distribution? I guess it gets slower as the number of dimensions increases?

dschmitz89 (Contributor, Author): Yup, sampling time increases approximately linearly with the number of dimensions. Generating 10,000 samples in 20 dimensions still only takes about 7 ms on my machine.

dschmitz89 (Contributor, Author) commented:

Short summary of the state of this PR from my side: implementation and documentation are done in my opinion. One limitation is that only random variate generation works for $\kappa > 10^9$; the other methods fail due to a problem with ive (see #18088).

@tupui (Member) left a comment

Thank you everyone, approving on my side.

I would propose to ask one last time on the mailing list and if there are no further comments, we would merge one week later.

dschmitz89 (Contributor, Author) commented:

> Thank you everyone, approving on my side.
>
> I would propose to ask one last time on the mailing list and if there are no further comments, we would merge one week later.

Sent: https://mail.python.org/archives/list/scipy-dev@python.org/message/HE737ZZUWJYRBJ7KQEYOBHJFNHQQH7KY/

@stefanv (Member) left a comment

Nice contribution, thank you!

I noticed that one comment from another reviewer on the pdf method was resolved without the text changing; just making double sure that's correct, since the docstring still mentions "log".

Review threads on scipy/stats/_multivariate.py: resolved.
@tupui tupui merged commit 2242bf8 into scipy:main Apr 17, 2023
26 checks passed
tupui (Member) commented Apr 17, 2023

Alright, a week it is and comments have been addressed. Thank you again @dschmitz89 for the PR and for pushing this through the finish line. And thanks everyone else for helping out 🎉

Labels
enhancement A new feature or improvement scipy.stats
Development

Successfully merging this pull request may close these issues.

ENH: Add von Mises–Fisher distribution to scipy.stats
8 participants