-
-
Notifications
You must be signed in to change notification settings - Fork 5.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: add von Mises-Fisher distribution #17624
Conversation
Maling list entry: https://mail.python.org/archives/list/scipy-dev@python.org/message/FCAHRSPS2KG37PZ47WRNVQIATQSNGJZ5/ So far, one positive reply. |
Thanks for linking the discussion. I don't see any replies for the API discussion, but it's ok per our process. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dschmitz89 Sorry for the delay here. I had another look pulling the changes and playing with it. I think it's in good shape and did not notice strange things. I took the liberty to push some cleanup.
In terms of tests, I think we are missing the 2D case against a reference. Assuming mpmath is ok, then I would suggest adding a test for 2D here.
Something else to consider are testing extreme cases of kappa
to have either a concentration around a single point or a perfectly uniform sample. Here we still need to give some guidelines for at least the max value. e.g. I can easily use 1e10 but 1e20 hangs (consider erroring out as well).
On that note, kappa=0
returns NaN with some warnings. I don't think we want that. I would either error out with a nice message, or fallback to a perfectly uniform sample on the hypersphere.
Last point for me would be to document the maths. The 3D and rejection sampling code are a bit obscure as they are. Adding some reference and explanation would help maintain this in the future.
@chrisb83 I think you worked on I think it's in good shape and could be merged once my points above have been addressed. So I am not asking you to review line by line, more check that we did not miss anything obvious with the infra, API or the maths. |
Thanks!
This should be covered by the test against the 2D von Mises distribution: see here.
Thanks for the hint. I was able to squeeze out a higher range for possible values for
The case of
I added some more comments. The rejection sampler is a bit involved though and I did not really dig through the reference fully. |
I suspect that particular ratio is not the problem. But in the line that follows, you compute |
Thanks, this helped a lot. I also reformulated the other logarithmic term containing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tupui I did not work with this distribution before, so I cannot help much. I just did a quick review of the documentation which looks very nice. I have left a few minor comments.
this is definitely a nice addition to SciPy.
---------- | ||
mu : array_like | ||
Mean direction of the distribution. Must be a one-dimensional unit | ||
vector of norm 1. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Euclidean norm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Do you think this should be added to the docstrings?
vector of norm 1. | ||
kappa : float | ||
Concentration parameter. Must be positive. | ||
seed : {None, int, np.random.RandomState, np.random.Generator}, optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we use the keyword seed
for multivariate dists? rvs
has random_state
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, throughout multivariate
, it is called seed
, unlike for univariate distributions.
samples = np.squeeze(samples) | ||
return samples | ||
|
||
def _rejection_sampling(self, dim, kappa, size, random_state): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
out of curiosity: do you know how well rejection sampling works for this distribution? i guess it gets slower as the number of dimensions increases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, sampling time approximately increases linearly with the number of dimensions. Generating 10,000 samples in 20 dimensions still only takes about 7 ms on my machine.
Short summary of the state of this PR from my side: implementation and documetation are done in my opinion. One limitation is that only random variate generation works for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you everyone, approving on my side.
I would propose to ask one last time on the mailing list and if there are no further comments, we would merge one week later.
Sent: https://mail.python.org/archives/list/scipy-dev@python.org/message/HE737ZZUWJYRBJ7KQEYOBHJFNHQQH7KY/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice contribution, thank you!
I noticed that one comment from another viewer on the pdf
method was resolved without the text changing; just making double sure that's correct, since the docstring still mentions "log".
Co-authored-by: Matt Haberland <mhaberla@calpoly.edu>
…into vonmisesfisher
Alright, a week it is an comments have been addressed. Thank you again @dschmitz89 for the PR and pushing this through the finish line. And thanks everyone else for helping out 🎉 |
Reference issue
Closes gh-17463
What does this implement/fix?
Adds the von Mises-Fisher distribution to
scipy.stats
. This distribution is the most common analogue of the normal distribution on the unit sphere, likely the most important directional distribution. It is for example available in geomstats and tensorflow.The random variate generation is basically a numpyfied version of the geomstats implementation.
Additional information
This still has some sharp edges:
doccer
does not work, thats why it is commented out at the moment. This would be the most urgent thing to look at first.I implemented a
fit
method for the distribution. So far, no other multivariate distribution has such a method so this should be reviewed carefully.For a rough idea what this distribution looks like: these are plots of the PDF (top row) and 20 random variates (bottom) for three different concentrations.
CC @nguigs : since you authored this distribution for geomstats and I heavily reused parts of it, thought I would ask if you would be willing to help out with review here. Its a quite big PR, so every bit would help. Thanks in advance.