Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Add the negative binomial distribution to rand_distr. #1296

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

WarrenWeckesser
Copy link
Collaborator

No description provided.

Copy link
Member

@dhardy dhardy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The negative binomial makes no sense for r=0. But we can generalise to p=0, since our output is floating-point which has a representation for infinity?

The code style looks good.

As for the implementation, all I can say is that it correlates with the mentioned reference, 10.1007/978-1-4613-8643-8. I tried comparing with 10.1002/0471715816 (Johnson 2005, Univariate Discrete Distributions), but my understanding of statistics is lacking. Perhaps @saona-raimundo would be willing to take a look?

@WarrenWeckesser
Copy link
Collaborator Author

But we can generalise to p=0, since our output is floating-point which has a representation for infinity?

If p=0, the probability mass function of the distribution would be $p_k = 0$ for all $k$. This is not a valid probability distribution. I don't think returning inf in this case is a meaningful generalization.

@saona-raimundo
Copy link
Collaborator

Hi!
I am okay with p=0 in this case, especially considering the interpretation of the Negative Binomial

When `r` is an integer, the negative binomial distribution can be interpreted as the distribution of the number of failures in a sequence of Bernoulli trials that continue until `r` successes occur.

Note that Wikipedia accepts p = 0, even if the probability mass function does not make sense in that case.

Regarding the implementation, I confirm it corresponds to the citation.
Although testing the shape of the density is hard, see #357 for a discussion, have you checked that the behaviour of the implementation is the expected one?

@dhardy, I will check your reference 10.1002/0471715816 to see if there are improved algorithms proposed.

// and saved in the NegativeBinomial instance, because it
// depends on just the parameters `r` and `p`. We have to
// create a new Poisson instance for each variate generated.
Poisson::<F>::new(gamma.sample(rng)).unwrap().sample(rng)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of unwrap, one should take care of the case where gamma.sample(rng) returns a float which should not be accepted.
I suggest introducing a loop which samples until one gets a finite sample.
The Float trait has the method is_finite for this.

Copy link
Collaborator

@vks vks May 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this happen? The Gamma distribution should return strictly positive values, so Poisson::new should never fail.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, my point was about handling infinity as the result of simulating the gamma variable.

Nothing ensures that a Gamma samples always a positive and finite float.
Not is the signature of the sample method nor in its documentation.

At some point there was a discussion about who should handle infinity out of the simulation: the library or the user. I thought the decision was that the user should handle infinity floats, maybe I am wrong. This is why I suggest handling a possible infinite value here with a loop.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I need to fix this. If p is extremely small (e.g. 1e-40), then the scale passed to Gamma is huge (1e+40), and with such a scale, Gamma will generate samples that are infinity.

A simple loop would not be safe if we don't have a bound on how frequently infinity is generated.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was not even thinking on extreme values, just the unwrap.
The thing is, if gamma samples infinity, then the Poisson "should" also infinity.
Then, instead of a loop, one should introduce an if checking if the gamma samples infinity.
If it does, return infinity directly, if it does not, sample the Poisson (created with new and unwrap).

@saona-raimundo
Copy link
Collaborator

The reference 10.1002/0471715816 "Univariate Discrete Distributions" is really nice!
From pages 221-222, this is the description for the simulation of the negative binomial random variable.

The negative binomial with an integer parameter k = N can be generated as
the sum of N geometric rv’s. Except for low values of N (say N = 2, 3, 4), this
method cannot be advocated as it requires many uniforms for a single output
negative binomial rv. This argument applies a fortiori to the use of the sum of a
Poisson number of logarithmic rv’s.

The method generally recommended for generating negative binomial rv’s
with changing parameters is to generate Poisson rv’s with random parameters
drawn from a gamma distribution [see, e.g., algorithm NB3 in Fishman (1978)].
For fixed parameters the use of a fast general method, such as indexed table
look-up, alias, or frequency table, is recommended.

The reference Fishman, G. S. (1978). Principles of Discrete Event Simulation, New York: Wiley.
is not that easy to find online, but we can assume that they refer to the same method implemented in the PR.

If I am not mistaken, rand_distr does not generally implement distributions by table look-ups, so I think the PR is the way to go.

@vks
Copy link
Collaborator

vks commented Apr 25, 2023

For the normal distribution, we are using tables (Ziggurat algorithm).

However, I agree that the approach here is fine.

@dhardy
Copy link
Member

dhardy commented Apr 25, 2023

Great, and thanks for the review. Then are we agreed to merge this (once the above is corrected)? I didn't review in detail.

Copy link
Collaborator

@vks vks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks! We just need to update the changelog.

@dhardy
Copy link
Member

dhardy commented May 3, 2023

Looks ready to merge @WarrenWeckesser?

@WarrenWeckesser
Copy link
Collaborator Author

@dhardy, I'm still looking into the issue that @saona-raimundo raised here. I'm checking how extreme values of the parameters might break things.

@dhardy
Copy link
Member

dhardy commented Feb 8, 2024

@WarrenWeckesser are you still working on this?

@WarrenWeckesser
Copy link
Collaborator Author

I've been away from this (and most of my other open source work) for much of last year, but I haven't forgotten about it. I have a project that I need to finish up before I can get back to this. That might take a few weeks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants