Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add negative binomial distribution #113

Merged
merged 7 commits into from Jul 6, 2020

Conversation

danyx23
Copy link
Contributor

@danyx23 danyx23 commented Jun 18, 2020

I'm working on some epidemiology related code and needed the negative binomial distribution so I implemented it, following the MathNet.Numerics implementation.

I added test cases along the binomial distribution that is already implemented but currently have one issue left in that test_discrete fails if the comments are removed that disable it at the moment. It seems that the assumptions that are built into check_sum_pmf_is_cdf are not met with this distribution (maybe because the negative binomial distribution is well defined for non-integer steps of x? My statistics knowledge is too weak to say).

Please let me know if you like this PR in principle. If you do I'll try to come up with a way to test sampling from the distribution that does not rely on check_sum_pmf_is_cdf as it is currently implemented. Or do you have an idea for how best to test this?

Copy link
Collaborator

@boxtown boxtown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Appreciate the work, looks good in general! Just a couple comments.

Also the failing test_discrete tests are most likely due to the internal implementations of pmf and cdf. We should treat the input as discrete, I believe numpy solves this by floor-ing the input which I think is the behavior we'd want here as well

if p.is_nan() || p < 0.0 || p > 1.0 || r.is_nan() || r < 0.0 {
Err(StatsError::BadParams)
} else {
Ok(NegativeBinomial { p: p, r: r })
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be able to short-hand this NegativeBinomial { p, r })

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't know about this one, thanks!

/// let r = NegativeBinomial::new(5.0, 0.5).unwrap();
/// assert_eq!(r.p(), 0.5);
/// ```
pub fn p(&self) -> f64 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine for now but I'm definitely kicking myself for using 1 letter variables like p etc. for these methods names in other distributions. Just making a note for myself to potentially update these to be more readable in the future

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but then again all of math literature uses single letter variables :). If you feel like changing this for the next major release let me know and I can help.


impl Distribution<u64> for NegativeBinomial {
fn sample<R: Rng + ?Sized>(&self, r: &mut R) -> u64 {
let lambda_gen = Gamma::new(self.r, self.p).unwrap();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's use gamma::sample_unchecked here, we don't need to instantiate a new Gamma distribution

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point

let lambda_gen = Gamma::new(self.r, self.p).unwrap();
let lambda = lambda_gen.sample(r);

let c = (-lambda).exp();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my preference here would be to expose poisson::sample_unchecked and use that similar to how Numpy does it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this is definitely more elegant. However, Poisson returns an f64 instead of an u64 - is there a reason for this? I can floor the value but it kind of seems like an error in Poisson although fixing it would be a breaking change. Is there a reason for this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sampling the Poisson distributions typically requires evaluating continuous functions, so f64 has to be used internally.

}

impl Univariate<u64, f64> for NegativeBinomial {
/// Calulcates the cumulative distribution function for the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calculates

///
/// where `I_(x)(a, b)` is the regularized incomplete beta function
fn cdf(&self, x: f64) -> f64 {
// NOTE: this differes from the MathNet.Numerics implementation that throws
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this behavior is fine and is consistent with the implementation of Binomial, we can probably remove the comment though

/// ```
fn ln_pmf(&self, k: u64) -> f64 {
let k_as_float = k as f64;
let x = gamma::ln_gamma(self.r + k_as_float)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we able to re-use factorial::ln_binomial here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know - my math knowledge is not deep enough to answer this


#[cfg_attr(rustfmt, rustfmt_skip)]
#[cfg(test)]
mod test {
Copy link
Collaborator

@boxtown boxtown Jun 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to indent this inner block

/// ```
#[derive(Debug, Copy, Clone, PartialEq)]
pub struct NegativeBinomial {
r: f64,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should probably be a u64

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No this is deliberately a real value: https://en.wikipedia.org/wiki/Negative_binomial_distribution#Alternative_formulations - most libraries allow this and there are alternative formulations (e.g. specifying the negative binomial using mean and probability) that require r to be real.

@danyx23
Copy link
Contributor Author

danyx23 commented Jun 30, 2020

Thanks for your feedback! I addressed the things I could immediately resolve. I'll look into the failing unit test now and see if I can resolve this issue by flooring as you suggest.

I like your suggestion for using the gamma and poisson distributions but as I said above, it seems strange that poisson returns an f64. Is there a reason for this? The current floor + cast is not so nice and I am a bit worried about the numeric stability of it at larger values although I haven't considered in depth if this really can be a problem.

@danyx23
Copy link
Contributor Author

danyx23 commented Jun 30, 2020

Floor did the trick, thanks! I think this completes the PR from my side. Let me know how to proceed about the Poisson sample_unchecked handling and if you want me to squash all my commits into one and force push this branch again so it only shows up as a single commit of if you are fine with merging as is once everything else is done. Thanks!

Copy link
Collaborator

@boxtown boxtown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@boxtown boxtown merged commit 660eb9e into statrs-dev:master Jul 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants