Add negative binomial distribution #113
Conversation
Appreciate the work, looks good in general! Just a couple of comments.
Also, the failing `test_discrete` tests are most likely due to the internal implementations of `pmf` and `cdf`. We should treat the input as discrete; I believe numpy solves this by `floor`-ing the input, which I think is the behavior we'd want here as well.
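The floor-ing idea can be sketched in isolation. The snippet below is illustrative, not statrs API: it uses the geometric distribution (the r = 1 special case of the negative binomial), whose CDF has the closed form `1 - (1 - p)^(k + 1)`, to show why flooring makes a continuous input consistent with a discrete pmf.

```rust
// Illustrative sketch (not statrs API): treat a continuous input x as
// discrete by flooring it before evaluating the CDF. Uses the geometric
// special case of the negative binomial (r = 1).
fn geometric_pmf(p: f64, k: u64) -> f64 {
    p * (1.0 - p).powi(k as i32)
}

fn geometric_cdf(p: f64, x: f64) -> f64 {
    if x < 0.0 {
        return 0.0;
    }
    // Floor the input so that e.g. cdf(2.5) == cdf(2.0): the distribution
    // is discrete, so its CDF is a step function between integers.
    let k = x.floor() as i32;
    1.0 - (1.0 - p).powi(k + 1)
}

fn main() {
    let p = 0.5;
    // With flooring, summing the pmf up to k matches the CDF evaluated
    // anywhere in [k, k + 1), which is what check-style tests expect.
    let sum: f64 = (0..=2).map(|k| geometric_pmf(p, k)).sum();
    assert!((geometric_cdf(p, 2.5) - sum).abs() < 1e-12);
    assert!((geometric_cdf(p, 2.5) - geometric_cdf(p, 2.0)).abs() < 1e-12);
    println!("ok");
}
```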
if p.is_nan() || p < 0.0 || p > 1.0 || r.is_nan() || r < 0.0 {
    Err(StatsError::BadParams)
} else {
    Ok(NegativeBinomial { p: p, r: r })
Should be able to short-hand this: `NegativeBinomial { p, r }`
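The field init shorthand mentioned here (stable Rust) can be shown in a standalone sketch; the struct below is a minimal stand-in for the real one:

```rust
// Minimal stand-in for the NegativeBinomial struct under review.
struct NegativeBinomial {
    p: f64,
    r: f64,
}

fn main() {
    let (p, r) = (0.5, 5.0);
    // Field init shorthand: when a local binding has the same name as a
    // struct field, `p: p` can be written as just `p`.
    let long = NegativeBinomial { p: p, r: r };
    let short = NegativeBinomial { p, r };
    assert_eq!(long.p, short.p);
    assert_eq!(long.r, short.r);
    println!("ok");
}
```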
Didn't know about this one, thanks!
/// let r = NegativeBinomial::new(5.0, 0.5).unwrap();
/// assert_eq!(r.p(), 0.5);
/// ```
pub fn p(&self) -> f64 {
This is fine for now, but I'm definitely kicking myself for using one-letter names like `p` etc. for these method names in other distributions. Just making a note for myself to potentially update these to be more readable in the future.
Yes, but then again all of the math literature uses single-letter variables :). If you feel like changing this for the next major release, let me know and I can help.
impl Distribution<u64> for NegativeBinomial {
    fn sample<R: Rng + ?Sized>(&self, r: &mut R) -> u64 {
        let lambda_gen = Gamma::new(self.r, self.p).unwrap();
Let's use `gamma::sample_unchecked` here; we don't need to instantiate a new `Gamma` distribution.
good point
let lambda_gen = Gamma::new(self.r, self.p).unwrap();
let lambda = lambda_gen.sample(r);

let c = (-lambda).exp();
I think my preference here would be to expose `poisson::sample_unchecked` and use that, similar to how Numpy does it.
Ah, this is definitely more elegant. However, `Poisson` returns an `f64` instead of a `u64` - is there a reason for that? I can floor the value, but it kind of seems like an error in `Poisson`, although fixing it would be a breaking change.
Sampling the Poisson distribution typically requires evaluating continuous functions, so `f64` has to be used internally.
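The gamma-Poisson mixture being discussed can be sketched end to end without rand or statrs. Everything below is a stand-in: the xorshift RNG, the Marsaglia-Tsang gamma sampler, and Knuth's Poisson sampler are illustrative, and the `Gamma(shape = r, rate = p / (1 - p))` parameterization is an assumption chosen so the mixture has mean `r * (1 - p) / p` (it is not necessarily what statrs uses internally). Note the final floor/cast from `f64` to `u64` the thread talks about.

```rust
// Self-contained sketch of negative-binomial sampling as a gamma-Poisson
// mixture. All samplers here are illustrative stand-ins for rand/statrs.
struct XorShift64(u64);

impl XorShift64 {
    fn next_f64(&mut self) -> f64 {
        // xorshift64 step, then map the top 53 bits into (0, 1].
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        ((self.0 >> 11) as f64 + 1.0) / (1u64 << 53) as f64
    }

    fn next_normal(&mut self) -> f64 {
        // Box-Muller transform: two uniforms -> one standard normal.
        let u1 = self.next_f64();
        let u2 = self.next_f64();
        (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos()
    }

    fn sample_gamma(&mut self, shape: f64, rate: f64) -> f64 {
        // Marsaglia-Tsang squeeze method (valid for shape >= 1).
        let d = shape - 1.0 / 3.0;
        let c = 1.0 / (9.0 * d).sqrt();
        loop {
            let x = self.next_normal();
            let v = (1.0 + c * x).powi(3);
            if v > 0.0 {
                let u = self.next_f64();
                if u.ln() < 0.5 * x * x + d - d * v + d * v.ln() {
                    return d * v / rate;
                }
            }
        }
    }

    fn sample_poisson(&mut self, lambda: f64) -> u64 {
        // Knuth's multiplication method (fine for modest lambda). The
        // arithmetic is all in f64, which is why a Poisson sampler
        // naturally works in floats internally.
        let l = (-lambda).exp();
        let mut k = 0u64;
        let mut p = 1.0;
        loop {
            p *= self.next_f64();
            if p <= l {
                return k;
            }
            k += 1;
        }
    }
}

fn main() {
    let (r, p) = (4.0, 0.5);
    let mut rng = XorShift64(0x9E3779B97F4A7C15);
    let n = 20_000;
    let mut sum = 0.0;
    for _ in 0..n {
        // lambda ~ Gamma(r, p / (1 - p)), then k ~ Poisson(lambda),
        // gives k ~ NegativeBinomial(r, p) under this parameterization.
        let lambda = rng.sample_gamma(r, p / (1.0 - p));
        sum += rng.sample_poisson(lambda) as f64;
    }
    let mean = sum / n as f64;
    // Theoretical mean is r * (1 - p) / p = 4.0.
    assert!((mean - 4.0).abs() < 0.3);
    println!("sample mean ~ {:.2}", mean);
}
```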
}

impl Univariate<u64, f64> for NegativeBinomial {
    /// Calulcates the cumulative distribution function for the
Calculates
///
/// where `I_(x)(a, b)` is the regularized incomplete beta function
fn cdf(&self, x: f64) -> f64 {
    // NOTE: this differes from the MathNet.Numerics implementation that throws
I think this behavior is fine and is consistent with the implementation of `Binomial`; we can probably remove the comment, though.
/// ```
fn ln_pmf(&self, k: u64) -> f64 {
    let k_as_float = k as f64;
    let x = gamma::ln_gamma(self.r + k_as_float)
Are we able to re-use `factorial::ln_binomial` here?
I don't know - my math knowledge is not deep enough to answer this
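For what it's worth, the coefficient in the negative binomial pmf is the generalized binomial coefficient `C(k + r - 1, k) = Γ(r + k) / (Γ(r) · Γ(k + 1))`. `factorial::ln_binomial` takes integer arguments, so it would only cover integer `r`, while the ln-gamma form also handles real `r`. The sketch below checks that identity; its `ln_gamma` is a Lanczos approximation standing in for statrs' `gamma::ln_gamma`:

```rust
// Lanczos approximation to ln Gamma (g = 7, n = 9), standing in for
// statrs' gamma::ln_gamma. Good to roughly 1e-13 for z > 0.5.
fn ln_gamma(z: f64) -> f64 {
    const COEF: [f64; 9] = [
        0.99999999999980993,
        676.5203681218851,
        -1259.1392167224028,
        771.32342877765313,
        -176.61502916214059,
        12.507343278686905,
        -0.13857109526572012,
        9.9843695780195716e-6,
        1.5056327351493116e-7,
    ];
    let z = z - 1.0;
    let mut x = COEF[0];
    for (i, &c) in COEF.iter().enumerate().skip(1) {
        x += c / (z + i as f64);
    }
    let t = z + 7.5;
    0.5 * (2.0 * std::f64::consts::PI).ln() + (z + 0.5) * t.ln() - t + x.ln()
}

// ln C(k + r - 1, k) for real r: the log-coefficient of the negative
// binomial pmf. Reduces to an integer ln_binomial whenever r is integral.
fn ln_nb_coefficient(r: f64, k: u64) -> f64 {
    let k = k as f64;
    ln_gamma(r + k) - ln_gamma(r) - ln_gamma(k + 1.0)
}

fn main() {
    // Integer r = 5, k = 3: C(7, 3) = 35, so the log should be ln 35.
    assert!((ln_nb_coefficient(5.0, 3) - 35.0_f64.ln()).abs() < 1e-9);
    // Real r = 2.5 also works, which an integer-only helper cannot cover.
    assert!(ln_nb_coefficient(2.5, 3).is_finite());
    println!("ok");
}
```

So re-using `factorial::ln_binomial` would likely only be possible if `r` were restricted to integers.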
#[cfg_attr(rustfmt, rustfmt_skip)]
#[cfg(test)]
mod test {
I think we need to indent this inner block
/// ```
#[derive(Debug, Copy, Clone, PartialEq)]
pub struct NegativeBinomial {
    r: f64,
this should probably be a u64
No, this is deliberately a real value: https://en.wikipedia.org/wiki/Negative_binomial_distribution#Alternative_formulations - most libraries allow this, and there are alternative formulations (e.g. specifying the negative binomial using mean and probability) that require `r` to be real.
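One concrete way to see why `r` should stay real: the mean-probability parameterization maps back to a generally non-integer `r`. The helper below is illustrative (not statrs API) and assumes the convention where the mean is `r * (1 - p) / p`:

```rust
// Illustrative: recover r from a (mean, p) parameterization of the
// negative binomial. Assuming mean = r * (1 - p) / p, solving for r
// gives r = mean * p / (1 - p), which is generally not an integer.
fn r_from_mean(mean: f64, p: f64) -> f64 {
    mean * p / (1.0 - p)
}

fn main() {
    // mean = 4, p = 0.5 happens to give the integer r = 4 ...
    assert!((r_from_mean(4.0, 0.5) - 4.0).abs() < 1e-12);
    // ... but mean = 5, p = 0.3 gives r = 15/7, not an integer, so a
    // u64 field could not represent this parameterization.
    let r = r_from_mean(5.0, 0.3);
    assert!((r - 15.0 / 7.0).abs() < 1e-12);
    assert!(r.fract() != 0.0);
    println!("ok");
}
```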
Thanks for your feedback! I addressed the things I could immediately resolve. I'll look into the failing unit test now and see if I can fix it by flooring, as you suggest. I like your suggestion of using the gamma and Poisson distributions, but as I said above, it seems strange that `Poisson` returns an `f64`. The current floor + cast is not so nice, and I am a bit worried about its numeric stability at larger values, although I haven't considered in depth whether this can really be a problem.
…lows the test_discrete tests to pass
Floor did the trick, thanks! I think this completes the PR from my side. Let me know how to proceed with the Poisson `sample_unchecked` handling, and whether you want me to squash all my commits into one and force-push this branch so it shows up as a single commit, or if you are fine with merging as-is once everything else is done. Thanks!
lgtm
I'm working on some epidemiology-related code and needed the negative binomial distribution, so I implemented it, following the MathNet.Numerics implementation.
I added test cases alongside those of the already-implemented binomial distribution, but currently have one issue left: `test_discrete` fails if the comments that disable it at the moment are removed. It seems that the assumptions built into `check_sum_pmf_is_cdf` are not met by this distribution (maybe because the negative binomial distribution is well defined for non-integer steps of x? My statistics knowledge is too weak to say).
Please let me know if you like this PR in principle. If you do, I'll try to come up with a way to test sampling from the distribution that does not rely on `check_sum_pmf_is_cdf` as currently implemented. Or do you have an idea for how best to test this?