Add negative binomial distribution #113
Conversation
Appreciate the work, looks good in general! Just a couple of comments.
Also, the failing `test_discrete` tests are most likely due to the internal implementations of `pmf` and `cdf`. We should treat the input as discrete; I believe numpy solves this by `floor`-ing the input, which I think is the behavior we'd want here as well.
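The floor-ing idea can be sketched in isolation. The snippet below is illustrative, not statrs API: it uses the geometric distribution (the r = 1 special case of the negative binomial), whose CDF has the closed form `1 - (1 - p)^(k + 1)`, to show why flooring makes a continuous input consistent with a discrete pmf.

```rust
// Illustrative sketch (not statrs API): treat a continuous input x as
// discrete by flooring it before evaluating the CDF. Uses the geometric
// special case of the negative binomial (r = 1).
fn geometric_pmf(p: f64, k: u64) -> f64 {
    p * (1.0 - p).powi(k as i32)
}

fn geometric_cdf(p: f64, x: f64) -> f64 {
    if x < 0.0 {
        return 0.0;
    }
    // Floor the input so that e.g. cdf(2.5) == cdf(2.0): the distribution
    // is discrete, so its CDF is a step function between integers.
    let k = x.floor() as i32;
    1.0 - (1.0 - p).powi(k + 1)
}

fn main() {
    let p = 0.5;
    // With flooring, summing the pmf up to k matches the CDF evaluated
    // anywhere in [k, k + 1), which is what check-style tests expect.
    let sum: f64 = (0..=2).map(|k| geometric_pmf(p, k)).sum();
    assert!((geometric_cdf(p, 2.5) - sum).abs() < 1e-12);
    assert!((geometric_cdf(p, 2.5) - geometric_cdf(p, 2.0)).abs() < 1e-12);
    println!("ok");
}
```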
if p.is_nan() || p < 0.0 || p > 1.0 || r.is_nan() || r < 0.0 {
    Err(StatsError::BadParams)
} else {
    Ok(NegativeBinomial { p: p, r: r })
Should be able to short-hand this: `NegativeBinomial { p, r }`
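The field init shorthand mentioned here (stable Rust) can be shown in a standalone sketch; the struct below is a minimal stand-in for the real one:

```rust
// Minimal stand-in for the NegativeBinomial struct under review.
struct NegativeBinomial {
    p: f64,
    r: f64,
}

fn main() {
    let (p, r) = (0.5, 5.0);
    // Field init shorthand: when a local binding has the same name as a
    // struct field, `p: p` can be written as just `p`.
    let long = NegativeBinomial { p: p, r: r };
    let short = NegativeBinomial { p, r };
    assert_eq!(long.p, short.p);
    assert_eq!(long.r, short.r);
    println!("ok");
}
```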
Didn't know about this one, thanks!
/// let r = NegativeBinomial::new(5.0, 0.5).unwrap();
/// assert_eq!(r.p(), 0.5);
/// ```
pub fn p(&self) -> f64 {
This is fine for now, but I'm definitely kicking myself for using one-letter names like `p` etc. for these method names in other distributions. Just making a note for myself to potentially update these to be more readable in the future.
Yes, but then again all of the math literature uses single-letter variables :). If you feel like changing this for the next major release, let me know and I can help.
impl Distribution<u64> for NegativeBinomial {
    fn sample<R: Rng + ?Sized>(&self, r: &mut R) -> u64 {
        let lambda_gen = Gamma::new(self.r, self.p).unwrap();
Let's use `gamma::sample_unchecked` here; we don't need to instantiate a new `Gamma` distribution.
good point
let lambda_gen = Gamma::new(self.r, self.p).unwrap();
let lambda = lambda_gen.sample(r);

let c = (-lambda).exp();
I think my preference here would be to expose `poisson::sample_unchecked` and use that, similar to how Numpy does it.
Ah, this is definitely more elegant. However, `Poisson` returns an `f64` instead of a `u64` - is there a reason for that? I can floor the value, but it kind of seems like an error in `Poisson`, although fixing it would be a breaking change.
Sampling the Poisson distribution typically requires evaluating continuous functions, so `f64` has to be used internally.
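The gamma-Poisson mixture being discussed can be sketched end to end without rand or statrs. Everything below is a stand-in: the xorshift RNG, the Marsaglia-Tsang gamma sampler, and Knuth's Poisson sampler are illustrative, and the `Gamma(shape = r, rate = p / (1 - p))` parameterization is an assumption chosen so the mixture has mean `r * (1 - p) / p` (it is not necessarily what statrs uses internally). Note the final floor/cast from `f64` to `u64` the thread talks about.

```rust
// Self-contained sketch of negative-binomial sampling as a gamma-Poisson
// mixture. All samplers here are illustrative stand-ins for rand/statrs.
struct XorShift64(u64);

impl XorShift64 {
    fn next_f64(&mut self) -> f64 {
        // xorshift64 step, then map the top 53 bits into (0, 1].
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        ((self.0 >> 11) as f64 + 1.0) / (1u64 << 53) as f64
    }

    fn next_normal(&mut self) -> f64 {
        // Box-Muller transform: two uniforms -> one standard normal.
        let u1 = self.next_f64();
        let u2 = self.next_f64();
        (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos()
    }

    fn sample_gamma(&mut self, shape: f64, rate: f64) -> f64 {
        // Marsaglia-Tsang squeeze method (valid for shape >= 1).
        let d = shape - 1.0 / 3.0;
        let c = 1.0 / (9.0 * d).sqrt();
        loop {
            let x = self.next_normal();
            let v = (1.0 + c * x).powi(3);
            if v > 0.0 {
                let u = self.next_f64();
                if u.ln() < 0.5 * x * x + d - d * v + d * v.ln() {
                    return d * v / rate;
                }
            }
        }
    }

    fn sample_poisson(&mut self, lambda: f64) -> u64 {
        // Knuth's multiplication method (fine for modest lambda). The
        // arithmetic is all in f64, which is why a Poisson sampler
        // naturally works in floats internally.
        let l = (-lambda).exp();
        let mut k = 0u64;
        let mut p = 1.0;
        loop {
            p *= self.next_f64();
            if p <= l {
                return k;
            }
            k += 1;
        }
    }
}

fn main() {
    let (r, p) = (4.0, 0.5);
    let mut rng = XorShift64(0x9E3779B97F4A7C15);
    let n = 20_000;
    let mut sum = 0.0;
    for _ in 0..n {
        // lambda ~ Gamma(r, p / (1 - p)), then k ~ Poisson(lambda),
        // gives k ~ NegativeBinomial(r, p) under this parameterization.
        let lambda = rng.sample_gamma(r, p / (1.0 - p));
        sum += rng.sample_poisson(lambda) as f64;
    }
    let mean = sum / n as f64;
    // Theoretical mean is r * (1 - p) / p = 4.0.
    assert!((mean - 4.0).abs() < 0.3);
    println!("sample mean ~ {:.2}", mean);
}
```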
}

impl Univariate<u64, f64> for NegativeBinomial {
    /// Calulcates the cumulative distribution function for the
Calculates
///
/// where `I_(x)(a, b)` is the regularized incomplete beta function
fn cdf(&self, x: f64) -> f64 {
    // NOTE: this differes from the MathNet.Numerics implementation that throws
I think this behavior is fine and is consistent with the implementation of `Binomial`; we can probably remove the comment, though.
/// ```
fn ln_pmf(&self, k: u64) -> f64 {
    let k_as_float = k as f64;
    let x = gamma::ln_gamma(self.r + k_as_float)
Are we able to re-use `factorial::ln_binomial` here?
I don't know - my math knowledge is not deep enough to answer this
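For what it's worth, the coefficient in the negative binomial pmf is the generalized binomial coefficient `C(k + r - 1, k) = Γ(r + k) / (Γ(r) · Γ(k + 1))`. `factorial::ln_binomial` takes integer arguments, so it would only cover integer `r`, while the ln-gamma form also handles real `r`. The sketch below checks that identity; its `ln_gamma` is a Lanczos approximation standing in for statrs' `gamma::ln_gamma`:

```rust
// Lanczos approximation to ln Gamma (g = 7, n = 9), standing in for
// statrs' gamma::ln_gamma. Good to roughly 1e-13 for z > 0.5.
fn ln_gamma(z: f64) -> f64 {
    const COEF: [f64; 9] = [
        0.99999999999980993,
        676.5203681218851,
        -1259.1392167224028,
        771.32342877765313,
        -176.61502916214059,
        12.507343278686905,
        -0.13857109526572012,
        9.9843695780195716e-6,
        1.5056327351493116e-7,
    ];
    let z = z - 1.0;
    let mut x = COEF[0];
    for (i, &c) in COEF.iter().enumerate().skip(1) {
        x += c / (z + i as f64);
    }
    let t = z + 7.5;
    0.5 * (2.0 * std::f64::consts::PI).ln() + (z + 0.5) * t.ln() - t + x.ln()
}

// ln C(k + r - 1, k) for real r: the log-coefficient of the negative
// binomial pmf. Reduces to an integer ln_binomial whenever r is integral.
fn ln_nb_coefficient(r: f64, k: u64) -> f64 {
    let k = k as f64;
    ln_gamma(r + k) - ln_gamma(r) - ln_gamma(k + 1.0)
}

fn main() {
    // Integer r = 5, k = 3: C(7, 3) = 35, so the log should be ln 35.
    assert!((ln_nb_coefficient(5.0, 3) - 35.0_f64.ln()).abs() < 1e-9);
    // Real r = 2.5 also works, which an integer-only helper cannot cover.
    assert!(ln_nb_coefficient(2.5, 3).is_finite());
    println!("ok");
}
```

So re-using `factorial::ln_binomial` would likely only be possible if `r` were restricted to integers.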
#[cfg_attr(rustfmt, rustfmt_skip)]
#[cfg(test)]
mod test {
I think we need to indent this inner block
/// ```
#[derive(Debug, Copy, Clone, PartialEq)]
pub struct NegativeBinomial {
    r: f64,
this should probably be a u64
No, this is deliberately a real value: https://en.wikipedia.org/wiki/Negative_binomial_distribution#Alternative_formulations - most libraries allow this, and there are alternative formulations (e.g. specifying the negative binomial using mean and probability) that require `r` to be real.
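One concrete way to see why `r` should stay real: the mean-probability parameterization maps back to a generally non-integer `r`. The helper below is illustrative (not statrs API) and assumes the convention where the mean is `r * (1 - p) / p`:

```rust
// Illustrative: recover r from a (mean, p) parameterization of the
// negative binomial. Assuming mean = r * (1 - p) / p, solving for r
// gives r = mean * p / (1 - p), which is generally not an integer.
fn r_from_mean(mean: f64, p: f64) -> f64 {
    mean * p / (1.0 - p)
}

fn main() {
    // mean = 4, p = 0.5 happens to give the integer r = 4 ...
    assert!((r_from_mean(4.0, 0.5) - 4.0).abs() < 1e-12);
    // ... but mean = 5, p = 0.3 gives r = 15/7, not an integer, so a
    // u64 field could not represent this parameterization.
    let r = r_from_mean(5.0, 0.3);
    assert!((r - 15.0 / 7.0).abs() < 1e-12);
    assert!(r.fract() != 0.0);
    println!("ok");
}
```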
Thanks for your feedback! I addressed the things I could immediately resolve. I'll look into the failing unit test now and see if I can fix it by flooring, as you suggest. I like your suggestion of using the gamma and Poisson distributions, but as I said above, it seems strange that `Poisson` returns an `f64`. The current floor + cast is not so nice, and I am a bit worried about its numeric stability at larger values, although I haven't considered in depth whether this can really be a problem.
…lows the test_discrete tests to pass
Floor did the trick, thanks! I think this completes the PR from my side. Let me know how to proceed with the Poisson `sample_unchecked` handling, and whether you want me to squash all my commits into one and force-push this branch so it shows up as a single commit, or if you are fine with merging as-is once everything else is done. Thanks!
lgtm
I'm working on some epidemiology-related code and needed the negative binomial distribution, so I implemented it, following the MathNet.Numerics implementation.
I added test cases alongside those of the already-implemented binomial distribution, but currently have one issue left: `test_discrete` fails if the comments that disable it at the moment are removed. It seems that the assumptions built into `check_sum_pmf_is_cdf` are not met by this distribution (maybe because the negative binomial distribution is well defined for non-integer steps of x? My statistics knowledge is too weak to say).
Please let me know if you like this PR in principle. If you do, I'll try to come up with a way to test sampling from the distribution that does not rely on `check_sum_pmf_is_cdf` as currently implemented. Or do you have an idea for how best to test this?