
ENH: add CI and str to Dunnett's test #18150

Merged 20 commits into scipy:main on Mar 30, 2023

Conversation

@tupui (Member) commented Mar 14, 2023

Follow-up of #18107.

Adds a confidence interval around the mean based on the "allowance", and also provides a cleaner `__str__`.

@tupui added the scipy.stats and enhancement (A new feature or improvement) labels Mar 14, 2023
@tupui added this to the 1.11.0 milestone Mar 14, 2023
@tupui requested a review from mdhaber March 14, 2023 16:30
@mdhaber (Contributor) left a comment:

As we've discovered, testing with multiple problems is important when we're dealing with this noisy function and can't use tight tolerances. Please add a parameterized test over the four test cases we used for p-values.
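
A minimal sketch of the shape such a parameterized test could take, assuming the `confidence_interval` API this PR adds; the seeded problems are illustrative stand-ins, not the four reference cases from the p-value tests, and the assertion only checks a generic property (the interval is centered on the observed difference of means, so it must contain it):

```python
import numpy as np
import pytest
from scipy import stats

@pytest.mark.parametrize("seed", [1, 2, 3, 4])
def test_ci_contains_observed_difference(seed):
    rng = np.random.default_rng(seed)
    control = rng.normal(0.0, 1.0, size=20)
    samples = [rng.normal(0.5, 1.0, size=20) for _ in range(3)]
    res = stats.dunnett(*samples, control=control, random_state=rng)
    ci = res.confidence_interval(confidence_level=0.95)
    # The CI is the observed mean difference +/- the allowance, so it
    # must bracket the observed difference for every sample.
    diff = np.array([s.mean() for s in samples]) - control.mean()
    assert np.all(ci.low < diff) and np.all(diff < ci.high)
```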

@tupui (Member Author) left a comment:

Thanks Matt, I will add the parametrisation 👍

@rkern (Member) commented Mar 23, 2023

I've done some investigation of alternatives, and using minimize_scalar() like this tends to have the largest error and the most function evaluations. Since the problem is one of root-finding rather than minimization per se, recasting it as a minimization problem just consumes more resources for less payoff. Using the brentq() root-finder, off the shelf, seems to consistently do better on both counts. It's not specifically adapted for stochastic functions, but as mentioned in the aforementioned thread, if the tolerance is chosen to match the noise level, it usually won't do too badly. Unfortunately, there is no real guidance for converting between the two, but in practice it works okay on this problem for reasonable choices of the tolerance.

Even better is a variant of the noisy bisection algorithm given in that blog post, using a secant step instead of pure bisection. One of the real benefits of this algorithm is that it is self-adaptive to the noise level of the function: it converges as soon as the random noise violates the monotonicity constraint. This is a sensible place to stop, so it doesn't need extra checking after the fact. If a typical use of minimize_scalar() takes about 18 evaluations, brentq will take about 10, and the noisy secant about 7. A sketch of the idea is below.
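
For concreteness, here is a minimal sketch of that noisy-secant idea (my own illustration, not the notebook code): keep a bracket whose endpoint values straddle zero, take secant steps, and stop once a new evaluation breaks the monotone ordering, i.e. once the noise floor is reached.

```python
def noisy_secant(f, a, b, maxiter=50):
    """Find the root of a noisy, increasing f with f(a) < 0 < f(b)."""
    fa, fb = f(a), f(b)
    for _ in range(maxiter):
        # Secant step; with fa < 0 < fb it always lands inside (a, b).
        c = a - fa * (b - a) / (fb - fa)
        fc = f(c)
        # Noise has violated monotonicity: no further progress possible.
        if not fa < fc < fb:
            return c
        if fc < 0:
            a, fa = c, fc
        else:
            b, fb = c, fc
    return 0.5 * (a + b)
```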

There are a few things that we can bring from the Bornkamp paper even if we don't implement the local-linear regression root-finder. I do have an implementation of it, and it can lower the error a fair bit (rigorously), but it does so at the cost of additional function evaluations, at least for the convergence tolerances used in the paper and mvtnorm. Relaxing those a little gets more reasonable function evaluations, but no real improvements over the noisy secant method.

One thing we can bring over is the use of the ppf() of a relevant distribution to make the curve more nearly a straight line with a healthier slope. When we get to high confidence levels like 0.99 or 0.999, we are well into the tail of the CDF, where it gets hard to invert even when it's smooth and deterministic. Using stats.t.ppf(x, df=df) makes this curve nearly straight, with a nice not-close-to-zero slope that is easier to root-find on, particularly with a secant method. Bornkamp uses either norm.ppf() or cauchy.ppf() (when df<=7), but most likely just because he's reusing R's GLM capabilities, which have those as named link functions. t.ppf(x, df=df) smoothly interpolates between those cases for us. This really helps the noisy secant method, is relatively neutral with minimize_scalar(), and improves brentq() a little.

The second thing is to use the bracketing interval Bornkamp derives. This puts us in the right region immediately, rather than having to search for one from the defaults. Specifically, if we are looking for the x that gives a CDF value of p in our multivariate_t.cdf(x, df=df), then it must lie between stats.t.ppf(p, df=df) and stats.t.ppf(p ** (1/ndim), df=df). These correspond to the cases where all of the dimensions in the correlation matrix are perfectly correlated (all elements 1) or completely independent (all off-diagonals 0); every other case is intermediate. Having these bounds opens up the superior root-finding methods, which require starting bounds that bracket the root. Putting the pieces together looks something like the sketch below.
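
As an illustration only (not the PR's implementation), combining the ppf link transform with Bornkamp's bracket and brentq() might look like this; df, ndim, the correlation, and the tolerance are placeholder values:

```python
import numpy as np
from scipy import stats
from scipy.optimize import brentq

df, ndim, p = 10, 3, 0.95                       # illustrative values
rho = np.full((ndim, ndim), 0.5)                # equicorrelated example
np.fill_diagonal(rho, 1.0)
mvt = stats.multivariate_t(shape=rho, df=df, seed=42)

# Bornkamp's bracket: the perfectly correlated and independent extremes.
lo = stats.t.ppf(p, df=df)
hi = stats.t.ppf(p ** (1 / ndim), df=df)

def residual(x):
    # The t.ppf link transform makes the noisy QMC CDF estimate
    # nearly linear in x, which suits bracketing root-finders.
    cdf = mvt.cdf(np.full(ndim, x))
    return stats.t.ppf(cdf, df=df) - stats.t.ppf(p, df=df)

critical = brentq(residual, lo, hi, xtol=1e-3)  # xtol ~ noise level
```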

I am happy to clean up my investigation notebook if there is interest.

@tupui (Member Author) commented Mar 23, 2023

@rkern thank you for investigating.

What I could do here is use brentq, but then I need the bracket. Feel free to make a code suggestion here.

For the rest, I would suggest leaving it for a follow-up if you are interested in doing that.

@tupui (Member Author) commented Mar 23, 2023

OK, if I am not mistaken, I have addressed all the suggestions.

@tupui (Member Author) commented Mar 23, 2023

I would not worry about this lint error. It's a false positive because we name this differently in the docstring. But if you want, I can add a `# noqa: D417` on the line:

```
scipy/stats/_multicomp.py:189:5: D417 Missing argument description in the docstring: `*samples`
```

@mdhaber (Contributor) commented Mar 23, 2023

> I am happy to clean up my investigation notebook if there is interest.

Yes, definitely would be interested in a PR! The PPF transform idea and use of analytical bounds with an existing bracketing root finder sound like very simple improvements to this. I'd be curious to see how much the tolerances in the tests can be tightened with that alone (and I understand that it would also improve speed). After that, noisy bisection/secant ideas also sound good. I've thought before that optimize might benefit from rootfinders that take advantage of certain assumptions (e.g. these rely on monotonicity), but I'd understand if we don't want to expand SciPy's scope in this direction right now.

@mdhaber (Contributor) commented Mar 23, 2023

OK, added the test. Please correct the one-sided confidence interval direction and adjust the tests (I flipped them so that I could check that the rest of the test was correct). Then I'll come back and compare the math against the reference and check the pretty-printing.

```
@@ -24,10 +27,167 @@
class DunnettResult:
    statistic: np.ndarray
    pvalue: np.ndarray
    _alternative: Literal['two-sided', 'less', 'greater']
```
@mdhaber (Contributor):

These private attributes are showing up in `__repr__`. I'd suggest hiding them as in `EmpiricalDistributionFunction`.

@tupui (Member Author):

Hmm, are you sure? If we say `__repr__` should allow to build back the object, then we would need (at least some of) the private attributes.

I don't have strong feelings on this one; I am not sure, since we have a pretty `__str__`.

@mdhaber (Contributor):

> If we say repr should allow to build back the object

I think of that as the ideal distinction between `__repr__` and `__str__`, if there is to be one. I would prefer to break that rule than to expose private attributes, though.

@tupui (Member Author):

Fine by me. I personally very rarely use this behaviour to recreate objects, and I am not sure people would want to do that in this case. We can try what you propose and re-evaluate if we get user requests.

@mdhaber (Contributor) commented Mar 30, 2023:

They can't re-create the object anyway because it's private. Er... doing so is not supported.

Comment on lines 61 to 62:

```python
"\nOne-sided alternative (less): sample i's mean exceed "
"the control's mean by an amount at least Lower CI"
```
@mdhaber (Contributor):

I'm not sure whether I'm a fan of this. R prints something like it, right? My concern is that it purports to interpret what a confidence interval is, but it doesn't really give the detail needed for a correct understanding. You could ask me to get over it, or we could wordsmith this to death, or we could just remove it : )

If we keep it, I would prefer "the difference between the mean of sample i and the mean of the control is...". I think the order of subtraction is unambiguous, but it is easier to parse a negative difference than a negative amount for "exceeds".

@tupui (Member Author):

I actually used the wording from Dunnett's article. I always struggle with what "alternative" means and thought it could help users to be "more confident" about what they are doing. I've seen something a bit similar in Stata (or something like that), where the output of a test is written out in plain language. You could argue, though, that we are targeting integrators more than end users, and this goes a bit beyond our scope.

I am fine with either removing or adjusting the messaging. Can you make a code suggestion (and push it, as I am OK with both options here)?

@mdhaber (Contributor):

His wording was fine, but you would then need to add a caveat like he does every time he makes statements like these: "The joint statement consisting of the ... conclusions has a confidence coefficient of 95%." Without this, it looks like a statement of 100% certainty. And I'm not sure we want to write that caveat verbatim, so we would need to wordsmith it.

@tupui (Member Author) left a comment:

Thanks Matt! I will make the change for the repr and CI in the doc.

@mdhaber (Contributor) left a comment:

With a few adjustments, this looks good. I'll push an update that implements the suggestions here, and when CI is finished I can merge.

@mdhaber (Contributor) left a comment:

Also, attributes of DunnettResult should be documented, but I'll leave that to a follow-up.

@mdhaber merged commit bc73228 into scipy:main Mar 30, 2023
@mdhaber (Contributor) commented Mar 30, 2023

Thanks @tupui. Please add documentation of the DunnettResult attributes in a follow-up. See #18150 (comment): it sounds like it's worth experimenting with the ppf transform and bracketing root-finder @rkern mentioned. Those changes should be pretty simple to make, but they could make this a lot faster and more robust.

@tupui deleted the dunnett_ci branch March 30, 2023 11:33
@tupui (Member Author) commented Mar 30, 2023

Thanks Matt, I will make the follow-up.

For the optimization part, I would give it lower priority than other topics, since the current solution is reasonable.

@mdhaber (Contributor) commented Apr 4, 2023

Re: #18150 (comment)

I got this working (at least for alternative='less' and alternative='greater'); I think some modification is needed for alternative='two-sided'.

@rkern any chance that

> then it must be between stats.t.ppf(p, df=df) and stats.t.ppf(p ** (1/ndim), df=df)

is supposed to be

> then it must be between stats.t.ppf(p, df=df) and stats.t.ppf(1 - (1-p)**(1/ndim), df=df)

? That's how I got a valid bracket in my (sloppy) tests 6449a15.

Could you point out the part of the paper where this is derived? What tolerance did you use with the bracketing root-finders? (If you linked to your notebook, it got buried.)

@rkern (Member) commented Apr 4, 2023

> Specifically, if we are looking for x that gives a CDF value of p in our multivariate_t.cdf(x, df=df)

For what I wrote, p is the area under the relevant integral for the confidence level, not the p-value. For a one-sided CI, that's p = confidence_level (flipping the sign later if needed). In the two-sided case, p = confidence_level + (1 - confidence_level) / 2. This lets us work with integrals from -inf to some_t_value and thus use ppf().

There's no formal derivation written out per se, but one can think through the two extreme cases. When all dimensions are completely correlated, the ndim-d distribution is really just a 1-d distribution rotated into ndim-d space, so the integral reduces to the 1-d case, giving stats.t.ppf(p, df=df) as the relevant bound. That one's easy.

In the completely independent case, the ndim-d t-distribution is just the product of ndim 1-d t-distributions. You can move each dimension's integrand outside of all the other integrals, so the result of integrating the ndim-d t-distribution is just the 1-d distribution's integral raised to the power ndim. So if we want the final ndim-d integral to give us p, we can look up the bound that achieves that by passing p**(1/ndim) to the 1-d t-distribution's ppf().
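
A quick numerical sanity check of the bracket property, as a sketch with illustrative values for df, ndim, the correlation, and the confidence level (here, one-sided, so p is the confidence level itself):

```python
import numpy as np
from scipy import stats

df, ndim, confidence_level = 10, 3, 0.95
p = confidence_level                        # one-sided case, per above
rho = np.full((ndim, ndim), 0.5)            # an intermediate correlation
np.fill_diagonal(rho, 1.0)
mvt = stats.multivariate_t(shape=rho, df=df, seed=0)

lo = stats.t.ppf(p, df=df)                  # perfectly correlated bound
hi = stats.t.ppf(p ** (1 / ndim), df=df)    # independent bound

# The target quantile sits between the two bounds, so we expect
# cdf(lo) <= p <= cdf(hi), up to QMC noise.
print(mvt.cdf(np.full(ndim, lo)), p, mvt.cdf(np.full(ndim, hi)))
```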

@mdhaber (Contributor) commented Apr 4, 2023

Ah ok, thanks. Makes sense. I'll play with that a bit more.

@mdhaber (Contributor) commented Apr 4, 2023

But it would still be great if you'd submit a PR with those few changes (the ppf transform and the analytical bounds with an existing bracketing root-finder).
