
ENH: add CI and str to Dunnett's test #18150

Merged 20 commits into scipy:main on Mar 30, 2023

Conversation

@tupui (Member) commented Mar 14, 2023

Follow-up of #18107.

Adds a confidence interval around the mean based on the "allowance", and also provides a cleaner `__str__`.

@tupui added the scipy.stats and enhancement (A new feature or improvement) labels Mar 14, 2023
@tupui added this to the 1.11.0 milestone Mar 14, 2023
@tupui requested a review from mdhaber March 14, 2023 16:30
@mdhaber (Contributor) left a comment:

As we've discovered, testing with multiple problems is important when we're dealing with this noisy function and can't use tight tolerances. Please add a parameterized test over the four test cases we used for p-values.
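
A minimal sketch of the shape such a parameterized test could take, assuming the `confidence_interval` API this PR adds; the seeded problems are illustrative stand-ins, not the four reference cases from the p-value tests, and the assertion only checks a generic property (the interval is centered on the observed difference of means, so it must contain it):

```python
import numpy as np
import pytest
from scipy import stats

@pytest.mark.parametrize("seed", [1, 2, 3, 4])
def test_ci_contains_observed_difference(seed):
    rng = np.random.default_rng(seed)
    control = rng.normal(0.0, 1.0, size=20)
    samples = [rng.normal(0.5, 1.0, size=20) for _ in range(3)]
    res = stats.dunnett(*samples, control=control, random_state=rng)
    ci = res.confidence_interval(confidence_level=0.95)
    # The CI is the observed mean difference +/- the allowance, so it
    # must bracket the observed difference for every sample.
    diff = np.array([s.mean() for s in samples]) - control.mean()
    assert np.all(ci.low < diff) and np.all(diff < ci.high)
```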

@tupui (Member Author) left a comment:

Thanks Matt, I will add the parametrisation 👍

@rkern (Member) commented Mar 23, 2023

I've done some investigation of alternatives, and using minimize_scalar() like this tends to have the largest error and the most function evaluations. Since the problem is one of root-finding rather than minimization per se, recasting it as a minimization problem just consumes more resources for less payoff. Using the brentq() root-finder, off the shelf, seems to consistently do better on both counts. It's not specifically adapted for stochastic functions, but as mentioned in the aforementioned thread, if the tolerance is chosen to match the noise level, it usually won't do too badly. Unfortunately, there is no real guidance for converting between the two, but in practice it works okay on this problem for reasonable choices of the tolerance.

Even better is a variant of the noisy bisection algorithm given in that blog post, using a secant step instead of pure bisection. One of the real benefits of this algorithm is that it is self-adaptive to the noise level of the function: it converges as soon as the random noise violates the monotonicity constraint. This is a sensible place to stop, so it doesn't need extra checking after the fact. If a typical use of minimize_scalar() takes about 18 evaluations, brentq will take about 10, and the noisy secant about 7. A sketch of the idea is below.
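
For concreteness, here is a minimal sketch of that noisy-secant idea (my own illustration, not the notebook code): keep a bracket whose endpoint values straddle zero, take secant steps, and stop once a new evaluation breaks the monotone ordering, i.e. once the noise floor is reached.

```python
def noisy_secant(f, a, b, maxiter=50):
    """Find the root of a noisy, increasing f with f(a) < 0 < f(b)."""
    fa, fb = f(a), f(b)
    for _ in range(maxiter):
        # Secant step; with fa < 0 < fb it always lands inside (a, b).
        c = a - fa * (b - a) / (fb - fa)
        fc = f(c)
        # Noise has violated monotonicity: no further progress possible.
        if not fa < fc < fb:
            return c
        if fc < 0:
            a, fa = c, fc
        else:
            b, fb = c, fc
    return 0.5 * (a + b)
```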

There are a few things that we can bring from the Bornkamp paper even if we don't implement the local-linear regression root-finder. I do have an implementation of it, and it can lower the error a fair bit (rigorously), but it does so at the cost of additional function evaluations, at least for the convergence tolerances used in the paper and mvtnorm. Relaxing those a little gets more reasonable function evaluations, but no real improvements over the noisy secant method.

One thing we can bring over is the use of the ppf() of a relevant distribution to make the curve more nearly a straight line with a healthier slope. When we get to high confidence levels like 0.99 or 0.999, we are well into the tail of the CDF, where it gets hard to invert even when it's smooth and deterministic. Using stats.t.ppf(x, df=df) makes this curve nearly straight, with a nice not-close-to-zero slope that is easier to root-find on, particularly with a secant method. Bornkamp uses either norm.ppf() or cauchy.ppf() (when df<=7), but most likely just because he's reusing R's GLM capabilities, which have those as named link functions. t.ppf(x, df=df) smoothly interpolates between those cases for us. This really helps the noisy secant method, is relatively neutral with minimize_scalar(), and improves brentq() a little.

The second thing is to use the bracketing interval Bornkamp derives. This puts us in the right region immediately, rather than having to search for one from the defaults. Specifically, if we are looking for the x that gives a CDF value of p in our multivariate_t.cdf(x, df=df), then it must lie between stats.t.ppf(p, df=df) and stats.t.ppf(p ** (1/ndim), df=df). These correspond to the cases where all of the dimensions in the correlation matrix are perfectly correlated (all elements 1) or completely independent (all off-diagonals 0); every other case is intermediate. Having these bounds opens up the superior root-finding methods, which require starting bounds that bracket the root. Putting the pieces together looks something like the sketch below.
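
As an illustration only (not the PR's implementation), combining the ppf link transform with Bornkamp's bracket and brentq() might look like this; df, ndim, the correlation, and the tolerance are placeholder values:

```python
import numpy as np
from scipy import stats
from scipy.optimize import brentq

df, ndim, p = 10, 3, 0.95                       # illustrative values
rho = np.full((ndim, ndim), 0.5)                # equicorrelated example
np.fill_diagonal(rho, 1.0)
mvt = stats.multivariate_t(shape=rho, df=df, seed=42)

# Bornkamp's bracket: the perfectly correlated and independent extremes.
lo = stats.t.ppf(p, df=df)
hi = stats.t.ppf(p ** (1 / ndim), df=df)

def residual(x):
    # The t.ppf link transform makes the noisy QMC CDF estimate
    # nearly linear in x, which suits bracketing root-finders.
    cdf = mvt.cdf(np.full(ndim, x))
    return stats.t.ppf(cdf, df=df) - stats.t.ppf(p, df=df)

critical = brentq(residual, lo, hi, xtol=1e-3)  # xtol ~ noise level
```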

I am happy to clean up my investigation notebook if there is interest.

@tupui (Member Author) commented Mar 23, 2023

@rkern thank you for investigating.

What I could do here is use brentq, but then I need the bracket. Feel free to make a code suggestion here.

For the rest, I would suggest leaving it for a follow-up if you are interested in doing that.

@tupui (Member Author) commented Mar 23, 2023

OK, if I am not mistaken, I have addressed all the suggestions.

@tupui (Member Author) commented Mar 23, 2023

I would not worry about this lint error. It's a false positive because we name this differently in the docstring. But if you want, I can add a `# noqa: D417` on the line:

```
scipy/stats/_multicomp.py:189:5: D417 Missing argument description in the docstring: `*samples`
```

@mdhaber (Contributor) commented Mar 23, 2023

> I am happy to clean up my investigation notebook if there is interest.

Yes, definitely would be interested in a PR! The PPF transform idea and use of analytical bounds with an existing bracketing root finder sound like very simple improvements to this. I'd be curious to see how much the tolerances in the tests can be tightened with that alone (and I understand that it would also improve speed). After that, noisy bisection/secant ideas also sound good. I've thought before that optimize might benefit from rootfinders that take advantage of certain assumptions (e.g. these rely on monotonicity), but I'd understand if we don't want to expand SciPy's scope in this direction right now.

@mdhaber (Contributor) commented Mar 23, 2023

OK, added the test. Please correct the one-sided confidence interval direction and adjust the tests (I flipped them so that I could check that the rest of the test was correct). Then I'll come back and compare the math against the reference and check the pretty-printing.

```
@@ -24,10 +27,167 @@
class DunnettResult:
    statistic: np.ndarray
    pvalue: np.ndarray
    _alternative: Literal['two-sided', 'less', 'greater']
```
@mdhaber (Contributor):

These private attributes are showing up in `__repr__`. I'd suggest hiding them as in `EmpiricalDistributionFunction`.

@tupui (Member Author):

Hmm, are you sure? If we say `__repr__` should allow to build back the object, then we would need (at least some of) the private attributes.

I don't have strong feelings on this one; I am not sure, since we have a pretty `__str__`.

@mdhaber (Contributor):

> If we say repr should allow to build back the object

I think of that as the ideal distinction between `__repr__` and `__str__`, if there is to be one. I would prefer to break that rule than to expose private attributes, though.

@tupui (Member Author):

Fine by me. I personally very rarely use this behaviour to recreate objects, and I am not sure people would want to do that in this case. We can try what you propose and re-evaluate if we get user requests.

@mdhaber (Contributor) commented Mar 30, 2023:

They can't re-create the object anyway because it's private. Er... doing so is not supported.

Comment on lines 61 to 62:

```python
"\nOne-sided alternative (less): sample i's mean exceed "
"the control's mean by an amount at least Lower CI"
```
@mdhaber (Contributor):

I'm not sure whether I'm a fan of this. R prints something like it, right? My concern is that it purports to interpret what a confidence interval is, but it doesn't really give the detail needed for a correct understanding. You could ask me to get over it, or we could wordsmith this to death, or we could just remove it : )

If we keep it, I would prefer "the difference between the mean of sample i and the mean of the control is...". I think the order of subtraction is unambiguous, but it is easier to parse a negative difference than a negative amount for "exceeds".

@tupui (Member Author):

I actually used the wording from Dunnett's article. I always struggle with what "alternative" means and thought it could help users to be "more confident" about what they are doing. I've seen something a bit similar in Stata (or something like that), where the output of a test is written out in plain language. You could argue, though, that we are targeting integrators more than end users, and this goes a bit beyond our scope.

I am fine with either removing or adjusting the messaging. Can you make a code suggestion (and push it, as I am OK with both options here)?

@mdhaber (Contributor):

His wording was fine, but you would then need to add a caveat like he does every time he makes statements like these: "The joint statement consisting of the ... conclusions has a confidence coefficient of 95%." Without this, it looks like a statement of 100% certainty. And I'm not sure we want to write that caveat verbatim, so we would need to wordsmith it.

@tupui (Member Author) left a comment:

Thanks Matt! I will make the change for the repr and CI in the doc.

@mdhaber (Contributor) left a comment:

With a few adjustments, this looks good. I'll push an update that implements the suggestions here, and when CI is finished I can merge.

@mdhaber (Contributor) left a comment:

Also, attributes of DunnettResult should be documented, but I'll leave that to a follow-up.

@mdhaber merged commit bc73228 into scipy:main Mar 30, 2023
@mdhaber (Contributor) commented Mar 30, 2023

Thanks @tupui. Please add documentation of the DunnettResult attributes in a follow-up. See #18150 (comment): it sounds like it's worth experimenting with the ppf transform and bracketing root-finder @rkern mentioned. Those changes should be pretty simple to make, but they could make this a lot faster and more robust.

@tupui deleted the dunnett_ci branch March 30, 2023 11:33
@tupui (Member Author) commented Mar 30, 2023

Thanks Matt, I will make the follow-up.

For the optimization part, I would give it lower priority than other topics, since the current solution is reasonable.

@mdhaber (Contributor) commented Apr 4, 2023

Re: #18150 (comment)

I got this working (at least for alternative='less' and alternative='greater'); I think some modification is needed for alternative='two-sided'.

@rkern any chance that

> then it must be between stats.t.ppf(p, df=df) and stats.t.ppf(p ** (1/ndim), df=df)

is supposed to be

> then it must be between stats.t.ppf(p, df=df) and stats.t.ppf(1 - (1-p)**(1/ndim), df=df)

? That's how I got a valid bracket in my (sloppy) tests 6449a15.

Could you point out the part of the paper where this is derived? What tolerance did you use with the bracketing root-finders? (If you linked to your notebook, it got buried.)

@rkern (Member) commented Apr 4, 2023

> Specifically, if we are looking for x that gives a CDF value of p in our multivariate_t.cdf(x, df=df)

For what I wrote, p is the area under the relevant integral for the confidence level, not the p-value. For a one-sided CI, that's p = confidence_level (flipping the sign later if needed). In the two-sided case, p = confidence_level + (1 - confidence_level) / 2. This lets us work with integrals from -inf to some_t_value and thus use ppf().

There's no formal derivation written out per se, but one can think through the two extreme cases. When all dimensions are completely correlated, the ndim-d distribution is really just a 1-d distribution rotated into ndim-d space, so the integral reduces to the 1-d case, giving stats.t.ppf(p, df=df) as the relevant bound. That one's easy.

In the completely independent case, the ndim-d t-distribution is just the product of ndim 1-d t-distributions. You can move each dimension's integrand outside of all the other integrals, so the result of integrating the ndim-d t-distribution is just the 1-d distribution's integral raised to the power ndim. So if we want the final ndim-d integral to give us p, we can look up the bound that achieves that by passing p**(1/ndim) to the 1-d t-distribution's ppf().
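
A quick numerical sanity check of the bracket property, as a sketch with illustrative values for df, ndim, the correlation, and the confidence level (here, one-sided, so p is the confidence level itself):

```python
import numpy as np
from scipy import stats

df, ndim, confidence_level = 10, 3, 0.95
p = confidence_level                        # one-sided case, per above
rho = np.full((ndim, ndim), 0.5)            # an intermediate correlation
np.fill_diagonal(rho, 1.0)
mvt = stats.multivariate_t(shape=rho, df=df, seed=0)

lo = stats.t.ppf(p, df=df)                  # perfectly correlated bound
hi = stats.t.ppf(p ** (1 / ndim), df=df)    # independent bound

# The target quantile sits between the two bounds, so we expect
# cdf(lo) <= p <= cdf(hi), up to QMC noise.
print(mvt.cdf(np.full(ndim, lo)), p, mvt.cdf(np.full(ndim, hi)))
```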

@mdhaber (Contributor) commented Apr 4, 2023

Ah ok, thanks. Makes sense. I'll play with that a bit more.

@mdhaber (Contributor) commented Apr 4, 2023

But it would still be great if you'd submit a PR with those few changes (the ppf transform and the analytical bounds with an existing bracketing root-finder).
