ENH: stats.ttest_ind: add degrees of freedom and confidence interval #18210
Conversation
Some self-review to help the reviewer.
@@ -7118,50 +7118,8 @@ def ttest_ind_from_stats(mean1, std1, nobs1, mean2, std2, nobs2,
     return Ttest_indResult(*res)


 def _ttest_nans(a, b, axis, namedtuple_type):
This is no longer used, and the `_axis_nan_policy` approach makes it obsolete, so it can be removed. TBH, it looks like `_broadcast_shapes_with_dropped_axis` and `_shape_with_dropped_axis` can be removed, too.
I agree. Feel free to remove those.
I removed `_broadcast_shapes` and replaced it with `np.broadcast_shapes`, too. (`np.broadcast_shapes` was new in NumPy 1.20.0, so we didn't have that to work with.)
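For illustration, a quick sketch of how `np.broadcast_shapes` behaves (this example is mine, not part of the diff):

```python
import numpy as np

# np.broadcast_shapes (NumPy >= 1.20.0) computes the shape that
# broadcasting the given shapes together would produce, without
# allocating any arrays.
shape = np.broadcast_shapes((3, 1), (1, 4))
print(shape)  # (3, 4)

# Incompatible shapes raise ValueError, matching real broadcasting:
try:
    np.broadcast_shapes((3,), (4,))
    compatible = True
except ValueError:
    compatible = False
print(compatible)  # False
```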
scipy/stats/_stats_py.py
Outdated
a, b, axis = _chk2_asarray(a, b, axis)

# check both a and b
cna, npa = _contains_nan(a, nan_policy)
cnb, npb = _contains_nan(b, nan_policy)
contains_nan = cna or cnb
if npa == 'omit' or npb == 'omit':
    nan_policy = 'omit'

if contains_nan and nan_policy == 'omit':
    if permutations or trim != 0:
        raise ValueError("nan-containing/masked inputs with "
                         "nan_policy='omit' are currently not "
                         "supported by permutation tests or "
                         "trimmed tests.")
    a = ma.masked_invalid(a)
    b = ma.masked_invalid(b)
    return mstats_basic.ttest_ind(a, b, axis, equal_var, alternative)

if a.size == 0 or b.size == 0:
    return _ttest_nans(a, b, axis, Ttest_indResult)
This is no longer needed due to the addition of the `_axis_nan_policy` decorator.
# when nan_policy='omit', `df` can be different for different axis-slices
df = np.broadcast_to(df, t.shape)[()]
Copied from `ttest_rel`. If it was needed there, it is needed here.
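To illustrate what that line does (my own sketch, not code from the PR): `np.broadcast_to` expands `df` to the shape of the statistic, and the trailing `[()]` unwraps a 0-d array into a scalar while leaving higher-dimensional results untouched.

```python
import numpy as np

# n-d case: a scalar df is expanded to match the statistic's shape.
t = np.array([1.0, 2.0, 3.0])
df = np.broadcast_to(5.0, t.shape)[()]   # [()] is a no-op for ndim >= 1
print(df.shape)  # (3,)

# 0-d case: [()] extracts the scalar, so callers get a plain number
# instead of a read-only 0-d array.
t0 = np.float64(2.0)
df0 = np.broadcast_to(5.0, t0.shape)[()]
print(isinstance(df0, np.ndarray))  # False
```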
@@ -4804,7 +4804,7 @@ def test_ttest_ind_permutation_check_inputs(self):
             stats.ttest_ind(self.a2, self.b2, permutations=1.5)
         with assert_raises(ValueError, match="'hello' cannot be used"):
             stats.ttest_ind(self.a, self.b, permutations=1,
-                            random_state='hello')
+                            random_state='hello', axis=1)
The default `axis=0` is invalid for `self.a` and `self.b`, but the errors were raised in a different order before, so we didn't notice.
#            0.3873582, 0.35187468, 0.21731811)
# yuen.t.test(a, b, tr=0, conf.level = 0.9, alternative = 'l')
#
# equal_var=True reference values computed with R multicon yuenContrast:
Oops, forgot to include this:
# equal_var=True reference values computed with R multicon yuenContrast:
#
# library(multicon)
# options(digits=16)
# a <- c(0.88236329, 0.97318744, 0.4549262, 0.97893335, 0.0606677,
#        0.44013366, 0.55806018, 0.40151434, 0.14453315, 0.25860601,
#        0.20202162)
# b <- c(0.93455277, 0.42680603, 0.49751939, 0.14152846, 0.711435,
#        0.77669667, 0.20507578, 0.78702772, 0.94691855, 0.32464958,
#        0.3873582, 0.35187468, 0.21731811)
# dv = c(a, b)
# iv = c(rep('a', length(a)), rep('b', length(b)))
# yuenContrast(dv~iv, EQVAR = FALSE, alternative = 'unequal', tr = 0.2)
Done, even though it's not being recognized as outdated.
scipy/stats/tests/test_stats.py
Outdated
assert_allclose(res.statistic, statistic)
assert_allclose(res.df, df)
assert_allclose(res.pvalue, pvalue)
if not equal_var:
if not equal_var:  # CI not available when `equal_var is True`
r = np.empty(shape=(3, 2, 2, 5))
r[0, 0, 0] = [-0.2314607, 19.894435, 0.8193209, -0.247220294, 0.188729943]
I didn't set out thinking that I would populate the array this way. `r` started out as a dictionary, and I went through a lot of iterations to figure out how to store the data compactly within the PEP 8 line limit. If the reviewer would prefer for this to be refactored, please make a specific suggestion, but I would prefer to do so under the following constraints:
- One line per case
- Include sufficient precision to use default `assert_allclose` tolerances

One thought is that I could eliminate the statistic and p-value, since in theory those are already validated elsewhere. But had I done that, I would not have caught the `equal_var` discrepancy, so I think it's good to leave them in.
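For context on the precision constraint (my own illustration, not from the diff): `numpy.testing.assert_allclose` defaults to `rtol=1e-7`, `atol=0`, so stored reference values need roughly eight significant digits to pass without loosening tolerances.

```python
from numpy.testing import assert_allclose

# ~8 matching significant digits: passes at the default rtol=1e-7.
assert_allclose(0.81932090, 0.8193209)

# Only ~4 significant digits: relative error ~2.6e-5 exceeds rtol=1e-7.
try:
    assert_allclose(0.8193, 0.8193209)
    coarse_value_passed = True
except AssertionError:
    coarse_value_passed = False
print(coarse_value_passed)  # False
```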
The formatting looks good to me, especially given the comment above about what each slice of the array is and how it's computed.
@@ -5023,6 +5020,64 @@ def test_trim_bounds_error(self, trim):
             stats.ttest_ind([1, 2], [2, 1], trim=trim)


 class Test_ttest_CI:
In retrospect, there was not really a need for this to be in a separate class. If the reviewer wants to see this elsewhere, let me know where.
Looking at other tests, it seems like the precedent is to create a new class for testing a new argument, so I don't think we need to change this. Maybe later we can merge all these classes to avoid creating a new one every time something new is added.
CI failures are unrelated.
Thanks for the PR @mdhaber! Looks incredibly clean, didn't find anything else other than your comments and suggestions. I also confirmed all the reference values in the test with R. Feel free to add the commits you have suggested and we can merge!
Looks good now! Thanks @mdhaber!
* main: (51 commits)
  ENH: stats.ttest_ind: add degrees of freedom and confidence interval (scipy#18210)
  DOC fix link in release notes v1.7 (scipy#18258)
  BLD: fix missing build dependency on cython signature .txt files
  DOC: Clarify executable tutorials and MyST/Jupytext
  ENH: stats: Add relativistic Breit-Wigner Distribution (scipy#17505)
  MAINT: setup.sh as executable file
  DOC: remove content related to `setup.py` usage from the docs (scipy#18245)
  DOC: orthogonal_procrustes fix date of reference paper and DOI (scipy#18251)
  BLD: implement version check for minimum Cython version
  ci: touch up wheel build action
  DOC: link to info about Gitpod.
  ENH: ndimage.gaussian_filter: add `axes` keyword-only argument (scipy#18016)
  BUG: sparse: Fix LIL full-matrix assignment (scipy#18211)
  STY: Remove unnecessary semicolon
  DOC: Pin docutils and filter additional sphinx warnings
  BUG: vq.kmeans() compares signed diff to a threshold. (scipy#8727)
  MAINT: stats: consistently return NumPy numbers (scipy#18217)
  TST: stats.dunnett: fix seed and filter warnings in test_shapes
  DOC: add info to use codespaces.
  MAINT: add devcontainer configuration
  ...
Reference issue
Closes gh-15906
Follow-up of gh-16902, gh-16835
What does this implement/fix?
This adds a `df` attribute and a `confidence_interval` method to the result object returned by `ttest_ind`. It also wraps `ttest_ind` with the `_axis_nan_policy` decorator, adding support for masked arrays and axis tuples, cleaning up edge cases, etc.

Additional information
Adding `df` and `confidence_interval` for a permutation t-test is possible but not in scope. Theoretically, adding `df` is easy because it is the same as for a regular t-test, but because the calculation of `df` is buried deep within a different branch when doing a permutation t-test, returning it without recomputing it would require a lot of plumbing. Rather than burden the reviewer with additional complexity, I've documented the limitation. Adding a permutation test confidence interval is a valid feature request but best left up to the gh-18067 effort.

While writing this PR, I found two potential bugs. Investigating/fixing these is not in scope of this PR because they seem to be bugs in main (but I will open issues to track them once I get a better understanding of them):

- Our `equal_var=True` trimmed t-test with unequal sample sizes does not seem to match the results of the R `multicon` library's `yuenContrast` function. More specifically, the degrees of freedom added by this PR do match, but the statistic and p-value that we already report in main don't match. I've xfailed the relevant test. (Update: I'm no longer concerned by this. See ENH: support trimming in ttest_ind #13696 (comment).)
- I tried to run `ttest_ind`'s `confidence_interval` method (that is, a wrapper like `lambda x, y, *args, **kwargs: stats.ttest_ind(x, y, *args, **kwargs).confidence_interval()`) through the `_axis_nan_policy` suite of tests, but I got some failures with `nan_policy='propagate'`. I think this is a problem with `TTestResult`, which is already in `main`, not this PR, because I get the same sorts of failures with `ttest_rel`. (Update: fixed by MAINT: stats.TTestResult: fix NaN bug in ttest confidence intervals #18222, and I successfully added a test here.)