BUG: scipy.stats.wilcoxon in 1.13 fails on 2D array with nan -- 1D works #20591

JonathanShor · 2024-04-26T21:59:35Z

Describe your issue.

A regression seemingly introduced in 1.13. A 2D array with NaNs in some columns ValueErrors out as soon as it reaches a column with a NaN. It must be truly 2D, i.e. A.shape[1] > 1 is True. It also only occurs when there are more than 50 rows, i.e. A.shape[0] > 50.

Passing just that column with the NaNs, i.e. passing it as 1D, operates as expected. For example, stats.wilcoxon(A[:50,:] or stats.wilcoxon(A[:,1]) after the code example below runs fine.

Reproducing Code Example

from scipy import stats
import numpy as np

rng = np.random.default_rng(123)
A = rng.normal(size=(51,2))
A[5,1] = np.nan
stats.wilcoxon(A)

Error message

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/isilon/LFMI/VMdrive/Jonathan/PriorWeightingMooney/Visual/ECoG/Code/.venv/lib/python3.11/site-packages/scipy/_lib/_util.py", line 794, in wrapper
    return fun(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^
  File "/isilon/LFMI/VMdrive/Jonathan/PriorWeightingMooney/Visual/ECoG/Code/.venv/lib/python3.11/site-packages/scipy/stats/_axis_nan_policy.py", line 603, in axis_nan_policy_wrapper
    res = np.apply_along_axis(hypotest_fun, axis=0, arr=x)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/isilon/LFMI/VMdrive/Jonathan/PriorWeightingMooney/Visual/ECoG/Code/.venv/lib/python3.11/site-packages/numpy/lib/shape_base.py", line 402, in apply_along_axis
    buff[ind] = asanyarray(func1d(inarr_view[ind], *args, **kwargs))
    ~~~~^^^^^
ValueError: could not broadcast input array from shape (2,) into shape (3,)

SciPy/NumPy/Python version and system information

>>> import sys, scipy, numpy; print(scipy.__version__, numpy.__version__, sys.version_info); scipy.show_config()

1.13.0 1.26.4 sys.version_info(major=3, minor=11, micro=5, releaselevel='final', serial=0)
Build Dependencies:
  blas:
    detection method: pkgconfig
    found: true
    include directory: /usr/local/include
    lib directory: /usr/local/lib
    name: openblas
    openblas configuration: USE_64BITINT=0 DYNAMIC_ARCH=1 DYNAMIC_OLDER= NO_CBLAS=
      NO_LAPACK= NO_LAPACKE= NO_AFFINITY=1 USE_OPENMP= ZEN MAX_THREADS=64
    pc file directory: /usr/local/lib/pkgconfig
    version: 0.3.26.dev
  lapack:
    detection method: pkgconfig
    found: true
    include directory: /usr/local/include
    lib directory: /usr/local/lib
    name: openblas
    openblas configuration: USE_64BITINT=0 DYNAMIC_ARCH=1 DYNAMIC_OLDER= NO_CBLAS=
      NO_LAPACK= NO_LAPACKE= NO_AFFINITY=1 USE_OPENMP= ZEN MAX_THREADS=64
    pc file directory: /usr/local/lib/pkgconfig
    version: 0.3.26.dev
  pybind11:
    detection method: config-tool
    include directory: unknown
    name: pybind11
    version: 2.12.0
Compilers:
  c:
    commands: cc
    linker: ld.bfd
    name: gcc
    version: 10.2.1
  c++:
    commands: c++
    linker: ld.bfd
    name: gcc
    version: 10.2.1
  cython:
    commands: cython
    linker: cython
    name: cython
    version: 3.0.10
  fortran:
    commands: gfortran
    linker: ld.bfd
    name: gcc
    version: 10.2.1
  pythran:
    include directory: ../../tmp/pip-build-env-ilw0s1wu/overlay/lib/python3.11/site-packages/pythran
    version: 0.15.0
Machine Information:
  build:
    cpu: x86_64
    endian: little
    family: x86_64
    system: linux
  cross-compiled: false
  host:
    cpu: x86_64
    endian: little
    family: x86_64
    system: linux
Python Information:
  path: /opt/python/cp311-cp311/bin/python
  version: '3.11'

The text was updated successfully, but these errors were encountered:

mdhaber · 2024-04-26T22:09:04Z

I'll fix this shortly. In the meantime, please use:

stats.wilcoxon(A, method='approx')

The problem is just with method='auto' (and method='exact' if there are zeros).

JonathanShor · 2024-04-27T00:23:17Z

I'll fix this shortly. In the meantime, please use:
stats.wilcoxon(A, method='approx')
The problem is just with method='auto' (and method='exact' if there are zeros).

Should I expect this to have worse performance than 1.12 in some situations? This is what I'm seeing, tho haven't set up a formal test. 1.12 using method='auto' is about 35% faster than 1.13 using method='approx'.

mdhaber · 2024-04-27T00:44:59Z

What is the shape of the input, and does it have NaNs?
Try stats.wilcoxon(A, _no_deco=True) if you want a speed boost, but it will behave as though it omits NaNs.

But yes - in specific situations, you may observe a decrease in speed. Between 1.12 and 1.13, wilcoxon was rewritten to be natively vectorized and therefore faster in cases that would typically be slow. This added a few microseconds of constant overhead on short 1D slices, but the gain in performance for >1D arrays is significant, and it also happens to be faster for slices that are very long (>10k elements). Another potential advantage of the new method is that is automatically omits NaNs, which we can use to improve the performance of nan_policy='omit' in the future, which is usually very slow.

However, currently when the input has NaNs and is >1D, the implementation loops over each slice individually (as it did before, even when there were no NaNs). This looping combined with the constant overhead on each slice seems to result in the performance regression you're seeing. However, you can probably get much better performance now than before if you use stats.wilcoxon(A, _no_deco=True) and add NaNs to the elements corresponding with input slices that had NaNs.

JonathanShor · 2024-04-29T15:51:01Z

What is the shape of the input, and does it have NaNs? Try stats.wilcoxon(A, _no_deco=True) if you want a speed boost, but it will behave as though it omits NaNs.

Typically 20-200 rows, by ~1500 columns. So, far from 10k elements per slice. Would you expect me to see a slight benefit staying with 1.13 and using _no_deco=True?

Columns with NaNs, should they exist, are only present in a continuous span at the top of the index. Would I benefit from chunking the columns by NaN presence and calling wilcoxon twice?

JonathanShor · 2024-04-29T16:19:54Z

I went ahead and tried _no_deco=True since omitting NaNs is my desired behavior, and I'm not seeing a slight benefit, I'm seeing almost two orders of magnitude speed up! I had to double check I had actually turned parallelism in my setup off for the speed test.

Nice job on this update!

mdhaber · 2024-04-29T19:16:45Z

Great to hear : ) Hopefully it will get array API support before too long, too.

JonathanShor added the defect A clear bug or issue that prevents SciPy from being installed or used as expected label Apr 26, 2024

lucascolley added the scipy.stats label Apr 26, 2024

lucascolley added this to the 1.14.0 milestone Apr 26, 2024

mdhaber linked a pull request Apr 27, 2024 that will close this issue

MAINT: stats.wilcoxon: fix failure with multidimensional x with NaN and slice length > 50 #20592

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: scipy.stats.wilcoxon in 1.13 fails on 2D array with nan -- 1D works #20591

BUG: scipy.stats.wilcoxon in 1.13 fails on 2D array with nan -- 1D works #20591

JonathanShor commented Apr 26, 2024 •

edited

mdhaber commented Apr 26, 2024 •

edited

JonathanShor commented Apr 27, 2024

mdhaber commented Apr 27, 2024 •

edited

JonathanShor commented Apr 29, 2024

JonathanShor commented Apr 29, 2024

mdhaber commented Apr 29, 2024

BUG: scipy.stats.wilcoxon in 1.13 fails on 2D array with nan -- 1D works #20591

BUG: scipy.stats.wilcoxon in 1.13 fails on 2D array with nan -- 1D works #20591

Comments

JonathanShor commented Apr 26, 2024 • edited

Describe your issue.

Reproducing Code Example

Error message

SciPy/NumPy/Python version and system information

mdhaber commented Apr 26, 2024 • edited

JonathanShor commented Apr 27, 2024

mdhaber commented Apr 27, 2024 • edited

JonathanShor commented Apr 29, 2024

JonathanShor commented Apr 29, 2024

mdhaber commented Apr 29, 2024

JonathanShor commented Apr 26, 2024 •

edited

mdhaber commented Apr 26, 2024 •

edited

mdhaber commented Apr 27, 2024 •

edited