MAINT, TST: few test failures noted on precursor Grace Hopper chip #20557

Open
tylerjereddy opened this issue Apr 22, 2024 · 1 comment
Labels
maintenance Items related to regular maintenance tasks

Comments

@tylerjereddy
Contributor

These are early-stage build/test results on precursor hardware for the Grace Hopper superchip (https://www.nvidia.com/en-us/data-center/grace-hopper-superchip/). I'll hopefully test again on the real thing in a few weeks. Results look pretty good: there are no signs of the cache-coherent memory (the GPU can use CPU memory) causing problems, though I haven't tried the array API tests on the GPU there yet (I may follow up with those in this same issue, though any failures would likely just be upstream issues anyway). I'm not sure how far this chip is from the "real thing", but it probably reflects the compiler situation reasonably well.

Details of the 3 test failures are below. One is just a timeout; the other two look a bit wild, but at least they are isolated to `io/matlab`.
====================================================================================================================================================== FAILURES ======================================================================================================================================================
___________________________________________________________________________________________________________________________________ TestZlibInputStream.test_all_data_read_overlap ___________________________________________________________________________________________________________________________________
[gw54] linux -- Python 3.12.1 /vast/home/treddy/python_venvs/py_312_grace_hopper/bin/python
scipy/io/matlab/tests/test_streams.py:205: in test_all_data_read_overlap
    assert_(compressed_data_len == BLOCK_SIZE + 2)
E   AssertionError
        COMPRESSION_LEVEL = 6
        compressed_data = b'x\x9c\xec\xdd\x05[U\xdb\x1a\x06PB\x04\x04\xe9n\x01i\x94nP\x1aI\x91\x94\x10A\x1a\x14\x14\x01\xf5\xdc\xee\xee\xee\xee\...bb\xff\xdd\xff\xee\x7f\xf7\xbf\xfb\xdf\xfd\xef\xfew\xff\xbb\xff\xdd\xff\xee\x7f\xf7\xbf_\xfe\xcf\xff\xfe\x0fl\xcc\xd1^'
        compressed_data_len = 251004
        data       = b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x...da\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7'
        self       = <scipy.io.matlab.tests.test_streams.TestZlibInputStream object at 0x40013dbc4890>
________________________________________________________________________________________________________________________________ TestZlibInputStream.test_all_data_read_bad_checksum _________________________________________________________________________________________________________________________________
[gw54] linux -- Python 3.12.1 /vast/home/treddy/python_venvs/py_312_grace_hopper/bin/python
scipy/io/matlab/tests/test_streams.py:221: in test_all_data_read_bad_checksum
    assert_(compressed_data_len == BLOCK_SIZE + 2)
E   AssertionError
        COMPRESSION_LEVEL = 6
        compressed_data = b'x\x9c\xec\xdd\x05[U\xdb\x1a\x06PB\x04\x04\xe9n\x01i\x94nP\x1aI\x91\x94\x10A\x1a\x14\x14\x01\xf5\xdc\xee\xee\xee\xee\...bb\xff\xdd\xff\xee\x7f\xf7\xbf\xfb\xdf\xfd\xef\xfew\xff\xbb\xff\xdd\xff\xee\x7f\xf7\xbf_\xfe\xcf\xff\xfe\x0fl\xcc\xd1^'
        compressed_data_len = 251004
        data       = b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x...da\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7'
        self       = <scipy.io.matlab.tests.test_streams.TestZlibInputStream object at 0x40013dbc4a70>
____________________________________________________________________________________________________________________________________________ test_examples[False-float32] ____________________________________________________________________________________________________________________________________________
[gw49] linux -- Python 3.12.1 /vast/home/treddy/python_venvs/py_312_grace_hopper/bin/python
scipy/sparse/linalg/tests/test_propack.py:133: in test_examples
    u3, s3, vh3 = np.linalg.svd(A.todense())
        A          = <1850x712 sparse matrix of type '<class 'numpy.float32'>'
	with 8636 stored elements in COOrdinate format>
        _          = array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., ... 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)
        atol       = 0.00013
        data       = NpzFile 'object' with keys: A_real, A_complex
        dtype      = <class 'numpy.float32'>
        filename   = '/vast/home/treddy/github_projects/scipy/build-install/lib/python3.12/site-packages/scipy/sparse/linalg/tests/propack_test_data.npz'
        irl        = False
        k          = 200
        path_prefix = '/vast/home/treddy/github_projects/scipy/build-install/lib/python3.12/site-packages/scipy/sparse/linalg/tests'
        relative_path = 'propack_test_data.npz'
        s          = array([2.1233425, 2.0792916, 2.0701475, 2.0553436, 2.0349526, 2.0268695,
       1.9737159, 1.9396288, 1.9091855, 1.874...,
       1.2578827, 1.2546223, 1.2540137, 1.2490836, 1.2431141, 1.2423126,
       1.2403852, 1.239922 ], dtype=float32)
        sv_check   = 200
        u          = array([[-0.00560983,  0.00465656, -0.00183205, ...,  0.00434269,
         0.02259842,  0.02840544],
       [-0.0020452...1],
       [-0.02979757,  0.02140762, -0.00904884, ..., -0.01038366,
         0.00499203,  0.00354774]], dtype=float32)
        vh         = array([[-2.7545115e-02, -3.1745876e-03, -2.7440459e-02, ...,
        -6.2379640e-02, -2.2858093e-02, -2.3793750e-01],
...2767e-01, -4.2529564e-02,  4.3010801e-02, ...,
        -3.2596852e-04,  8.2741532e-04,  1.4732038e-02]], dtype=float32)
/vast/home/treddy/python_venvs/py_312_grace_hopper/lib/python3.12/site-packages/numpy/linalg/_linalg.py:1796: in svd
    u, s, vh = gufunc(a, signature=signature)
E   Failed: Timeout >120.0s
        _nx        = <module 'numpy' from '/vast/home/treddy/python_venvs/py_312_grace_hopper/lib/python3.12/site-packages/numpy/__init__.py'>
        a          = array([[0.2773501 , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.5 ...616386 ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.06163942]], dtype=float32)
        compute_uv = True
        full_matrices = True
        gufunc     = <ufunc 'svd_n_f'>
        hermitian  = False
        m          = 1850
        n          = 712
        result_t   = <class 'numpy.float32'>
        signature  = 'd->ddd'
        t          = <class 'numpy.float64'>
        wrap       = <built-in method __array_wrap__ of matrix object at 0x40015105fe50>
------------------------------------------------------------------------------------------------------------------------------------------------ Captured stdout call ------------------------------------------------------------------------------------------------------------------------------------------------
~~~~~~~~~~~~~~~~~~~~~ Stack of <unknown> (70369401041344) ~~~~~~~~~~~~~~~~~~~~~~
  File "/vast/home/treddy/python_venvs/py_312_grace_hopper/lib/python3.12/site-packages/execnet/gateway_base.py", line 411, in _perform_spawn
    reply.run()
  File "/vast/home/treddy/python_venvs/py_312_grace_hopper/lib/python3.12/site-packages/execnet/gateway_base.py", line 341, in run
    self._result = func(*args, **kwargs)
  File "/vast/home/treddy/python_venvs/py_312_grace_hopper/lib/python3.12/site-packages/execnet/gateway_base.py", line 1160, in _thread_receiver
    msg = Message.from_io(io)
  File "/vast/home/treddy/python_venvs/py_312_grace_hopper/lib/python3.12/site-packages/execnet/gateway_base.py", line 567, in from_io
    header = io.read(9)  # type 1, channel 4, payload 4
  File "/vast/home/treddy/python_venvs/py_312_grace_hopper/lib/python3.12/site-packages/execnet/gateway_base.py", line 534, in read
    data = self._read(numbytes - len(buf))
============================================================================================================================================== short test summary info ===============================================================================================================================================
FAILED scipy/io/matlab/tests/test_streams.py::TestZlibInputStream::test_all_data_read_overlap - AssertionError
FAILED scipy/io/matlab/tests/test_streams.py::TestZlibInputStream::test_all_data_read_bad_checksum - AssertionError
FAILED scipy/sparse/linalg/tests/test_propack.py::test_examples[False-float32] - Failed: Timeout >120.0s
================================================================================================================= 3 failed, 48614 passed, 2382 skipped, 155 xfailed, 14 xpassed in 349.96s (0:05:49) =================================================================================================================
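
Both `io/matlab` failures come from the same assertion, `compressed_data_len == BLOCK_SIZE + 2`, which presumes the compressor emits a specific number of bytes; here it produced 251004 instead. One possible explanation, which is an assumption and not something confirmed in this report, is that the zlib library Python links against on this machine compresses the payload differently. A minimal sketch for checking this outside the test suite follows. The payload size `N_BYTES` and the `BLOCK_SIZE` value are assumptions about what the SciPy test uses; the repeating 0..255 byte pattern and compression level 6 are taken from the traceback.

```python
import zlib

# Assumptions (not stated in the report): payload size and block size
# believed to match the SciPy test; adjust them if the test differs.
N_BYTES = 33_707_000
BLOCK_SIZE = 131_072
COMPRESSION_LEVEL = 6  # value shown in the traceback


def make_payload(n_bytes):
    """Return n_bytes of the repeating pattern 0, 1, ..., 255, 0, 1, ..."""
    full, rest = divmod(n_bytes, 256)
    return bytes(range(256)) * full + bytes(range(rest))


def compressed_len(n_bytes=N_BYTES, level=COMPRESSION_LEVEL):
    """Length in bytes of the zlib-compressed payload."""
    return len(zlib.compress(make_payload(n_bytes), level))


if __name__ == "__main__":
    # The zlib version loaded at runtime can differ from the build-time one.
    print("zlib runtime version:", zlib.ZLIB_RUNTIME_VERSION)
    print("compressed length:   ", compressed_len())
    print("length the test expects (assumed):", BLOCK_SIZE + 2)
```

Running this on the affected machine and on one where the tests pass would show whether the compressed length itself differs between the two zlib builds, independent of SciPy.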

`pip freeze` output is below. The environment is basically just `python -m pip install -r requirements/all.txt`, followed by bumping to NumPy `2.0.0rc1` and pulling in scipy-openblas from PyPI.

accessible-pygments==0.0.4
alabaster==0.7.16
anyio==4.3.0
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
array_api_strict==1.1.1
arrow==1.3.0
asttokens==2.4.1
asv==0.6.3
asv_runner==0.2.1
attrs==23.2.0
Babel==2.14.0
beautifulsoup4==4.12.3
beniget==0.4.1
bleach==6.1.0
build==1.2.1
certifi==2024.2.2
cffi==1.16.0
charset-normalizer==3.3.2
click==8.1.7
cloudpickle==3.0.0
comm==0.2.2
contourpy==1.2.1
coverage==7.4.4
cycler==0.12.1
Cython==3.0.10
cython-lint==0.16.2
debugpy==1.8.1
decorator==5.1.1
defusedxml==0.7.1
distlib==0.3.8
docutils==0.20.1
doit==0.36.0
execnet==2.1.1
executing==2.0.1
fastjsonschema==2.19.1
filelock==3.13.4
fonttools==4.51.0
fqdn==1.5.1
gast==0.5.4
greenlet==3.0.3
hypothesis==6.100.1
idna==3.7
imagesize==1.4.1
importlib_metadata==7.1.0
iniconfig==2.0.0
ipykernel==6.29.4
ipython==8.23.0
isoduration==20.11.0
jedi==0.19.1
Jinja2==3.1.3
json5==0.9.25
jsonpointer==2.4
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
jupyter-cache==1.0.0
jupyter-events==0.10.0
jupyter_client==8.6.1
jupyter_core==5.7.2
jupyter_server==2.14.0
jupyter_server_terminals==0.5.3
jupyterlab_pygments==0.3.0
jupyterlab_server==2.27.0
jupyterlite-core==0.2.3
jupyterlite-pyodide-kernel==0.2.3
jupyterlite-sphinx==0.13.1
jupytext==1.16.1
kiwisolver==1.4.5
markdown-it-py==3.0.0
MarkupSafe==2.1.5
matplotlib==3.8.4
matplotlib-inline==0.1.7
mdit-py-plugins==0.4.0
mdurl==0.1.2
meson==1.4.0
meson-python==0.16.0
mistune==3.0.2
mpmath==1.3.0
mypy==1.9.0
mypy-extensions==1.0.0
myst-nb==1.1.0
myst-parser==2.0.0
nbclient==0.10.0
nbconvert==7.16.3
nbformat==5.10.4
nest-asyncio==1.6.0
ninja==1.11.1.1
numpy==2.0.0rc1
numpydoc==1.7.0
overrides==7.7.0
packaging==24.0
pandocfilters==1.5.1
parso==0.8.4
pexpect==4.9.0
pillow==10.3.0
pkginfo==1.10.0
platformdirs==4.2.0
pluggy==1.5.0
ply==3.11
pooch==1.8.1
prometheus_client==0.20.0
prompt-toolkit==3.0.43
psutil==5.9.8
ptyprocess==0.7.0
pure-eval==0.2.2
pybind11==2.12.0
pycodestyle==2.11.1
pycparser==2.22
pydata-sphinx-theme==0.15.2
pydevtool==0.3.0
Pygments==2.17.2
Pympler==1.0.1
pyparsing==3.1.2
pyproject-metadata==0.8.0
pyproject_hooks==1.0.0
pytest==8.1.1
pytest-cov==5.0.0
pytest-timeout==2.3.1
pytest-xdist==3.5.0
python-dateutil==2.9.0.post0
python-json-logger==2.0.7
pythran==0.15.0
PyYAML==6.0.1
pyzmq==26.0.2
referencing==0.34.0
requests==2.31.0
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rich==13.7.1
rich-click==1.7.4
rpds-py==0.18.0
ruff==0.4.1
scipy-openblas32==0.3.23.293.2
Send2Trash==1.8.3
setuptools==69.5.1
six==1.16.0
sniffio==1.3.1
snowballstemmer==2.2.0
sortedcontainers==2.4.0
soupsieve==2.5
Sphinx==7.3.7
sphinx_design==0.5.0
sphinxcontrib-applehelp==1.0.8
sphinxcontrib-devhelp==1.0.6
sphinxcontrib-htmlhelp==2.0.5
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.7
sphinxcontrib-serializinghtml==1.1.10
SQLAlchemy==2.0.29
stack-data==0.6.3
tabulate==0.9.0
terminado==0.18.1
threadpoolctl==3.4.0
tinycss2==1.2.1
tokenize-rt==5.2.0
toml==0.10.2
tomli==2.0.1
tornado==6.4
traitlets==5.14.3
types-psutil==5.9.5.20240316
types-python-dateutil==2.9.0.20240316
typing_extensions==4.11.0
uri-template==1.3.0
urllib3==2.2.1
virtualenv==20.25.3
wcwidth==0.2.13
webcolors==1.13
webencodings==0.5.1
websocket-client==1.7.0
zipp==3.18.1

I used the GNU 12.1.0 compiler toolchain.

lscpu output:

Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              80
On-line CPU(s) list: 0-79
Thread(s) per core:  1
Core(s) per socket:  80
Socket(s):           1
NUMA node(s):        1
Vendor ID:           ARM
Model:               1
Model name:          Neoverse-N1
Stepping:            r3p1
CPU max MHz:         2800.0000
CPU min MHz:         1000.0000
BogoMIPS:            50.00
L1d cache:           64K
L1i cache:           64K
L2 cache:            1024K
NUMA node0 CPU(s):   0-79
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
@tylerjereddy tylerjereddy added the maintenance Items related to regular maintenance tasks label Apr 22, 2024
@tylerjereddy
Contributor Author

I tested the array API with CuPy on this hardware this afternoon and found 62 test failures. However, 60 of them occur only because this warning pokes through: `cupy._util.PerformanceWarning: Jitify is performing a one-time only warm-up to populate the persistent cache, this may take a few seconds and will be improved in a future release...`

If I run the suite a second time, those one-time warnings go away and it is just the usual two failures left:

====================================================================================================================================================== FAILURES ======================================================================================================================================================
___________________________________________________________________________________________________________________________________ TestZlibInputStream.test_all_data_read_overlap ___________________________________________________________________________________________________________________________________
[gw23] linux -- Python 3.12.1 /vast/home/treddy/python_venvs/py_312_grace_hopper/bin/python
scipy/io/matlab/tests/test_streams.py:205: in test_all_data_read_overlap
    assert_(compressed_data_len == BLOCK_SIZE + 2)
E   AssertionError
        COMPRESSION_LEVEL = 6
        compressed_data = b'x\x9c\xec\xdd\x05[U\xdb\x1a\x06PB\x04\x04\xe9n\x01i\x94nP\x1aI\x91\x94\x10A\x1a\x14\x14\x01\xf5\xdc\xee\xee\xee\xee\...bb\xff\xdd\xff\xee\x7f\xf7\xbf\xfb\xdf\xfd\xef\xfew\xff\xbb\xff\xdd\xff\xee\x7f\xf7\xbf_\xfe\xcf\xff\xfe\x0fl\xcc\xd1^'
        compressed_data_len = 251004
        data       = b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x...da\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7'
        self       = <scipy.io.matlab.tests.test_streams.TestZlibInputStream object at 0x4001935e7da0>
________________________________________________________________________________________________________________________________ TestZlibInputStream.test_all_data_read_bad_checksum _________________________________________________________________________________________________________________________________
[gw23] linux -- Python 3.12.1 /vast/home/treddy/python_venvs/py_312_grace_hopper/bin/python
scipy/io/matlab/tests/test_streams.py:221: in test_all_data_read_bad_checksum
    assert_(compressed_data_len == BLOCK_SIZE + 2)
E   AssertionError
        COMPRESSION_LEVEL = 6
        compressed_data = b'x\x9c\xec\xdd\x05[U\xdb\x1a\x06PB\x04\x04\xe9n\x01i\x94nP\x1aI\x91\x94\x10A\x1a\x14\x14\x01\xf5\xdc\xee\xee\xee\xee\...bb\xff\xdd\xff\xee\x7f\xf7\xbf\xfb\xdf\xfd\xef\xfew\xff\xbb\xff\xdd\xff\xee\x7f\xf7\xbf_\xfe\xcf\xff\xfe\x0fl\xcc\xd1^'
        compressed_data_len = 251004
        data       = b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x...da\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7'
        self       = <scipy.io.matlab.tests.test_streams.TestZlibInputStream object at 0x4001935e7f50>
============================================================================================================================================== short test summary info ===============================================================================================================================================
FAILED scipy/io/matlab/tests/test_streams.py::TestZlibInputStream::test_all_data_read_overlap - AssertionError
FAILED scipy/io/matlab/tests/test_streams.py::TestZlibInputStream::test_all_data_read_bad_checksum - AssertionError
================================================================================================================= 2 failed, 45218 passed, 5760 skipped, 155 xfailed, 14 xpassed in 86.27s (0:01:26) ==================================================================================================================

My inclination is that we should probably not fail the tests on that warning, since it gives a different test suite result on the first run after installing CuPy than on the second run. Maybe we could add a global filter for that warning. I was using CuPy 13.1.0. This may also be something to check with the CuPy team.
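
As a rough illustration of the global-filter idea, here is a minimal sketch using only the standard `warnings` module. It matches on the start of the message quoted above, so it does not need to import CuPy. The helper name `ignore_jitify_warmup` and the stand-in warning class are hypothetical, and where such a filter would actually live in SciPy's test configuration is left open.

```python
import warnings

# Start of the warning text quoted above. `message` is treated as a regex
# that must match the beginning of the warning message.
JITIFY_WARMUP_MSG = r"Jitify is performing a one-time only warm-up"


def ignore_jitify_warmup():
    """Ignore only the one-time Jitify warm-up warning (hypothetical helper)."""
    warnings.filterwarnings("ignore", message=JITIFY_WARMUP_MSG)


class StandInPerformanceWarning(Warning):
    """Hypothetical stand-in so the demo runs without CuPy installed."""


def demo():
    """Return True if the warm-up warning is ignored but others still error."""
    with warnings.catch_warnings():
        warnings.simplefilter("error")  # mimic a suite that errors on warnings
        ignore_jitify_warmup()          # added later, so it is checked first
        # This warning matches the filter and is silently dropped.
        warnings.warn(
            "Jitify is performing a one-time only warm-up to populate "
            "the persistent cache",
            StandInPerformanceWarning,
        )
        try:
            # Any other warning is still turned into an exception.
            warnings.warn("unrelated warning", StandInPerformanceWarning)
        except StandInPerformanceWarning:
            return True
    return False
```

With pytest, an equivalent filter could presumably be written as a `filterwarnings` ini entry such as `ignore:Jitify is performing a one-time only warm-up`; whether that is the right place for it is a separate decision.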
