Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

patch for cusparseLt 0.6.1 #8074

Merged
merged 51 commits into from
Jun 6, 2024
Merged

Conversation

gdaisukesuzuki
Copy link
Contributor

@gdaisukesuzuki gdaisukesuzuki commented Jan 1, 2024

#6757

This PR is compatible only on CUDA 12 and cusparseLt 0.6.0/1 (not 0.5.0, 0.5.2).

@gdaisukesuzuki gdaisukesuzuki changed the title patch for cusparselt5 patch for cusparseLt 5.0.2 Jan 1, 2024
@gdaisukesuzuki gdaisukesuzuki changed the title patch for cusparseLt 5.0.2 patch for cusparseLt 0.5.2 Jan 2, 2024
@kmaehashi kmaehashi self-assigned this Jan 5, 2024
@kmaehashi kmaehashi added cat:enhancement Improvements to existing features prio:medium labels Jan 5, 2024
@ev-br
Copy link
Contributor

ev-br commented Feb 8, 2024

Stumbled on this while trying to figure out why the cupy main does not build on a machine it did in December. Tracked it down to cusparseLT 0.5.2 installed on that machine. Rebasing this PR on today's main (f643379) builds fine (great!) but fails down the line with (the specific test file is fairly random)

$ pytest tests/cupyx_tests/scipy_tests/interpolate_tests/test_rbfinterp.py -vx
=========================================================================================== test session starts ===========================================================================================
platform linux -- Python 3.10.0, pytest-8.0.0, pluggy-1.4.0 -- /home/ev-br/.conda/envs/cupyx2/bin/python3.10
cachedir: .pytest_cache
rootdir: /home/ev-br/repos/cupy
configfile: setup.cfg
collected 80 items                                                                                                                                                                                        

tests/cupyx_tests/scipy_tests/interpolate_tests/test_rbfinterp.py::test_conditionally_positive_definite[cubic] SKIPPED (conditionally posdef: skip for now)                                         [  1%]
tests/cupyx_tests/scipy_tests/interpolate_tests/test_rbfinterp.py::test_conditionally_positive_definite[gaussian] SKIPPED (conditionally posdef: skip for now)                                      [  2%]
tests/cupyx_tests/scipy_tests/interpolate_tests/test_rbfinterp.py::test_conditionally_positive_definite[inverse_multiquadric] SKIPPED (conditionally posdef: skip for now)                          [  3%]
tests/cupyx_tests/scipy_tests/interpolate_tests/test_rbfinterp.py::test_conditionally_positive_definite[inverse_quadratic] SKIPPED (conditionally posdef: skip for now)                             [  5%]
tests/cupyx_tests/scipy_tests/interpolate_tests/test_rbfinterp.py::test_conditionally_positive_definite[linear] SKIPPED (conditionally posdef: skip for now)                                        [  6%]
tests/cupyx_tests/scipy_tests/interpolate_tests/test_rbfinterp.py::test_conditionally_positive_definite[multiquadric] SKIPPED (conditionally posdef: skip for now)                                  [  7%]
tests/cupyx_tests/scipy_tests/interpolate_tests/test_rbfinterp.py::test_conditionally_positive_definite[quintic] SKIPPED (conditionally posdef: skip for now)                                       [  8%]
tests/cupyx_tests/scipy_tests/interpolate_tests/test_rbfinterp.py::test_conditionally_positive_definite[thin_plate_spline] SKIPPED (conditionally posdef: skip for now)                             [ 10%]
tests/cupyx_tests/scipy_tests/interpolate_tests/test_rbfinterp.py::TestRBFInterpolatorNeighborsNone::test_scale_invariance_1d[cubic] FAILED                                                         [ 11%]

================================================================================================ FAILURES =================================================================================================
____________________________________________________________________ TestRBFInterpolatorNeighborsNone.test_scale_invariance_1d[cubic] _____________________________________________________________________

self = <cupyx_tests.scipy_tests.interpolate_tests.test_rbfinterp.TestRBFInterpolatorNeighborsNone object at 0x7fa5140afca0>, xp = <module 'cupy' from '/home/ev-br/repos/cupy/cupy/__init__.py'>
scp = <module 'cupyx.scipy' from '/home/ev-br/repos/cupy/cupyx/scipy/__init__.py'>, kernel = 'cubic'

    @testing.numpy_cupy_allclose(scipy_name='scp')
    @pytest.mark.parametrize('kernel', sorted(_SCALE_INVARIANT))
    def test_scale_invariance_1d(self, xp, scp, kernel):
        # Verify that the functions in _SCALE_INVARIANT are insensitive to the
        # shape parameter (when smoothing == 0) in 1d.
        seq = Halton(1, scramble=False, seed=_np.random.RandomState())
        x = xp.asarray(3*seq.random(50))
        y = _1d_test_function(x, xp)
        xitp = xp.asarray(3*seq.random(50))
>       yitp1 = self.build(scp, x, y, epsilon=1.0, kernel=kernel)(xitp)

tests/cupyx_tests/scipy_tests/interpolate_tests/test_rbfinterp.py:126: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/cupyx_tests/scipy_tests/interpolate_tests/test_rbfinterp.py:435: in build
    return scp.interpolate.RBFInterpolator(*args, **kwargs)
cupyx/scipy/interpolate/_rbfinterp.py:672: in __init__
    shift, scale, coeffs = _build_and_solve_system(
cupyx/scipy/interpolate/_rbfinterp.py:465: in _build_and_solve_system
    lhs, rhs, shift, scale = _build_system(
cupyx/scipy/interpolate/_rbfinterp.py:283: in _build_system
    mins = cp.min(y, axis=0)
cupy/_statistics/order.py:44: in amin
    return a.min(axis=axis, out=out, keepdims=keepdims)
cupy/_core/core.pyx:1019: in cupy._core.core._ndarray_base.min
    cpdef _ndarray_base min(self, axis=None, out=None, keepdims=False):
cupy/_core/core.pyx:1027: in cupy._core.core._ndarray_base.min
    return _statistics._ndarray_min(self, axis, out, None, keepdims)
cupy/_core/_routines_statistics.pyx:64: in cupy._core._routines_statistics._ndarray_min
    return _amin(self, axis=axis, out=out, dtype=dtype, keepdims=keepdims)
cupy/_core/_reduction.pyx:618: in cupy._core._reduction._SimpleReductionKernel.__call__
    return self._call(
cupy/_core/_reduction.pyx:370: in cupy._core._reduction._AbstractReductionKernel._call
    cub_success = _cub_reduction._try_to_call_cub_reduction(
cupy/_core/_cub_reduction.pyx:689: in cupy._core._cub_reduction._try_to_call_cub_reduction
    _launch_cub(
cupy/_core/_cub_reduction.pyx:540: in cupy._core._cub_reduction._launch_cub
    func = _SimpleCubReductionKernel_get_cached_function(
cupy/_util.pyx:64: in cupy._util.memoize.decorator.ret
    result = f(*args, **kwargs)
cupy/_core/_cub_reduction.pyx:240: in cupy._core._cub_reduction._SimpleCubReductionKernel_get_cached_function
    return _create_cub_reduction_function(
cupy/_core/_cub_reduction.pyx:223: in cupy._core._cub_reduction._create_cub_reduction_function
    module = compile_with_cache(
cupy/_core/core.pyx:2254: in cupy._core.core.compile_with_cache
    return cuda.compiler._compile_module_with_cache(
cupy/cuda/compiler.py:484: in _compile_module_with_cache
    return _compile_with_cache_cuda(
cupy/cuda/compiler.py:562: in _compile_with_cache_cuda
    ptx, mapping = compile_using_nvrtc(
cupy/cuda/compiler.py:319: in compile_using_nvrtc
    return _compile(source, options, cu_path,
cupy/cuda/compiler.py:290: in _compile
    options, headers, include_names = _jitify_prep(
cupy/cuda/compiler.py:233: in _jitify_prep
    jitify._init_module()
cupy/cuda/jitify.pyx:212: in cupy.cuda.jitify._init_module
    cpdef void _init_module() except*:
cupy/cuda/jitify.pyx:236: in cupy.cuda.jitify._init_module
    _init_cupy_headers()
cupy/cuda/jitify.pyx:209: in cupy.cuda.jitify._init_cupy_headers
    _init_cupy_headers_from_scratch()
cupy/cuda/jitify.pyx:192: in cupy.cuda.jitify._init_cupy_headers_from_scratch
    jitify(warmup_kernel, options)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   load_program(cuda_source, headers, nullptr, &include_paths,
E   AssertionError: Only cupy raises error
E   
E   Traceback (most recent call last):
E     File "/home/ev-br/repos/cupy/cupy/testing/_loops.py", line 45, in _call_func
E       result = impl(*args, **kw)
E     File "/home/ev-br/repos/cupy/tests/cupyx_tests/scipy_tests/interpolate_tests/test_rbfinterp.py", line 126, in test_scale_invariance_1d
E       yitp1 = self.build(scp, x, y, epsilon=1.0, kernel=kernel)(xitp)
E     File "/home/ev-br/repos/cupy/tests/cupyx_tests/scipy_tests/interpolate_tests/test_rbfinterp.py", line 435, in build
E       return scp.interpolate.RBFInterpolator(*args, **kwargs)
E     File "/home/ev-br/repos/cupy/cupyx/scipy/interpolate/_rbfinterp.py", line 672, in __init__
E       shift, scale, coeffs = _build_and_solve_system(
E     File "/home/ev-br/repos/cupy/cupyx/scipy/interpolate/_rbfinterp.py", line 465, in _build_and_solve_system
E       lhs, rhs, shift, scale = _build_system(
E     File "/home/ev-br/repos/cupy/cupyx/scipy/interpolate/_rbfinterp.py", line 283, in _build_system
E       mins = cp.min(y, axis=0)
E     File "/home/ev-br/repos/cupy/cupy/_statistics/order.py", line 44, in amin
E       return a.min(axis=axis, out=out, keepdims=keepdims)
E     File "cupy/_core/core.pyx", line 1019, in cupy._core.core._ndarray_base.min
E       cpdef _ndarray_base min(self, axis=None, out=None, keepdims=False):
E     File "cupy/_core/core.pyx", line 1027, in cupy._core.core._ndarray_base.min
E       return _statistics._ndarray_min(self, axis, out, None, keepdims)
E     File "cupy/_core/_routines_statistics.pyx", line 64, in cupy._core._routines_statistics._ndarray_min
E       return _amin(self, axis=axis, out=out, dtype=dtype, keepdims=keepdims)
E     File "cupy/_core/_reduction.pyx", line 618, in cupy._core._reduction._SimpleReductionKernel.__call__
E       return self._call(
E     File "cupy/_core/_reduction.pyx", line 370, in cupy._core._reduction._AbstractReductionKernel._call
E       cub_success = _cub_reduction._try_to_call_cub_reduction(
E     File "cupy/_core/_cub_reduction.pyx", line 689, in cupy._core._cub_reduction._try_to_call_cub_reduction
E       _launch_cub(
E     File "cupy/_core/_cub_reduction.pyx", line 540, in cupy._core._cub_reduction._launch_cub
E       func = _SimpleCubReductionKernel_get_cached_function(
E     File "cupy/_util.pyx", line 64, in cupy._util.memoize.decorator.ret
E       result = f(*args, **kwargs)
E     File "cupy/_core/_cub_reduction.pyx", line 240, in cupy._core._cub_reduction._SimpleCubReductionKernel_get_cached_function
E       return _create_cub_reduction_function(
E     File "cupy/_core/_cub_reduction.pyx", line 223, in cupy._core._cub_reduction._create_cub_reduction_function
E       module = compile_with_cache(
E     File "cupy/_core/core.pyx", line 2254, in cupy._core.core.compile_with_cache
E       return cuda.compiler._compile_module_with_cache(
E     File "/home/ev-br/repos/cupy/cupy/cuda/compiler.py", line 484, in _compile_module_with_cache
E       return _compile_with_cache_cuda(
E     File "/home/ev-br/repos/cupy/cupy/cuda/compiler.py", line 562, in _compile_with_cache_cuda
E       ptx, mapping = compile_using_nvrtc(
E     File "/home/ev-br/repos/cupy/cupy/cuda/compiler.py", line 319, in compile_using_nvrtc
E       return _compile(source, options, cu_path,
E     File "/home/ev-br/repos/cupy/cupy/cuda/compiler.py", line 290, in _compile
E       options, headers, include_names = _jitify_prep(
E     File "/home/ev-br/repos/cupy/cupy/cuda/compiler.py", line 233, in _jitify_prep
E       jitify._init_module()
E     File "cupy/cuda/jitify.pyx", line 212, in cupy.cuda.jitify._init_module
E       cpdef void _init_module() except*:
E     File "cupy/cuda/jitify.pyx", line 236, in cupy.cuda.jitify._init_module
E       _init_cupy_headers()
E     File "cupy/cuda/jitify.pyx", line 209, in cupy.cuda.jitify._init_cupy_headers
E       _init_cupy_headers_from_scratch()
E     File "cupy/cuda/jitify.pyx", line 192, in cupy.cuda.jitify._init_cupy_headers_from_scratch
E       jitify(warmup_kernel, options)
E     File "cupy/cuda/jitify.pyx", line 267, in cupy.cuda.jitify.jitify
E       load_program(cuda_source, headers, nullptr, &include_paths,
E   RuntimeError: Runtime compilation failed

cupy/cuda/jitify.pyx:267: AssertionError
------------------------------------------------------------------------------------------ Captured stdout call -------------------------------------------------------------------------------------------
---------------------------------------------------
--- JIT compile log for cupy_jitify_exercise ---
---------------------------------------------------
cub/util_cpp_dialect.cuh(143): warning #161-D: unrecognized #pragma
       CUB_COMPILER_DEPRECATION_SOFT(C++14, C++11);
       ^

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

cooperative_groups/details/helpers.h(448): error: identifier "cudaCGGetIntrinsicHandle" is undefined
              return (cudaCGGetIntrinsicHandle(cudaCGScopeMultiGrid));
                      ^

cooperative_groups/details/helpers.h(453): error: identifier "cudaCGSynchronize" is undefined
              cudaError_t err = cudaCGSynchronize(handle, 0);
                                ^

cooperative_groups/details/helpers.h(459): error: identifier "cudaCGGetSize" is undefined
              cudaCGGetSize(&numThreads, NULL, handle);
              ^

cooperative_groups/details/helpers.h(466): error: identifier "cudaCGGetRank" is undefined
              cudaCGGetRank(&threadRank, NULL, handle);
              ^

cooperative_groups/details/helpers.h(473): error: identifier "cudaCGGetRank" is undefined
              cudaCGGetRank(NULL, &gridRank, handle);
              ^

cooperative_groups/details/helpers.h(480): error: identifier "cudaCGGetSize" is undefined
              cudaCGGetSize(NULL, &numGrids, handle);
              ^

6 errors detected in the compilation of "cupy_jitify_exercise".

---------------------------------------------------
========================================================================================= short test summary info =========================================================================================
FAILED tests/cupyx_tests/scipy_tests/interpolate_tests/test_rbfinterp.py::TestRBFInterpolatorNeighborsNone::test_scale_invariance_1d[cubic] - AssertionError: Only cupy raises error
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
====================================================================================== 1 failed, 8 skipped in 37.38s ======================================================================================

On a different machine which does not have cursparseLt at all, this same pytest invocation works fine on main. So this patch is great and needs a further small tweak somewhere?

Hat tip @andfoy for effectively guiding me down this exercise.

@leofang
Copy link
Member

leofang commented Feb 8, 2024

@ev-br could you create an issue for the error that you see (and ping me)? It's not about cuSPARSELt but something else.

@ev-br
Copy link
Contributor

ev-br commented Feb 8, 2024

@leofang done: #8171

@ev-br
Copy link
Contributor

ev-br commented Feb 14, 2024

Either an update to this PR or cleaning up the cupy cache locally made the problem reported in #8074 (comment) disappear. So I guess I can confirm that this PR, when rebased on main, works for

$ python -c'import cupy; cupy.show_config()'
OS                           : Linux-5.15.0-76-generic-x86_64-with-glibc2.35
Python Version               : 3.10.0
CuPy Version                 : 13.0.0rc1
CuPy Platform                : NVIDIA CUDA
NumPy Version                : 1.26.0
SciPy Version                : 1.11.2
Cython Build Version         : 0.29.37
Cython Runtime Version       : None
CUDA Root                    : /usr/local/cuda
nvcc PATH                    : /usr/local/cuda/bin/nvcc
CUDA Build Version           : 12010
CUDA Driver Version          : 12010
CUDA Runtime Version         : 12010 (linked to CuPy) / 12000 (locally installed)
cuBLAS Version               : (available)
cuFFT Version                : 11002
cuRAND Version               : 10302
cuSOLVER Version             : (11, 4, 4)
cuSPARSE Version             : (available)
NVRTC Version                : (12, 1)
Thrust Version               : 200200
CUB Build Version            : 200200
Jitify Build Version         : e4bfacd69
cuDNN Build Version          : None
cuDNN Version                : None
NCCL Build Version           : None
NCCL Runtime Version         : None
cuTENSOR Version             : None
cuSPARSELt Build Version     : 502
Device 0 Name                : NVIDIA GeForce RTX 2060
Device 0 Compute Capability  : 75
Device 0 PCI Bus ID          : 0000:01:00.0
Device 1 Name                : NVIDIA GeForce RTX 2060
Device 1 Compute Capability  : 75
Device 1 PCI Bus ID          : 0000:21:00.0

@gdaisukesuzuki
Copy link
Contributor Author

@ev-br
CusparseLt is designed to support the Ampere architecture (or later), which means it might not operate on the RTX2060....

@gdaisukesuzuki
Copy link
Contributor Author

@kmaehashi
This May, cusparselt 0.6.1 has been released. I confirmed that it can work without modifying this PR on 0.6.1. Please note that 0.6.1 only improves the features of 0.6.0 and there seems no change on the API.

@gdaisukesuzuki gdaisukesuzuki changed the title patch for cusparseLt 0.6.0 patch for cusparseLt 0.6.1 May 5, 2024
@kmaehashi
Copy link
Member

Hi sorry to keep you waiting! I'll try to get this in this week.

@kmaehashi
Copy link
Member

/test mini

@kmaehashi kmaehashi enabled auto-merge June 5, 2024 09:35
kmaehashi
kmaehashi previously approved these changes Jun 5, 2024
@kmaehashi
Copy link
Member

Let me push the required CI changes (#8351) to the branch 🙏

@kmaehashi
Copy link
Member

/test mini

@gdaisukesuzuki
Copy link
Contributor Author

@kmaehashi Thank you so much for writing the Docker setup.

@kmaehashi kmaehashi merged commit a54b7ab into cupy:main Jun 6, 2024
54 checks passed
@gdaisukesuzuki gdaisukesuzuki deleted the devcusparselt5 branch June 6, 2024 01:01
@jake-arkinstall
Copy link

As this is now in, is there a timeline on the next release?

(I'm looking to fetch it via pypi in keeping with an existing build)

@kmaehashi
Copy link
Member

@jake-arkinstall I'm curious about the usecase. Do you build/distribute CuPy built with cuSPARSELt support? Is it fine if there is a new v14 alpha release on PyPI? This is a backward-incompatible change (dropping support for earlier cusparseLt versions), and usually, we don't backport such PRs to v13.

@jake-arkinstall
Copy link

@jake-arkinstall I'm curious about the usecase. Do you build/distribute CuPy built with cuSPARSELt support? Is it fine if there is a new v14 alpha release on PyPI? This is a backward-incompatible change (dropping support for earlier cusparseLt versions), and usually, we don't backport such PRs to v13.

Long story!

I am building a package with Nix, and at the moment nixpkgs' cuda packages have a slightly frustrating support matrix when combining certain packages (which I'm in the fun position of wanting to combine). I have a local clone where I'm getting everything up to date so it all works together, with the intention of filing a PR to nixpkgs with shiny new things.

The existing version uses the pypi package as the source. I have a version that fetches cupy from git at the commit this was merged at, replaces the third_party submodules with nix derivations corresponding to their git repos, and adds the required dependencies for building. It works for me, but I know that it'd be a hard sell in a PR.

If there is a plan for a release with these changes, then I'll keep my local version for now, await the release, then file a PR to nixpkgs with the release details. If not, then I can decide to try the PR anyway or just keep the derivation for my project alone and decouple it.

All this is very much a "me" problem and I certainly wouldn't want to cause a fuss over it. I'm just trying to gauge rough timelines to see how I should synchronise.

@kmaehashi kmaehashi added this to the v14.0.0a1 milestone Jun 8, 2024
@kmaehashi
Copy link
Member

Thanks for sharing the background! As for the CuPy v14.0.0a1 release, which will be made from the main branch (where this PR has merged), I'm expecting it to be in late July.

One thing to note: So far, only cuSPARSELt usage in CuPy is a undocumented private thin wrapper to the C API as illustrated in the example. Building with or without cuSPARSELt support does not affect the CuPy's core functionalities/performance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cat:enhancement Improvements to existing features prio:high
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants