Try to Fix #3725: cudaarithm: fix the compile faiure of CUDA 12. #3726

sdy623 · 2024-04-22T15:46:07Z

A slight API change of nppiMeanStdDevGetBufferHostSize_8u_C1R and nppiMeanStdDevGetBufferHostSize_32f_C1R in NPP of CUDA 12 has caused the #3725. I will try to fix this. I found that the type of bufSize is size_t instead of int in the reductions.cpp in the NPP header file, where the nppi_statistics_functions.h changed the type of second parameter from * int to * size_t.

nppi_statistics_functions.h 5392:5408

NppStatus 
nppiMeanStdDevGetBufferHostSize_32f_C1R_Ctx(NppiSize oSizeROI, size_t * hpBufferSize/* host pointer */, NppStreamContext nppStreamCtx);
/** 
 * Buffer size for \ref nppiMean_StdDev_32f_C1R.
 * 
 * For common parameter descriptions, see \ref CommonMeanStdDevGetBufferHostSizeParameters.
 *
 */
NppStatus 
nppiMeanStdDevGetBufferHostSize_32f_C1R(NppiSize oSizeROI, size_t * hpBufferSize/* host pointer */);

/** 
 * Buffer size for \ref nppiMean_StdDev_8u_C1MR.
 * 
 * For common parameter descriptions, see \ref CommonMeanStdDevGetBufferHostSizeParameters.
 *
 */

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

I agree to contribute to the project under Apache 2 License.
To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
The PR is proposed to the proper branch
There is a reference to the original bug report and related work
There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
The feature is well documented and sample code can be built with the project CMake

opencv-alalek · 2024-04-22T16:28:54Z

/cc @cudawarped

modules/cudaarithm/src/reductions.cpp

sdy623

I am pretty sure it was introduced in CUDA 12.4 (12040).

Is it worth including an assert before it is used because BufferPool.getBuffer() expects an int? e.g.

CV_Assert(bufSize <= std::numeric_limits<int>::max());

What about the next call to get buffer size?

Have you tested this on 12.4? I think that it will still fail because of the other bug so I am not sure if it will pass any CI tests built against CUDA 12.4.

I checked the denfition of CUDA_VERSION in CUDA 12 which is the 12.XX.XX instad of a int number, so I edited the CmakeLists.txt to add a new denfination of CUDA_12_OR_HIGHER and fixed the compile of reductions.cpp, however It went other errors still need to slove

C:\opencv-bld\opencv_contrib\modules\cudev\include\opencv2\cudev\grid\detail/reduce.hpp(379): error: no instance of overloaded function "cv::cudev::blockReduce" matches the argument list
            argument types are: (cuda::std::__4::tuple<volatile int *, volatile int *>, cuda::std::__4::tuple<int &, int &>, int, cuda::std::__4::tuple<cv::cudev::minimum<int>, cv::cudev::maximum<int>>)
              blockReduce<BLOCK_SIZE>(smem_tuple(sminval, smaxval), tie(mymin, mymax), tid, make_tuple(minOp, maxOp))

cudawarped · 2024-04-23T05:04:12Z

I checked the denfition of CUDA_VERSION in CUDA 12 which is the 12.XX.XX instad of a int number

@sdy623

I think you are confusing CMake generation and compilation. Adding that definition which is still for the incorrect verison of CUDA into the CMake file is unecessary.

Version Info

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda.h

> /**
>  * CUDA API version number
>  */
> #define CUDA_VERSION 12040

Function definitions

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\nppi_statistics_functions.h

NppStatus 
nppiMeanStdDevGetBufferHostSize_8u_C1R_Ctx(NppiSize oSizeROI, int * hpBufferSize/* host pointer */, NppStreamContext nppStreamCtx);

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\nppi_statistics_functions.h

NppStatus 
nppiMeanStdDevGetBufferHostSize_8u_C1R_Ctx(NppiSize oSizeROI, size_t * hpBufferSize/* host pointer */, NppStreamContext nppStreamCtx);

sdy623 · 2024-04-23T05:35:04Z

<int &, int &>, int, cuda::std::__4::tuple<cv::cudev::minimum, cv::cudev::maximum>)
blockReduce<BLOCK_SIZE>(smem_tuple(sminval, smaxval),

I checked the denfition of CUDA_VERSION in CUDA 12 which is the 12.XX.XX instad of a int number

@sdy623

I think you are confusing CMake generation and compilation. Adding that definition which is still for the incorrect verison of CUDA into the CMake file is unecessary.

Version Info

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda.h
> /**
>  * CUDA API version number
>  */
> #define CUDA_VERSION 12040
Function definitions

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\nppi_statistics_functions.h
NppStatus 
nppiMeanStdDevGetBufferHostSize_8u_C1R_Ctx(NppiSize oSizeROI, int * hpBufferSize/* host pointer */, NppStreamContext nppStreamCtx);
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\nppi_statistics_functions.h
NppStatus 
nppiMeanStdDevGetBufferHostSize_8u_C1R_Ctx(NppiSize oSizeROI, size_t * hpBufferSize/* host pointer */, NppStreamContext nppStreamCtx);

I deleted the edit of the CMake and find the correct CUDA_VERSION is 12040 in CUDA 12.4. I found a way to check the cuda.h without install it, just use 7-Zip to open the cuda_12.4.1_551.78_windows.exe and find this directory cuda_12.4.1_551.78_windows.exe\cuda_cudart\cudart\include\cuda.h and found the macro denfination

/**
 * CUDA API version number
 */
#define CUDA_VERSION 12040

cudawarped · 2024-04-23T07:17:25Z

I found a way to check the cuda.h without install it, just use 7-Zip to open the cuda_12.4.1_551.78_windows.exe and find this directory cuda_12.4.1_551.78_windows.exe\cuda_cudart\cudart\include\cuda.h and found the macro denfination

Whilst this is a completely valid way to check the header I would advise you to install CUDA 12.4 when submitting a PR which fixes something that it breaks.

If you do that you will realize that

Your conversions from size_t to int are throwing warnings

warning C4267: 'argument': conversion from 'size_t' to 'int', possible loss of data

which you should address when using the new buffer size type as I mentioned above. e.g.
```
CV_Assert(bufSize <= std::numeric_limits<int>::max());
GpuMat buf = pool.getBuffer(1, static_cast<int>(bufSize), gsrc.type());
```
Because of the existing bug I mentioned before CUDA Toolkit 12.4.0 tuple incompatibility #3690 you will be unable to build the cudaarithm module even with this fix. i.e.

cmake --build . --target opencv_cudaarithm

still fails.

If I was authoring this PR I would install both CUDA 12.3 and 12.4 and check that this builds on both without errors.

modules/cudaarithm/src/reductions.cpp

A slight API change of NPP nppiMeanStdDevGetBufferHostSize_8u_C1R The type of bufSize is size_t instead of int in CUDA 12.4.x

cudawarped · 2024-04-23T18:21:59Z

@opencv-alalek this resolves the issue mentioned but still results in build errors because cudaarithm uses cudev ( #3690). I suggest this is revisited when cudev can be built against CUDA 12.4.

jiapei100 · 2024-04-28T22:29:16Z

I still have an issue #3728

cudawarped · 2024-04-29T04:33:33Z

I still have an issue #3728

@jiapei100 Your issue is related to cudev (#3690) not cudaarithm.

johnnynunez · 2024-05-09T22:21:06Z

great job! thank you

asmorkalov

The patch looks reasonable. I'm looking on #3690 if we can do something on OpenCV side.

opencv-alalek added the category: cuda label Apr 22, 2024

cudawarped reviewed Apr 22, 2024

View reviewed changes

modules/cudaarithm/src/reductions.cpp Outdated Show resolved Hide resolved

sdy623 force-pushed the 4.x branch from 391ea10 to 69e3537 Compare April 23, 2024 02:46

sdy623 commented Apr 23, 2024

View reviewed changes

sdy623 force-pushed the 4.x branch 2 times, most recently from 39073a2 to 34bd538 Compare April 23, 2024 05:27

sdy623 force-pushed the 4.x branch from 34bd538 to 8478d82 Compare April 23, 2024 07:26

cudawarped reviewed Apr 23, 2024

View reviewed changes

modules/cudaarithm/src/reductions.cpp Outdated Show resolved Hide resolved

cudaarithm: fix the compile faiure of CUDA 12.4.x .

4e766a0

A slight API change of NPP nppiMeanStdDevGetBufferHostSize_8u_C1R The type of bufSize is size_t instead of int in CUDA 12.4.x

sdy623 force-pushed the 4.x branch from 8478d82 to 4e766a0 Compare April 23, 2024 12:46

asmorkalov approved these changes May 22, 2024

View reviewed changes

asmorkalov merged commit 9358ad2 into opencv:4.x May 22, 2024
9 of 10 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Try to Fix #3725: cudaarithm: fix the compile faiure of CUDA 12. #3726

Try to Fix #3725: cudaarithm: fix the compile faiure of CUDA 12. #3726

sdy623 commented Apr 22, 2024 •

edited

opencv-alalek commented Apr 22, 2024

sdy623 left a comment

cudawarped commented Apr 23, 2024 •

edited

sdy623 commented Apr 23, 2024

cudawarped commented Apr 23, 2024 •

edited

cudawarped commented Apr 23, 2024

jiapei100 commented Apr 28, 2024

cudawarped commented Apr 29, 2024

johnnynunez commented May 9, 2024

asmorkalov left a comment

Try to Fix #3725: cudaarithm: fix the compile faiure of CUDA 12. #3726

Try to Fix #3725: cudaarithm: fix the compile faiure of CUDA 12. #3726

Conversation

sdy623 commented Apr 22, 2024 • edited

Pull Request Readiness Checklist

opencv-alalek commented Apr 22, 2024

sdy623 left a comment

Choose a reason for hiding this comment

cudawarped commented Apr 23, 2024 • edited

sdy623 commented Apr 23, 2024

cudawarped commented Apr 23, 2024 • edited

cudawarped commented Apr 23, 2024

jiapei100 commented Apr 28, 2024

cudawarped commented Apr 29, 2024

johnnynunez commented May 9, 2024

asmorkalov left a comment

Choose a reason for hiding this comment

sdy623 commented Apr 22, 2024 •

edited

cudawarped commented Apr 23, 2024 •

edited

cudawarped commented Apr 23, 2024 •

edited