Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try to Fix #3725: cudaarithm: fix the compile faiure of CUDA 12. #3726

Merged
merged 1 commit into from May 22, 2024

Conversation

sdy623
Copy link
Contributor

@sdy623 sdy623 commented Apr 22, 2024

A slight API change of nppiMeanStdDevGetBufferHostSize_8u_C1R and nppiMeanStdDevGetBufferHostSize_32f_C1R in NPP of CUDA 12 has caused the #3725. I will try to fix this. I found that the type of bufSize is size_t instead of int in the reductions.cpp in the NPP header file, where the nppi_statistics_functions.h changed the type of second parameter from * int to * size_t.

nppi_statistics_functions.h 5392:5408

NppStatus 
nppiMeanStdDevGetBufferHostSize_32f_C1R_Ctx(NppiSize oSizeROI, size_t * hpBufferSize/* host pointer */, NppStreamContext nppStreamCtx);
/** 
 * Buffer size for \ref nppiMean_StdDev_32f_C1R.
 * 
 * For common parameter descriptions, see \ref CommonMeanStdDevGetBufferHostSizeParameters.
 *
 */
NppStatus 
nppiMeanStdDevGetBufferHostSize_32f_C1R(NppiSize oSizeROI, size_t * hpBufferSize/* host pointer */);

/** 
 * Buffer size for \ref nppiMean_StdDev_8u_C1MR.
 * 
 * For common parameter descriptions, see \ref CommonMeanStdDevGetBufferHostSizeParameters.
 *
 */

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@opencv-alalek
Copy link

/cc @cudawarped

Copy link
Contributor Author

@sdy623 sdy623 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am pretty sure it was introduced in CUDA 12.4 (12040).

Is it worth including an assert before it is used because BufferPool.getBuffer() expects an int? e.g.

CV_Assert(bufSize <= std::numeric_limits<int>::max());

What about the next call to get buffer size?

Have you tested this on 12.4? I think that it will still fail because of the other bug so I am not sure if it will pass any CI tests built against CUDA 12.4.

I checked the denfition of CUDA_VERSION in CUDA 12 which is the 12.XX.XX instad of a int number, so I edited the CmakeLists.txt to add a new denfination of CUDA_12_OR_HIGHER and fixed the compile of reductions.cpp, however It went other errors still need to slove

C:\opencv-bld\opencv_contrib\modules\cudev\include\opencv2\cudev\grid\detail/reduce.hpp(379): error: no instance of overloaded function "cv::cudev::blockReduce" matches the argument list
            argument types are: (cuda::std::__4::tuple<volatile int *, volatile int *>, cuda::std::__4::tuple<int &, int &>, int, cuda::std::__4::tuple<cv::cudev::minimum<int>, cv::cudev::maximum<int>>)
              blockReduce<BLOCK_SIZE>(smem_tuple(sminval, smaxval), tie(mymin, mymax), tid, make_tuple(minOp, maxOp))

@cudawarped
Copy link
Contributor

cudawarped commented Apr 23, 2024

I checked the denfition of CUDA_VERSION in CUDA 12 which is the 12.XX.XX instad of a int number

@sdy623

I think you are confusing CMake generation and compilation. Adding that definition which is still for the incorrect verison of CUDA into the CMake file is unecessary.

Version Info

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda.h

> /**
>  * CUDA API version number
>  */
> #define CUDA_VERSION 12040


Function definitions

  1. C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\nppi_statistics_functions.h
NppStatus 
nppiMeanStdDevGetBufferHostSize_8u_C1R_Ctx(NppiSize oSizeROI, int * hpBufferSize/* host pointer */, NppStreamContext nppStreamCtx);
  1. C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\nppi_statistics_functions.h
NppStatus 
nppiMeanStdDevGetBufferHostSize_8u_C1R_Ctx(NppiSize oSizeROI, size_t * hpBufferSize/* host pointer */, NppStreamContext nppStreamCtx);

@sdy623 sdy623 force-pushed the 4.x branch 2 times, most recently from 39073a2 to 34bd538 Compare April 23, 2024 05:27
@sdy623
Copy link
Contributor Author

sdy623 commented Apr 23, 2024

<int &, int &>, int, cuda::std::__4::tuple<cv::cudev::minimum, cv::cudev::maximum>)
blockReduce<BLOCK_SIZE>(smem_tuple(sminval, smaxval),

I checked the denfition of CUDA_VERSION in CUDA 12 which is the 12.XX.XX instad of a int number

@sdy623

I think you are confusing CMake generation and compilation. Adding that definition which is still for the incorrect verison of CUDA into the CMake file is unecessary.

Version Info

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda.h

> /**
>  * CUDA API version number
>  */
> #define CUDA_VERSION 12040

Function definitions

  1. C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\nppi_statistics_functions.h
NppStatus 
nppiMeanStdDevGetBufferHostSize_8u_C1R_Ctx(NppiSize oSizeROI, int * hpBufferSize/* host pointer */, NppStreamContext nppStreamCtx);
  1. C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\nppi_statistics_functions.h
NppStatus 
nppiMeanStdDevGetBufferHostSize_8u_C1R_Ctx(NppiSize oSizeROI, size_t * hpBufferSize/* host pointer */, NppStreamContext nppStreamCtx);

I deleted the edit of the CMake and find the correct CUDA_VERSION is 12040 in CUDA 12.4. I found a way to check the cuda.h without install it, just use 7-Zip to open the cuda_12.4.1_551.78_windows.exe and find this directory cuda_12.4.1_551.78_windows.exe\cuda_cudart\cudart\include\cuda.h and found the macro denfination

/**
 * CUDA API version number
 */
#define CUDA_VERSION 12040

@cudawarped
Copy link
Contributor

cudawarped commented Apr 23, 2024

I found a way to check the cuda.h without install it, just use 7-Zip to open the cuda_12.4.1_551.78_windows.exe and find this directory cuda_12.4.1_551.78_windows.exe\cuda_cudart\cudart\include\cuda.h and found the macro denfination

Whilst this is a completely valid way to check the header I would advise you to install CUDA 12.4 when submitting a PR which fixes something that it breaks.

If you do that you will realize that

  1. Your conversions from size_t to int are throwing warnings

    warning C4267: 'argument': conversion from 'size_t' to 'int', possible loss of data

    which you should address when using the new buffer size type as I mentioned above. e.g.

    CV_Assert(bufSize <= std::numeric_limits<int>::max());
    GpuMat buf = pool.getBuffer(1, static_cast<int>(bufSize), gsrc.type());
    
  2. Because of the existing bug I mentioned before CUDA Toolkit 12.4.0 tuple incompatibility #3690 you will be unable to build the cudaarithm module even with this fix. i.e.

    cmake --build . --target opencv_cudaarithm

    still fails.

If I was authoring this PR I would install both CUDA 12.3 and 12.4 and check that this builds on both without errors.

A slight API change of NPP nppiMeanStdDevGetBufferHostSize_8u_C1R
The type of bufSize is size_t instead of int in CUDA 12.4.x
@cudawarped
Copy link
Contributor

@opencv-alalek this resolves the issue mentioned but still results in build errors because cudaarithm uses cudev ( #3690). I suggest this is revisited when cudev can be built against CUDA 12.4.

@jiapei100
Copy link

I still have an issue #3728

@cudawarped
Copy link
Contributor

I still have an issue #3728

@jiapei100 Your issue is related to cudev (#3690) not cudaarithm.

@johnnynunez
Copy link

great job! thank you

Copy link
Contributor

@asmorkalov asmorkalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The patch looks reasonable. I'm looking on #3690 if we can do something on OpenCV side.

@asmorkalov asmorkalov merged commit 9358ad2 into opencv:4.x May 22, 2024
9 of 10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants