New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
core (opencl): CLBlast integration via dyanmic loading #25568
base: 4.x
Are you sure you want to change the base?
Conversation
Observed problems:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dynamic loading makes sense if there is strong API versioning in used project.
if(WITH_CLBLAST) | ||
find_path(CLBLAST_INCLUDE_DIR | ||
NAMES clblast_c.h | ||
HINTS ENV CLBLAST_INSTALL_DIR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to always use "dynamic loading" then it makes sense to place this header into 3rdparty/include
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it may work but CLBlast version is bumped occassionally along with some tuned results for different devices. If we put a fixed header in 3rdparty/include
, I guess it won't block the new library unless this file has significant changes right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is just no reason to use DIFFERENT versions of
- header clblast_c.h
- autogenerated files with function from another version of clblast_c.h
(const cl_mem)B.handle(ACCESS_READ), offsetB, ldb, | ||
(float)beta, | ||
(cl_mem)D.handle(ACCESS_RW), offsetC, ldc, | ||
&queue, NULL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is about async processing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean async processing several calls to Sgemm for example?
@@ -415,6 +415,9 @@ OCV_OPTION(WITH_OPENCLAMDFFT "Include AMD OpenCL FFT library support" ON | |||
OCV_OPTION(WITH_OPENCLAMDBLAS "Include AMD OpenCL BLAS library support" ON | |||
VISIBLE_IF NOT ANDROID AND NOT IOS AND NOT XROS AND NOT WINRT | |||
VERIFY HAVE_CLAMDBLAS) | |||
OCV_OPTION(WITH_CLBLAST "Include CLBlast library support" ON | |||
VISIBLE_IF TRUE | |||
VERIFY HAVE_CLBLAST) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've checked build with clblast and verification, it works fine:
cmake \
-DCMAKE_INSTALL_PREFIX=install \
-DWITH_QT=ON \
-DWITH_1394=OFF \
-DWITH_JASPER=OFF \
-DWITH_OPENCLAMDFFT=OFF \
-DWITH_OPENCLAMDBLAS=OFF \
-DWITH_LAPACK=OFF \
-DWITH_CLBLAST=ON \
-DENABLE_CONFIG_VERIFICATION=ON \
../opencv
...
-- OpenCL: YES (CLBlast INTELVA)
-- Include path: /work/opencv/3rdparty/include/opencl/1.2 /usr/include
-- Link libraries: Dynamic load
...
-- Verifying WITH_CLBLAST=ON => 'HAVE_CLBLAST'=TRUE
...
BTW, I observe several identical warnings in matmul.dispatch.cpp (GCC 11, Ubuntu 22):
/opencv/modules/core/src/matmul.dispatch.cpp:147:31: warning: type qualifiers ignored on cast result type [-Wignored-qualifiers]
147 | (const cl_mem)A.handle(ACCESS_READ), offsetA, lda,
...
/opencv/modules/core/src/matmul.dispatch.cpp:184:31: warning: type qualifiers ignored on cast result type [-Wignored-qualifiers]
184 | (const cl_mem)B.handle(ACCESS_READ), offsetB, ldb,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, I observe several identical warnings in matmul.dispatch.cpp (GCC 11, Ubuntu 22):
/opencv/modules/core/src/matmul.dispatch.cpp:147:31: warning: type qualifiers ignored on cast result type [-Wignored-qualifiers] 147 | (const cl_mem)A.handle(ACCESS_READ), offsetA, lda, ... /opencv/modules/core/src/matmul.dispatch.cpp:184:31: warning: type qualifiers ignored on cast result type [-Wignored-qualifiers] 184 | (const cl_mem)B.handle(ACCESS_READ), offsetB, ldb, | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Warnings are resolved by removing const
.
@fengyuentau, from the patch I can conclude that we need only a small portion of clblast. Can we extract a subset of clblast and put it to opencv/3rdparty and link it to OpenCV? (i.e. don't use dynamic loading, which is much less convenient for end users). Also, I believe, we need to solve problems with mac and intel somehow. I remember you said (and also see it from the performance charts) that the current Intel version of gemm in OpenCV is faster than clblast, maybe we should keep Intel version. |
@fengyuentau Thanks a lot for the effort! The PR was discussed on OpenCV Core team meeting and conclusion is the following:
|
I have done several testings on the clblast accuracy problem. It turns out clblast with tuning results on these platform gives incorrect results, and after reverting those tuning results it can give the correct results. See my repo for testing: https://github.com/fengyuentau/test-clblast. |
Second commit is all about auto-generated code.
Usage
Get CLBlast:
Test with this patch:
Performance
All perf results in a zip: perf.mali-g52+m1+uhd770+gtx1080ti.zip.
Usage example:
Khadas VIM4 (8GB mem, 32GB disk space) with Mali G52 r1p0
Macbook Air M1 (16GB mem, 512GB disk space)
Accuracy problem with scale >= 1280, but it is ok with scal = 1024.
PC with i7-12700K (64GB mem, 1T disk space) with Intel(R) UHD Graphics 770
Accuracy problem with complex (type CV_32FC2).
PC with GTX 1080 Ti (12GB gpu mem, CUDA 12.3)
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.