Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HAL mul8x8to16 added #25506

Merged
merged 13 commits into from May 20, 2024
Merged

HAL mul8x8to16 added #25506

merged 13 commits into from May 20, 2024

Conversation

savuor
Copy link
Contributor

@savuor savuor commented Apr 29, 2024

fixes #25034

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@asmorkalov asmorkalov added this to the 4.10.0 milestone Apr 29, 2024
@savuor savuor marked this pull request as ready for review April 29, 2024 21:44
int res;
res = cv_hal_mul8u16u(/* src1_data */ &ua, /* src1_step */ 1, /* src2_data */ &ub, /* src2_step */ 1,
/* dst_data */ &uc, /* dst_step */ 1, /* width */ 1, /* height */ 1, /* scale */ 1);
if (res == CV_HAL_ERROR_NOT_IMPLEMENTED)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The HAL function may return not implemented status, if size is to small or data is not aligned. Static pre-check does not work.

void* usrdata)
{
double scale = *((double*)usrdata);
cv_hal_mul8s16s((schar*)src1, step1, (schar*)src2, step2, (short*)dst, step, width, height, scale);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cv_hal_mul8s16s and cv_hal_mul8u16u return value is not checked. They may return error or not implemented status.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have an ugly solution for this, please take a look

{
CV_INSTRUMENT_REGION();

arithm_op(src1, src2, dst, noArray(), dtype, getMulTab(),
true, &scale, std::abs(scale - 1.0) < DBL_EPSILON ? OCL_OP_MUL : OCL_OP_MUL_SCALE);
static bool halMul8to16available = (cv_hal_mul8u16u != hal_ni_mul8u16u) && (cv_hal_mul8s16s != hal_ni_mul8s16s);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if HAL implements only one of them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@savuor savuor mentioned this pull request May 12, 2024
6 tasks
@vpisarev
Copy link
Contributor

vpisarev commented May 14, 2024

Can we replace getMulTab(bool extended) function with getMulFunc(int src1depth, int src2depth, int dstdepth)?
We can pass such a function into arithm_op. arithm_op will first try to get direct function for the original src1, src2 and dst depths and if it fails, it will make another try with normalized types (in the latter case the explicit conversion 8u->32f is done for each of the inputs in the case of 8u x 8u -> 32f op).

@@ -617,7 +622,11 @@ static void arithm_op(InputArray _src1, InputArray _src2, OutputArray _dst,

Mat src1 = psrc1->getMat(), src2 = psrc2->getMat(), dst = _dst.getMat();
Size sz = getContinuousSize2D(src1, src2, dst, src1.channels());
tab[depth1](src1.ptr(), src1.step, src2.ptr(), src2.step, dst.ptr(), dst.step, sz.width, sz.height, usrdata);
if (extendedFunc(src1.ptr(), src1.step, src2.ptr(), src2.step,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (!extendedFunc || extendedFunc() != 0)

{
double scale = *((double*)usrdata);
int res = cv_hal_mul8u16u(src1, step1, src2, step2, (ushort *)dst, step, width, height, scale);
if (res == 0 || res == CV_HAL_ERROR_NOT_IMPLEMENTED)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use constant instead of 0.

Copy link
Contributor

@asmorkalov asmorkalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@asmorkalov asmorkalov merged commit 69af621 into opencv:4.x May 20, 2024
30 checks passed
@savuor savuor deleted the rv/hal_mul16 branch May 20, 2024 10:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Extend OpenCV HAL multiply with signed and unsigned cases
3 participants