[CPU] Support weight-compression dt s8 #24457

Merged
merged 7 commits into from May 22, 2024

@usstq usstq commented May 10, 2024

Details:

  • An FC layer with symmetrically quantized/compressed weights may use i8 (instead of u8) as the weight data type, which saves the cost of the zero-point subtraction. This change adds support for that weight data type.
  • oneDNN fork PR: Support weight-compressed data type s8 (oneDNN#249)
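
To illustrate why a signed weight type saves the zero-point subtraction, here is a minimal sketch of per-element dequantization for both cases. The helper names `dequant_u8` and `dequant_s8` are hypothetical and are not part of OpenVINO or oneDNN; the real kernels work on whole tensors with per-channel or per-group scales.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; not OpenVINO or oneDNN API.

// Asymmetric u8 weight: the zero point must be subtracted before scaling.
inline float dequant_u8(std::uint8_t q, float scale, std::uint8_t zero_point) {
    assert(scale > 0.0f);
    return scale * static_cast<float>(static_cast<int>(q) - static_cast<int>(zero_point));
}

// Symmetric s8 weight: the zero point is implicitly 0, so only the scale is applied.
inline float dequant_s8(std::int8_t q, float scale) {
    assert(scale > 0.0f);
    return scale * static_cast<float>(q);
}
```

With the same scale, `dequant_s8(q, s)` and `dequant_u8(q + 128, s, 128)` yield the same value, but the signed variant does one subtraction less per weight.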

Tickets:

@github-actions github-actions bot added the category: CPU OpenVINO CPU plugin label May 10, 2024
@usstq usstq marked this pull request as ready for review May 10, 2024 01:18
@usstq usstq requested review from a team as code owners May 10, 2024 01:18
@usstq usstq requested a review from luweizhou2016 May 10, 2024 01:31
@yuxu42 yuxu42 added this to the 2024.2 milestone May 11, 2024

@luweizhou2016 luweizhou2016 left a comment

So it seems VNNI2 doesn't support compressed weights or dynamic quantization?

@luweizhou2016

@usstq, please attach the link to the oneDNN-side PR.

@usstq

usstq commented May 14, 2024

@usstq, please attach the link to the oneDNN-side PR.

It's openvinotoolkit/oneDNN#249

@luweizhou2016
Copy link
Contributor

The code looks okay to me, but @dmitry-gorokhov needs to take a look. Please:

  1. Apply the changes requested in the comments.
  2. Change the oneDNN commit message as described at https://wiki.ith.intel.com/pages/viewpage.action?pageId=2836774726. Refer to the previous message.

// https://oneapi-src.github.io/oneDNN/dev_guide_int8_computations.html?highlight=128#inputs-of-the-same-type-s8
auto src_wdt = srcWeightDesc->getPrecision();
auto dst_wdt = dstWeightDesc->getPrecision();
if (needShiftSignedToUnsigned && src_wdt.is_integral_number() && src_wdt.is_signed() &&

@usstq usstq May 15, 2024

@dmitry-gorokhov FYI, this is the tricky logic added to fall back to u8/u4 weights with a zero point of 128/8.
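
A minimal sketch of the arithmetic behind this fallback, assuming it amounts to adding 128 (or 8 for 4-bit weights) to every signed weight and using that same value as the zero point. The function names are hypothetical; the actual change works on weight memory descriptors during reorder and may differ in detail.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helpers for illustration only; not OpenVINO or oneDNN API.

// Shift s8 weights in [-128, 127] into the u8 range [0, 255] by adding 128.
// Paired with zero point 128 the dequantized value is unchanged:
// (q + 128) - 128 == q.
inline std::vector<std::uint8_t> shift_s8_to_u8(const std::vector<std::int8_t>& weights) {
    std::vector<std::uint8_t> shifted;
    shifted.reserve(weights.size());
    for (std::int8_t q : weights) {
        shifted.push_back(static_cast<std::uint8_t>(static_cast<int>(q) + 128));
    }
    return shifted;
}

// 4-bit analogue: s4 in [-8, 7] maps to u4 in [0, 15] with zero point 8.
// The value is kept unpacked (one per byte) to keep the sketch simple.
inline std::uint8_t shift_s4_to_u4(std::int8_t q) {
    assert(q >= -8 && q <= 7);
    return static_cast<std::uint8_t>(q + 8);
}
```

The shift costs one extra pass over the weights, but afterwards the existing unsigned-weight path with a zero point can be used unchanged.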

I think we have two follow-ups here:

  1. The additional pass over the weights (to add 128/8) adds some time to the FIL metric. I suppose the cost is relatively small, but it is still worth aligning with the oneDNN team on a possible solution that merges it with the layout reorder.
  2. Signed-to-unsigned conversion will not be needed on LNL+ architectures, which incorporate VNNI with support for all precision combinations.

@moslex moslex added the priority: high High piority label May 21, 2024
@dmitry-gorokhov dmitry-gorokhov added this pull request to the merge queue May 22, 2024
Merged via the queue into openvinotoolkit:master with commit d8f5d7b May 22, 2024
110 checks passed
Labels
category: CPU OpenVINO CPU plugin Code Freeze priority: high High piority

5 participants