[CPU] Support weight-compression dt s8 #24457
So it seems vnni2 doesn't support compressed weights or dynamic quantization?
src/plugins/intel_cpu/src/nodes/executors/dnnl/dnnl_fullyconnected_primitive.cpp
@usstq, please attach the link to the corresponding oneDNN-side PR.
The code looks OK to me, but @dmitry-gorokhov still needs to take a look. Please:
// https://oneapi-src.github.io/oneDNN/dev_guide_int8_computations.html?highlight=128#inputs-of-the-same-type-s8
auto src_wdt = srcWeightDesc->getPrecision();
auto dst_wdt = dstWeightDesc->getPrecision();
if (needShiftSignedToUnsigned && src_wdt.is_integral_number() && src_wdt.is_signed() &&
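The oneDNN page referenced in the snippet deals with s8 inputs by shifting values by 128. For an integer dot product the shift is lossless, because the extra contribution of the zero-point can be removed with a compensation term: sum(a * w_s8) = sum(a * (w_s8 + 128)) - 128 * sum(a). Below is a minimal sketch of that identity, assuming plain int32 accumulation; `dot_s8` and `dot_via_u8` are hypothetical helpers, not plugin or oneDNN functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference dot product with signed 8-bit weights.
int32_t dot_s8(const std::vector<int8_t>& a, const std::vector<int8_t>& w) {
    assert(a.size() == w.size());
    int32_t acc = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        acc += int32_t(a[i]) * int32_t(w[i]);
    return acc;
}

// Same result using weights shifted into u8 (w_u8 = w_s8 + 128),
// followed by subtracting the compensation term 128 * sum(a).
int32_t dot_via_u8(const std::vector<int8_t>& a, const std::vector<int8_t>& w) {
    assert(a.size() == w.size());
    int32_t acc = 0;
    int32_t sum_a = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const uint8_t w_u8 = static_cast<uint8_t>(int32_t(w[i]) + 128);
        acc += int32_t(a[i]) * int32_t(w_u8);
        sum_a += a[i];
    }
    return acc - 128 * sum_a;
}
```

Both functions return the same value for any inputs of equal length; in a real kernel the compensation term would typically be precomputed rather than recomputed per call.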
@dmitry-gorokhov FYI, here is the tricky logic added to fall back to u8/u4 with a zero-point of 128/8:
- We reorder the layout of src into dst without type conversion.
- Once the layout is correct, we shift the i8/i4 values into the u8/u4 range by adding the zero-point (128 for i8, 8 for i4), as documented in https://oneapi-src.github.io/oneDNN/dev_guide_int8_computations.html?highlight=128#inputs-of-the-same-type-s8
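The shift in the second step can be expressed compactly. Adding 128 to an s8 value modulo 256 just flips its sign bit, and adding 8 to a 4-bit two's-complement nibble flips bit 3, so a byte holding two packed i4 values can be shifted with a single XOR and no carry crosses between nibbles. A minimal sketch, assuming two nibbles per byte; the function names are hypothetical and this is not the plugin's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// s8 -> u8 with zero-point 128: (v + 128) mod 256 == v XOR 0x80.
std::vector<uint8_t> shift_s8_to_u8(const std::vector<int8_t>& w) {
    std::vector<uint8_t> out(w.size());
    for (std::size_t i = 0; i < w.size(); ++i)
        out[i] = static_cast<uint8_t>(static_cast<uint8_t>(w[i]) ^ 0x80u);
    return out;
}

// Packed i4 -> u4 with zero-point 8: flipping bit 3 of each nibble adds 8
// to both nibbles of the byte at once.
std::vector<uint8_t> shift_packed_s4_to_u4(const std::vector<uint8_t>& packed) {
    std::vector<uint8_t> out(packed.size());
    for (std::size_t i = 0; i < packed.size(); ++i)
        out[i] = static_cast<uint8_t>(packed[i] ^ 0x88u);
    return out;
}

// Reads nibble `idx` (0 = low, 1 = high) of a packed byte as a value in 0..15.
unsigned nibble(uint8_t byte, int idx) {
    assert(idx == 0 || idx == 1);
    return idx == 0 ? (byte & 0x0Fu) : static_cast<unsigned>(byte >> 4);
}
```

Decompressing with `u - zp` (128 for u8, 8 for u4) then recovers the original signed value exactly.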
I think we have two follow-ups here:
- The additional pass over the weights (to perform the +128/+8 addition) adds some time to the FIL (first inference latency) metric. I suppose the overhead is relatively small, but it is still worth aligning with the oneDNN team on a possible solution that merges this step into the layout reorder.
- The signed-to-unsigned conversion will not be needed on LNL+ (Lunar Lake and newer) architectures, which incorporate VNNI with support for all precision combinations.
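On the first follow-up: merging the shift into the layout reorder would remove the extra pass over the weights, since each element can be shifted as it is written to its new position. A minimal sketch, assuming a plain row-major [K][N] to [N][K] transpose as a stand-in for the real oneDNN blocked layout; both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Two-pass variant: layout reorder first, then a separate pass adding the zero-point.
std::vector<uint8_t> reorder_then_shift(const std::vector<int8_t>& src, std::size_t K, std::size_t N) {
    assert(src.size() == K * N);
    std::vector<int8_t> tmp(K * N);
    for (std::size_t k = 0; k < K; ++k)
        for (std::size_t n = 0; n < N; ++n)
            tmp[n * K + k] = src[k * N + n];                 // pass 1: [K][N] -> [N][K]
    std::vector<uint8_t> dst(K * N);
    for (std::size_t i = 0; i < dst.size(); ++i)
        dst[i] = static_cast<uint8_t>(tmp[i] + 128);         // pass 2: s8 -> u8 (zp = 128)
    return dst;
}

// Fused variant: one pass, shifting each element while it is reordered.
std::vector<uint8_t> reorder_with_shift(const std::vector<int8_t>& src, std::size_t K, std::size_t N) {
    assert(src.size() == K * N);
    std::vector<uint8_t> dst(K * N);
    for (std::size_t k = 0; k < K; ++k)
        for (std::size_t n = 0; n < N; ++n)
            dst[n * K + k] = static_cast<uint8_t>(src[k * N + n] + 128);
    return dst;
}
```

Both variants produce identical output; the fused one touches the weights once and needs no temporary buffer.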
Details:
Tickets: