v3.4.2

@vpirogov vpirogov released this 10 May 22:02
· 1 commit to rls-v3.4 since this release

This is a patch release containing the following changes to v3.4.1:

  • Fixed performance regression in deconvolution on processors with Intel AVX-512 instruction set (307b35b, f46fffb)
  • Improved performance of batched matmul with binary post-op on processors with Intel AVX-512 instruction set (d39e1b7)
  • Fixed performance regression in softmax with destination memory format set to "any" on processors with Intel AVX-512 instruction set (756d3cf)
  • Fixed incorrect results in int8 deconvolution with source zero points on processors with Intel AMX instruction set (d5ddbc8)
  • Fixed performance regression in convolution on processors with Intel AVX2 instruction set (2968c89)
  • Improved f8_e4m3 matmul performance on Intel Data Center GPU Max Series (068f850, 668abae, c3972ef, ad94382)
  • Fixed sporadic accuracy issues in bf16 depthwise convolution backpropagation on processors with Intel AVX-512 instruction set (0184044)
  • Fixed primitive creation issue for fp16 pooling backpropagation on Intel GPUs (e4737d9)
  • Fixed failure for subgraphs with int8 matmul operation with experimental Graph Compiler on processors with Intel AMX instruction set (5ebde2e)
  • Fixed assert in experimental Graph Compiler on Windows (f53fbd1, fd903ae)
  • Fixed incorrect results for subgraphs with shuffle operation with experimental Graph Compiler (aef5023)
  • Improved performance of subgraphs involving int8 matmul with experimental Graph Compiler on processors with Intel AMX instruction set (0ca5bc5)
  • Fixed page fault in fp16 matmul primitive on Intel Data Center GPU Max Series (5587f08)
  • Fixed incorrect results in fp32 deconvolution with Arm Compute Library on AArch64 processors (b7694a0)
  • Fixed performance regression in deconvolution on processors with Intel AVX2 instruction set (6f452e2)