Use SIMD for block inits with GC fields #102132

Merged
merged 5 commits into from May 15, 2024

EgorBo
Member

@EgorBo EgorBo commented May 12, 2024

Closes #83297

Currently, we conservatively give up on using SIMD for structs (blocks) that contain GC references, because SIMD stores (on x86/x64) provide the required atomicity guarantees only under certain conditions: the data must be 16-byte aligned, the CPU must support AVX2, and we must use aligned store instructions. Let's at least use SIMD for the contiguous non-GC parts of such structs. Example:

struct MyStruct
{
    string gc1;
    long a;
    long b;
    long c;
    long d;
    long e;
    long f;
    long g;
    long h;
}

MyStruct Test() => new();

Codegen diff:

; Method Bench:Test():Bench+MyStruct:this (FullOpts)
       xor      eax, eax
       mov      qword ptr [rdx], rax ;; <-- GC slot
-      mov      qword ptr [rdx+0x08], rax
-      mov      qword ptr [rdx+0x10], rax
-      mov      qword ptr [rdx+0x18], rax
-      mov      qword ptr [rdx+0x20], rax
-      mov      qword ptr [rdx+0x28], rax
-      mov      qword ptr [rdx+0x30], rax
-      mov      qword ptr [rdx+0x38], rax
-      mov      qword ptr [rdx+0x40], rax
+      vxorps   xmm0, xmm0, xmm0
+      vmovdqu32 zmmword ptr [rdx+0x08], zmm0
       mov      rax, rdx
       ret      
-; Total bytes of code: 41
+; Total bytes of code: 23
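
To make the idea concrete, here is a minimal C++ sketch of how a block could be split into scalar stores for GC slots and wide stores for contiguous non-GC runs. It is an illustration under stated assumptions, not the JIT's actual code: `ZeroStore`, `PlanZeroing`, `kSlotSize`, and the `maxSimdWidth` parameter are hypothetical names, and the real implementation may handle remainders differently (for example with overlapping stores).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical description of one store emitted while zeroing a block.
struct ZeroStore
{
    std::size_t offset;   // byte offset from the start of the struct
    std::size_t size;     // store width in bytes (8, 16, 32 or 64)
    bool        isGcSlot; // true if this store writes a GC reference slot
};

constexpr std::size_t kSlotSize = 8; // pointer size on x64 (assumption)

// isGcSlot[i] says whether the i-th pointer-sized slot holds a GC reference.
// maxSimdWidth is the widest vector store available, in bytes (16, 32 or 64).
std::vector<ZeroStore> PlanZeroing(const std::vector<bool>& isGcSlot, std::size_t maxSimdWidth)
{
    std::vector<ZeroStore> plan;
    std::size_t slot = 0;
    while (slot < isGcSlot.size())
    {
        if (isGcSlot[slot])
        {
            // GC slots are always written with a single pointer-sized store.
            plan.push_back({slot * kSlotSize, kSlotSize, true});
            slot++;
            continue;
        }

        // Find the end of the contiguous run of non-GC slots.
        std::size_t end = slot;
        while (end < isGcSlot.size() && !isGcSlot[end])
        {
            end++;
        }

        std::size_t offset    = slot * kSlotSize;
        std::size_t remaining = (end - slot) * kSlotSize;

        // Cover the run with the widest vector stores first, then narrower ones.
        for (std::size_t width = maxSimdWidth; width >= 16; width /= 2)
        {
            while (remaining >= width)
            {
                plan.push_back({offset, width, false});
                offset += width;
                remaining -= width;
            }
        }

        // Whatever is left is smaller than 16 bytes: fall back to scalar stores.
        while (remaining > 0)
        {
            plan.push_back({offset, kSlotSize, false});
            offset += kSlotSize;
            remaining -= kSlotSize;
        }

        slot = end;
    }
    return plan;
}
```

For the `MyStruct` layout above (one GC slot followed by eight `long` fields) and a 64-byte maximum width, this plan contains one 8-byte store at offset 0 and one 64-byte store at offset 8, which matches the shape of the new codegen in the diff.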

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 12, 2024
@dotnet dotnet deleted a comment from EgorBot May 12, 2024
@EgorBo
Member Author

EgorBo commented May 12, 2024

@EgorBot

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<Bench>(args: args);

public class Bench
{
    public struct MyStructWithGC
    {
        object gc;
        long a; long b; long c; long d; 
        long e; long f; long g; long h; 
    }

    MyStructWithGC _fld1;
    MyStructWithGC _fld2;

    [Benchmark]
    public void Zeroing()
    {
        _fld1 = default;
        _fld2 = default;
    }
}

@EgorBot

EgorBot commented May 12, 2024

@EgorBo

BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
AMD EPYC 7763, 1 CPU, 2 logical cores and 1 physical core
.NET SDK 9.0.100-preview.3.24204.13
  [Host]     : .NET 9.0.0 (9.0.24.17209), X64 RyuJIT AVX2
  Job-BWFEHP : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-NQQMIR : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2

| Method  | Toolchain     | Mean      | Ratio |
|---------|---------------|----------:|------:|
| Zeroing | /Main/corerun | 1.9355 ns |  1.00 |
| Zeroing | /PR/corerun   | 0.4288 ns |  0.22 |

@EgorBo EgorBo marked this pull request as ready for review May 12, 2024 18:23
@EgorBo
Member Author

EgorBo commented May 13, 2024

@jakobbotsch @kunalspathak @dotnet/jit-contrib PTAL, diffs aren't too big

@EgorBo
Member Author

EgorBo commented May 15, 2024

Ping @jakobbotsch @kunalspathak @dotnet/jit-contrib 🙂 #102209 depends on it (for simpler diffs)

@jakobbotsch
Member

jakobbotsch commented May 15, 2024

It would be nice to support it for copies as well (I think that would fix #90196 and #7469)

Member

@kunalspathak kunalspathak left a comment

LGTM

@EgorBo EgorBo merged commit 2b0c1de into dotnet:main May 15, 2024
107 checks passed
@EgorBo EgorBo deleted the simd-blk-gc-init branch May 15, 2024 15:13
Successfully merging this pull request may close these issues.

Suboptimal struct zeroing in case of gc pointers
4 participants