feat: refactor asm2asm #393

Merged
merged 7 commits into from Jul 4, 2023

@AsterDY AsterDY commented Mar 29, 2023

What this PR is for

Refactor the asm2asm tool and the Go-importing templates. We do this mainly to achieve two goals:

Support traceback when native C functions panic or are profiled

In the past, we generated Plan 9 assembly code with the asm2asm (https://github.com/chenzhuoyu/asm2asm) tool. But since Go ASM can't recognize many 'SP-writing' instructions generated by clang, such as addq $xx, %rsp, the imported C functions have incorrect pcsp funcdata and crash fatally once a panic or profiling happens inside C code (even if the panic is recoverable). Thus, we need to calculate correct pcsp funcdata and import it into the Go runtime, which is hardly possible under the Go ASM mechanism.
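To make the pcsp idea concrete, here is a minimal sketch of how a stack-depth table could be derived from SP-writing instructions. It is not the asm2asm implementation: the `Insn` and `PCSP` types and `BuildPCSP` are hypothetical names, instructions are assumed to be already decoded into a size and an SP delta, and control flow is assumed to be straight-line (no branches).

```go
package main

import "fmt"

// Insn is a hypothetical, simplified view of one decoded instruction:
// its encoded size in bytes and how much it grows the stack
// (e.g. `subq $16, %rsp` => +16, `addq $16, %rsp` => -16).
type Insn struct {
	Size    int
	SPDelta int
}

// PCSP is one table entry: depth SP is valid for every pc from the
// previous entry's PC (or 0) up to, but not including, PC.
type PCSP struct {
	PC int
	SP int
}

// BuildPCSP walks the instructions in address order and closes a range
// each time the stack depth changes. It also returns the maximum depth.
// Assumption: straight-line code only; branches are not modeled.
func BuildPCSP(insns []Insn) (table []PCSP, maxDepth int) {
	pc, sp := 0, 0
	for _, in := range insns {
		next := pc + in.Size
		if in.SPDelta != 0 {
			// The old depth holds until this instruction has finished.
			table = append(table, PCSP{PC: next, SP: sp})
			sp += in.SPDelta
			if sp > maxDepth {
				maxDepth = sp
			}
		}
		pc = next
	}
	// Close the final range unless it would be empty.
	if len(table) == 0 || table[len(table)-1].PC != pc {
		table = append(table, PCSP{PC: pc, SP: sp})
	}
	return table, maxDepth
}

func main() {
	// pushq %rbp; subq $16,%rsp; <3-byte body>; addq $16,%rsp; popq %rbp; ret
	insns := []Insn{{1, 8}, {4, 16}, {3, 0}, {4, -16}, {1, -8}, {1, 0}}
	table, maxDepth := BuildPCSP(insns)
	fmt.Println(table, maxDepth) // prints "[{1 0} {5 8} {12 24} {13 8} {14 0}] 24"
}
```

A traceback at pc offset 7 would fall into the range ending at 12 and therefore see a depth of 24 bytes, which is the information a miscounted `addq`/`subq` would corrupt.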

Better performance

On the other hand, Go ASM doesn't support the register-based ABI yet (although pure Go functions already use it), which causes a noticeable performance decline in imported C functions (estimated at about 5%~20%). If we can drop the dependency on Go ASM and import C functions dynamically, in the form of pure machine code, we gain a lot of flexibility and room for improvement in ABI conversion.

How does it work

At compile time, the refactored asm2asm calculates and generates all the function information we need:

  • C machine code, stored in a Go byte slice;
  • C function entries (both exported and unexported);
  • C function data, which at present includes pcsp, max stack size, and text size.
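As an illustration of the three items above, the generated data could be held in structures like the following. This is only a sketch: `NativeBlob`, `CFunc`, `PCSPEntry`, and their fields are hypothetical names, not the layout that asm2asm or the loader actually emits.

```go
package main

import "fmt"

// PCSPEntry maps a pc offset (relative to the function entry) to a stack depth.
type PCSPEntry struct {
	PC uint32
	SP int32
}

// CFunc is a hypothetical record for one compiled C function.
type CFunc struct {
	Name     string
	Entry    uint32      // offset of the function entry inside Text
	TextSize uint32      // size of the function's machine code in bytes
	MaxStack int32       // maximum stack depth the function uses
	Pcsp     []PCSPEntry // stack depth table for traceback
}

// NativeBlob bundles the machine code with the per-function metadata.
type NativeBlob struct {
	Text  []byte
	Funcs []CFunc
}

// Validate checks that every function lies inside the machine code slice.
func (b *NativeBlob) Validate() error {
	for _, f := range b.Funcs {
		if uint64(f.Entry)+uint64(f.TextSize) > uint64(len(b.Text)) {
			return fmt.Errorf("function %q exceeds text size", f.Name)
		}
	}
	return nil
}

// Lookup finds the function whose code range contains the given offset,
// which is the kind of query a traceback needs to answer for a pc.
func (b *NativeBlob) Lookup(off uint32) (*CFunc, bool) {
	for i := range b.Funcs {
		f := &b.Funcs[i]
		if off >= f.Entry && off-f.Entry < f.TextSize {
			return f, true
		}
	}
	return nil, false
}

func main() {
	blob := &NativeBlob{
		Text: make([]byte, 64), // placeholder bytes, not real machine code
		Funcs: []CFunc{
			{Name: "exported_fn", Entry: 0, TextSize: 32, MaxStack: 24},
			{Name: "local_helper", Entry: 32, TextSize: 32, MaxStack: 8},
		},
	}
	if err := blob.Validate(); err != nil {
		panic(err)
	}
	if f, ok := blob.Lookup(40); ok {
		fmt.Println(f.Name, f.MaxStack)
	}
}
```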

At initialization time, we implement an ABI converter (loader/WrapGoC()) to generate conversion code that will:

  • check whether the Go stackguard leaves enough room for the specific C function's red zone + max stack size; if it does not, call runtime.morestack();
  • spill any pointer-kind arguments onto the stack for Go GC tracing;
  • exchange the Go argument registers (rax, rbx, ...) with the C argument registers (rdi, rsi, ...), using stack slots if needed (more than 3 arguments);
  • load the C function entry and call it (movq + call).

We then register both the C function and the Go stub function. At present, we register every C function with the NoPreempt trait and reserve a 32B red zone for it, to avoid unexpected problems caused by the different ABIs.
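The stack check and the register exchange described above can be sketched as follows. This is not the code of loader/WrapGoC(): `NeedsMoreStack` and `PlanArgMoves` are hypothetical helpers that only print the moves as text, and the stack check is a simplified comparison. The register orders are those of Go's internal amd64 ABI and the System V AMD64 calling convention. With up to three arguments the source registers (rax, rbx, rcx) and destination registers (rdi, rsi, rdx) are disjoint, so direct moves are safe; from the fourth argument on, a destination such as rdi is also a Go source register, which is why this sketch goes through stack slots.

```go
package main

import "fmt"

// Integer argument registers, in order, for Go's internal ABI on amd64
// and for the System V AMD64 calling convention.
var goArgRegs = []string{"rax", "rbx", "rcx", "rdi", "rsi", "r8", "r9", "r10", "r11"}
var cArgRegs = []string{"rdi", "rsi", "rdx", "rcx", "r8", "r9"}

// redZone is the 32-byte reservation mentioned in the PR description.
const redZone = 32

// NeedsMoreStack is a simplified version of the stub's stack check:
// it reports whether the space between sp and stackguard is too small
// for the C function's max stack size plus the red zone.
func NeedsMoreStack(sp, stackguard, maxStack uintptr) bool {
	return sp < stackguard+maxStack+redZone
}

// PlanArgMoves returns textual moves that transfer n integer arguments
// from Go registers to C registers. Assumption: only register-passed
// integer arguments are handled (at most 6).
func PlanArgMoves(n int) ([]string, error) {
	if n < 0 || n > len(cArgRegs) {
		return nil, fmt.Errorf("unsupported argument count %d", n)
	}
	var out []string
	if n <= 3 {
		// Sources and destinations do not overlap: move directly.
		for i := 0; i < n; i++ {
			out = append(out, fmt.Sprintf("movq %%%s, %%%s", goArgRegs[i], cArgRegs[i]))
		}
		return out, nil
	}
	// Overlap is possible: save every source to a stack slot first,
	// then load each C register from its slot.
	for i := 0; i < n; i++ {
		out = append(out, fmt.Sprintf("movq %%%s, %d(%%rsp)", goArgRegs[i], 8*i))
	}
	for i := 0; i < n; i++ {
		out = append(out, fmt.Sprintf("movq %d(%%rsp), %%%s", 8*i, cArgRegs[i]))
	}
	return out, nil
}

func main() {
	moves, err := PlanArgMoves(4)
	if err != nil {
		panic(err)
	}
	for _, m := range moves {
		fmt.Println(m)
	}
	fmt.Println(NeedsMoreStack(1000, 900, 200))
}
```

A real stub could also order the moves to avoid the stack entirely; the stack-slot variant is shown here only because it matches the description above.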

Benchmark

  • Get/Set
name                      old time/op    new time/op    delta
SetOne_Sonic-16             3.35µs ± 7%    3.57µs ±15%     ~     (p=0.278 n=9+10)
GetOne_Sonic-16             1.69µs ±13%    1.35µs ± 2%  -20.26%  (p=0.000 n=9+10)

name                      old speed      new speed      delta
SetOne_Sonic-16           3.90GB/s ± 7%  3.68GB/s ±13%     ~     (p=0.278 n=9+10)
GetOne_Sonic-16           7.59GB/s ±17%  9.67GB/s ± 2%  +27.43%  (p=0.000 n=10+10)

name                      old alloc/op   new alloc/op   delta
GetOne_Sonic-16              24.0B ± 0%     24.0B ± 0%     ~     (all equal)
SetOne_Sonic-16             1.58kB ± 0%    1.58kB ± 0%     ~     (all equal)

name                      old allocs/op  new allocs/op  delta
GetOne_Sonic-16               1.00 ± 0%      1.00 ± 0%     ~     (all equal)
SetOne_Sonic-16               17.0 ± 0%      17.0 ± 0%     ~     (all equal)
  • Encode
name                                    old time/op    new time/op    delta
Encoder_Generic_Sonic-16                  40.8µs ±16%    39.5µs ±10%     ~     (p=0.243 n=9+10)
Encoder_Binding_Sonic-16                  6.78µs ± 2%    6.79µs ± 3%     ~     (p=0.842 n=9+10)
Encoder_Binding_Sonic_Fast-16             5.84µs ± 2%    5.80µs ± 3%     ~     (p=0.353 n=10+10)
Encoder_Generic_Sonic_Fast-16             27.5µs ± 4%    25.9µs ± 2%   -5.99%  (p=0.000 n=10+10)

name                                    old speed      new speed      delta
Encoder_Generic_Sonic-16                 321MB/s ±14%   331MB/s ± 9%     ~     (p=0.243 n=9+10)
Encoder_Binding_Sonic-16                1.92GB/s ± 2%  1.92GB/s ± 3%     ~     (p=0.842 n=9+10)
Encoder_Binding_Sonic_Fast-16           2.23GB/s ± 2%  2.25GB/s ± 3%     ~     (p=0.353 n=10+10)
Encoder_Generic_Sonic_Fast-16            474MB/s ± 4%   503MB/s ± 2%   +6.29%  (p=0.000 n=10+10)

name                                    old alloc/op   new alloc/op   delta
Encoder_Binding_Sonic-16                  14.1kB ± 0%    14.1kB ± 0%   +0.23%  (p=0.000 n=10+10)
Encoder_Generic_Sonic-16                  13.8kB ± 0%    13.8kB ± 0%     ~     (p=0.154 n=9+9)
Encoder_Generic_Sonic_Fast-16             9.71kB ± 0%    9.69kB ± 0%     ~     (p=0.225 n=10+10)
Encoder_Binding_Sonic_Fast-16             9.83kB ± 0%    9.84kB ± 0%     ~     (p=0.239 n=10+10)

name                                    old allocs/op  new allocs/op  delta
Encoder_Generic_Sonic-16                    4.00 ± 0%      4.00 ± 0%     ~     (all equal)
Encoder_Generic_Sonic_Fast-16               4.00 ± 0%      4.00 ± 0%     ~     (all equal)
Encoder_Binding_Sonic-16                    4.00 ± 0%      4.00 ± 0%     ~     (all equal)
Encoder_Binding_Sonic_Fast-16               4.00 ± 0%      4.00 ± 0%     ~     (all equal)
  • Decode
name                                    old time/op    new time/op    delta
Decoder_Binding_Sonic_Fast-16             35.0µs ± 4%    39.2µs ± 6%  +11.80%  (p=0.000 n=10+10)
Decoder_Binding_Sonic-16                  36.3µs ± 3%    39.2µs ± 4%   +8.03%  (p=0.000 n=10+10)
Decoder_Generic_Sonic-16                  91.6µs ± 6%    80.4µs ± 6%  -12.18%  (p=0.000 n=9+10)
Decoder_Generic_Sonic_Fast-16             69.5µs ±10%    59.2µs ± 4%  -14.91%  (p=0.000 n=10+10)

name                                    old speed      new speed      delta
Decoder_Binding_Sonic_Fast-16            372MB/s ± 4%   333MB/s ± 6%  -10.52%  (p=0.000 n=10+10)
Decoder_Binding_Sonic-16                 359MB/s ± 3%   332MB/s ± 4%   -7.43%  (p=0.000 n=10+10)
Decoder_Generic_Sonic-16                 143MB/s ± 7%   162MB/s ± 7%  +13.86%  (p=0.000 n=9+10)
Decoder_Generic_Sonic_Fast-16            189MB/s ±10%   220MB/s ± 4%  +16.85%  (p=0.000 n=10+10)

name                                    old alloc/op   new alloc/op   delta
Decoder_Generic_Sonic_Fast-16             49.0kB ± 0%    49.1kB ± 0%   +0.05%  (p=0.004 n=10+10)
Decoder_Generic_Sonic-16                  56.8kB ± 0%    56.8kB ± 0%     ~     (p=0.984 n=10+9)
Decoder_Binding_Sonic_Fast-16             24.3kB ± 0%    24.3kB ± 0%     ~     (p=0.149 n=10+9)
Decoder_Binding_Sonic-16                  27.3kB ± 0%    27.3kB ± 0%   -0.01%  (p=0.044 n=9+8)

name                                    old allocs/op  new allocs/op  delta
Decoder_Generic_Sonic-16                     723 ± 0%       723 ± 0%     ~     (all equal)
Decoder_Generic_Sonic_Fast-16                313 ± 0%       313 ± 0%     ~     (all equal)
Decoder_Binding_Sonic-16                     137 ± 0%       137 ± 0%     ~     (all equal)
Decoder_Binding_Sonic_Fast-16               34.0 ± 0%      34.0 ± 0%     ~     (all equal)

codecov-commenter commented Jul 3, 2023

Codecov Report

❗ No coverage uploaded for pull request base (main@b40bbd5).
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main     #393   +/-   ##
=======================================
  Coverage        ?   78.01%           
=======================================
  Files           ?       62           
  Lines           ?    10419           
  Branches        ?        0           
=======================================
  Hits            ?     8128           
  Misses          ?     1943           
  Partials        ?      348           