Strong performance regression with target-cpu=native #190

Open
fr0staman opened this issue Dec 22, 2023 · 11 comments

Comments

@fr0staman

So, ahash built with target-cpu=native shows a significant performance regression on my setup.
This may be a Rust/LLVM issue, but I'll create an issue here first.

Repro:
https://github.com/fr0staman/rust-ahash-target-native-performance-issue

My setup

Rust:

rustc 1.74.1 (a28077b28 2023-12-04)
binary: rustc
commit-hash: a28077b28a02b92985b3a3faecf92813155f1ea1
commit-date: 2023-12-04
host: x86_64-unknown-linux-gnu
release: 1.74.1
LLVM version: 17.0.4

System:

CPU: AMD Ryzen 5 4500U
OS: Ubuntu 22.04.3 LTS

Results

Standard target

fr0staman@kotobook:~/source/repos/rust/rust-ahash-target-native-performance-issue$ cargo bench
    Finished bench [optimized] target(s) in 36.18s
     Running unittests src/main.rs (target/release/deps/rust_ahash_target_native_performance_issue-4df22a78d1110619)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running benches/issue.rs (target/release/deps/ahash-3b7ae86a7bc7bacb)
Gnuplot not found, using plotters backend
Performance/ahash/(32, 128)
                        time:   [21.672 µs 21.698 µs 21.727 µs]
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) low mild
  3 (3.00%) high mild
Performance/ahash/(256, 1024)
                        time:   [983.01 µs 983.94 µs 984.92 µs]
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe
Performance/ahash/(1024, 4096)
                        time:   [15.256 ms 15.298 ms 15.341 ms]

target-cpu=native

fr0staman@kotobook:~/source/repos/rust/rust-ahash-target-native-performance-issue$ RUSTFLAGS='-C target-cpu=native' cargo bench
    Finished bench [optimized] target(s) in 46.42s
     Running unittests src/main.rs (target/release/deps/rust_ahash_target_native_performance_issue-4df22a78d1110619)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running benches/issue.rs (target/release/deps/ahash-3b7ae86a7bc7bacb)
Gnuplot not found, using plotters backend
Performance/ahash/(32, 128)
                        time:   [37.734 µs 37.761 µs 37.789 µs]
                        change: [+73.336% +73.657% +73.980%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  1 (1.00%) high severe
Performance/ahash/(256, 1024)
                        time:   [2.4681 ms 2.4698 ms 2.4717 ms]
                        change: [+150.51% +150.90% +151.29%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
Performance/ahash/(1024, 4096)
                        time:   [38.308 ms 38.369 ms 38.433 ms]
                        change: [+149.98% +150.82% +151.60%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
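As a rough cross-check of the numbers above, the relative change can be recomputed from the middle (point) estimate of each run. This is a plain-arithmetic sketch: Criterion computes its change interval with its own statistics, so its figures differ slightly.

```rust
/// Relative change, in percent, from a baseline time to a new time.
/// Both arguments must use the same unit (e.g. both in microseconds).
fn percent_change(baseline: f64, new: f64) -> f64 {
    (new / baseline - 1.0) * 100.0
}

fn main() {
    // Middle (point) estimates copied from the two runs above:
    // (case, default-target time, target-cpu=native time), one unit per row.
    let runs = [
        ("(32, 128) [µs]", 21.698, 37.761),
        // 2.4698 ms from the native run, converted to µs.
        ("(256, 1024) [µs]", 983.94, 2469.8),
        ("(1024, 4096) [ms]", 15.298, 38.369),
    ];
    for (case, base, native) in runs {
        println!(
            "{case}: {:+.1}% ({:.2}x the baseline time)",
            percent_change(base, native),
            native / base
        );
    }
}
```

This works out to roughly +74%, +151% and +151%, i.e. the native build takes about 1.7x to 2.5x as long, in line with the change lines Criterion printed.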
@puuuuh

puuuuh commented Dec 24, 2023

The example can be reduced to target-feature=+aes.
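One way +aes alone can change performance is if a crate chooses its mixing code at compile time with cfg(target_feature = "aes"). The sketch below is hypothetical and is not ahash's code: mix() compiles to a single AES round when the feature is enabled for the build, and to a folded multiply (the high and low halves of a 128-bit product XORed together) otherwise. The same source therefore produces different instructions depending on RUSTFLAGS.

```rust
/// Fallback mixing step (illustrative): a "folded multiply" that XORs the
/// high and low 64-bit halves of the full 128-bit product.
#[allow(dead_code)] // unused in builds that take the AES path
fn folded_multiply(s: u64, by: u64) -> u64 {
    let full = (s as u128).wrapping_mul(by as u128);
    (full as u64) ^ ((full >> 64) as u64)
}

/// Compiled only when `aes` is enabled at compile time, e.g. via
/// RUSTFLAGS='-C target-feature=+aes' or '-C target-cpu=native' on an AES-capable CPU.
#[cfg(all(target_arch = "x86_64", target_feature = "aes"))]
fn mix(value: u64, key: u64) -> u64 {
    use std::arch::x86_64::{_mm_aesenc_si128, _mm_cvtsi128_si64, _mm_set_epi64x};
    // SAFETY: the cfg above guarantees `aes` (and x86_64's baseline `sse2`)
    // is enabled for this build.
    unsafe {
        let v = _mm_set_epi64x(0, value as i64);
        let k = _mm_set_epi64x(0, key as i64);
        _mm_cvtsi128_si64(_mm_aesenc_si128(v, k)) as u64
    }
}

/// Used for every other build configuration.
#[cfg(not(all(target_arch = "x86_64", target_feature = "aes")))]
fn mix(value: u64, key: u64) -> u64 {
    folded_multiply(value, key)
}

fn main() {
    let path = if cfg!(all(target_arch = "x86_64", target_feature = "aes")) {
        "AES round"
    } else {
        "folded multiply"
    };
    println!("mix() uses the {path} path: {:#x}", mix(0x1234, 0x5678));
}
```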

@tkaitchuck
Owner

It looks like this bench is only hashing char, which SHOULD be specialized in both cases (ideally to identical instructions). I'll take a look at this.
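A std-only sketch of a harness that times hashing char values through any BuildHasher. The helper name time_char_hashing and the input are hypothetical (the linked repro is not reproduced here), and std's RandomState stands in for ahash::RandomState so the snippet compiles without dependencies.

```rust
use std::collections::hash_map::RandomState;
use std::hash::BuildHasher;
use std::time::{Duration, Instant};

/// Hashes every char in `input` `rounds` times and returns the elapsed time
/// together with a wrapping sum of the hashes, so the work is actually used.
fn time_char_hashing<S: BuildHasher>(state: &S, input: &[char], rounds: usize) -> (Duration, u64) {
    let start = Instant::now();
    let mut acc = 0u64;
    for _ in 0..rounds {
        for &c in input {
            acc = acc.wrapping_add(state.hash_one(c));
        }
    }
    (start.elapsed(), acc)
}

fn main() {
    // Assumption: std's RandomState stands in for ahash::RandomState here.
    let state = RandomState::new();
    let input: Vec<char> = ('a'..='z').cycle().take(1024).collect();
    let (elapsed, checksum) = time_char_hashing(&state, &input, 100);
    println!("{elapsed:?} (checksum {checksum})");
}
```

Passing ahash::RandomState::new() instead, and building once with and once without RUSTFLAGS='-C target-cpu=native', would compare the two configurations on the same input.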

@tkaitchuck
Owner

This does not appear to happen on my Intel i9. There must be something odd in the assembly for the Ryzen.
If +aes is giving identical performance to native, it is possible it's not picking up the sse2 instructions for some reason.
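To check which instruction-set extensions a particular build was allowed to use, cfg!(target_feature = ...) reports what the compiler was told to assume for that build, not what the running CPU supports. A minimal sketch; the feature list is an illustrative subset, not exhaustive:

```rust
/// Lists which target features were enabled *at compile time*, i.e. by the
/// -C target-cpu / -C target-feature flags this binary was built with.
fn compiled_features() -> Vec<(&'static str, bool)> {
    vec![
        ("sse2", cfg!(target_feature = "sse2")),
        ("ssse3", cfg!(target_feature = "ssse3")),
        ("sse4.1", cfg!(target_feature = "sse4.1")),
        ("aes", cfg!(target_feature = "aes")),
        ("avx2", cfg!(target_feature = "avx2")),
    ]
}

fn main() {
    for (name, enabled) in compiled_features() {
        println!("{name:>7}: {}", if enabled { "enabled" } else { "not enabled" });
    }
}
```

Running it once with the default target and once with -C target-cpu=native (or -C target-feature=+aes) shows which features each set of flags turned on.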

@tkaitchuck
Owner

@fr0staman If you run rustc --print=target-cpus, what does it report as the detected CPU target?

@tkaitchuck
Owner

This might be related to rust-lang/rust#80633.

@0xdeafbeef

rustc --print=target-cpus

Available CPUs for this target:
    native                  - Select the CPU of the current host (currently znver4).
    alderlake
    amdfam10
    athlon
    athlon-4
    athlon-fx
    athlon-mp
    athlon-tbird
    athlon-xp
    athlon64
    athlon64-sse3
    atom
    atom_sse4_2
    atom_sse4_2_movbe
    barcelona
    bdver1
    bdver2
    bdver3
    bdver4
    bonnell
    broadwell
    btver1
    btver2
    c3
    c3-2
    cannonlake
    cascadelake
    cooperlake
    core-avx-i
    core-avx2
    core2
    core_2_duo_sse4_1
    core_2_duo_ssse3
    core_2nd_gen_avx
    core_3rd_gen_avx
    core_4th_gen_avx
    core_4th_gen_avx_tsx
    core_5th_gen_avx
    core_5th_gen_avx_tsx
    core_aes_pclmulqdq
    core_i7_sse4_2
    corei7
    corei7-avx
    emeraldrapids
    generic
    geode
    goldmont
    goldmont-plus
    goldmont_plus
    grandridge
    graniterapids
    graniterapids-d
    graniterapids_d
    haswell
    i386
    i486
    i586
    i686
    icelake-client
    icelake-server
    icelake_client
    icelake_server
    ivybridge
    k6
    k6-2
    k6-3
    k8
    k8-sse3
    knl
    knm
    lakemont
    meteorlake
    mic_avx512
    nehalem
    nocona
    opteron
    opteron-sse3
    penryn
    pentium
    pentium-m
    pentium-mmx
    pentium2
    pentium3
    pentium3m
    pentium4
    pentium4m
    pentium_4
    pentium_4_sse3
    pentium_ii
    pentium_iii
    pentium_iii_no_xmm_regs
    pentium_m
    pentium_mmx
    pentium_pro
    pentiumpro
    prescott
    raptorlake
    rocketlake
    sandybridge
    sapphirerapids
    sierraforest
    silvermont
    skx
    skylake
    skylake-avx512
    skylake_avx512
    slm
    tigerlake
    tremont
    westmere
    winchip-c6
    winchip2
    x86-64                  - This is the default target CPU for the current build target (currently x86_64-unknown-linux-gnu).
    x86-64-v2
    x86-64-v3
    x86-64-v4
    yonah
    znver1
    znver2
    znver3
    znver4

This machine also shows the regression.

@fr0staman
Author

rustc --print=target-cpus

Available CPUs for this target:
    native                  - Select the CPU of the current host (currently znver1).
    ...

@Pzixel

Pzixel commented Dec 30, 2023

@tkaitchuck I actually think this issue might be relevant: https://internals.rust-lang.org/t/slower-code-with-c-target-cpu-native/17315/7

@0xdeafbeef

Profile without the aes flag: https://share.firefox.dev/3RWEHk5
Profile with the aes flag: https://share.firefox.dev/48D3E9Y

The AES feature is indeed detected.
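Runtime detection and compile-time enablement are separate questions: a CPU can report AES while a given build still compiled only the non-AES code. A small sketch contrasting the two, using the standard is_x86_feature_detected! macro; the aes_status helper is hypothetical:

```rust
/// Returns (compiled_with_aes, cpu_reports_aes).
///
/// `cfg!` answers "was this binary built to assume AES on x86?", while
/// `is_x86_feature_detected!` queries the running CPU at runtime.
fn aes_status() -> (bool, bool) {
    let compiled = cfg!(all(
        any(target_arch = "x86", target_arch = "x86_64"),
        target_feature = "aes"
    ));

    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    let detected = is_x86_feature_detected!("aes");
    // The macro only exists on x86/x86_64; report `false` elsewhere.
    #[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
    let detected = false;

    (compiled, detected)
}

fn main() {
    match aes_status() {
        (true, _) => println!("built with +aes: AES-specific code paths are compiled in"),
        (false, true) => println!("CPU supports AES, but this build did not enable it at compile time"),
        (false, false) => println!("AES not available on this CPU/target"),
    }
}
```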

@tkaitchuck
Owner

@fr0staman Can you check if this is fixed on the 0.9 prerelease branch?

@fr0staman
Author

Certainly!

Unfortunately, nothing has changed:

fr0staman@kotobook:~/source/repos/rust/rust-ahash-target-native-performance-issue$ RUSTFLAGS='-C target-cpu=native' cargo bench
   ...
   Compiling ahash v0.9.0 (https://github.com/tkaitchuck/aHash?branch=0.9-prerelease#af37d79e)
   ...
    Finished bench [optimized] target(s) in 43.16s
     Running unittests src/main.rs (target/release/deps/rust_ahash_target_native_performance_issue-a98c230d15dcf9ae)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running benches/issue.rs (target/release/deps/issue-a3d835f7ef64d9be)
Gnuplot not found, using plotters backend
Performance/ahash/(32, 128)
                        time:   [37.539 µs 37.543 µs 37.546 µs]
                        change: [+97.437% +97.897% +98.305%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
Performance/ahash/(256, 1024)
                        time:   [2.3726 ms 2.3733 ms 2.3740 ms]
                        change: [+156.12% +156.46% +156.76%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
Performance/ahash/(1024, 4096)
                        time:   [38.066 ms 38.109 ms 38.153 ms]
                        change: [+154.20% +155.09% +155.95%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
