Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use GCC's may_alias attribute for unaligned memory access #1548

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

ccawley2011
Copy link
Contributor

On platforms that don't allow unaligned memory access, calls to memcpy don't always get inlined in cases where they would on platforms with it. Using the may_alias attribute ensures that the code for reading and writing one byte at a time is inlined, and should also handle cases where unaligned memory access is allowed (although checking UNALIGNED_OK works better in that regard).

@codecov
Copy link

codecov bot commented Jul 27, 2023

Codecov Report

Attention: Patch coverage is 84.37500% with 10 lines in your changes are missing coverage. Please review.

Project coverage is 67.37%. Comparing base (af494fc) to head (d8da4c7).

❗ Current head d8da4c7 differs from pull request most recent head f295cca. Consider uploading reports for the commit f295cca to get more accurate results

Files Patch % Lines
arch/x86/chunkset_sse2.c 0.00% 3 Missing ⚠️
arch/generic/compare256_c.c 50.00% 2 Missing ⚠️
match_tpl.h 81.81% 2 Missing ⚠️
arch/generic/chunkset_c.c 80.00% 1 Missing ⚠️
compare256_rle.h 50.00% 1 Missing ⚠️
zmemory.h 95.83% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           develop    #1548       +/-   ##
============================================
+ Coverage         0   67.37%   +67.37%     
============================================
  Files            0      118      +118     
  Lines            0     9316     +9316     
  Branches         0     2338     +2338     
============================================
+ Hits             0     6277     +6277     
- Misses           0     1866     +1866     
- Partials         0     1173     +1173     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@KungFuJesus
Copy link
Contributor

Ah cool, this might actually improve the purely scalar performance I saw testing this on a Sun Fire V240 (depressingly slower than stock zlib). I'll bet the extra function calls here per aliasing load were a big part of that.

@mtl1979
Copy link
Collaborator

mtl1979 commented Jul 28, 2023

Fails a lot of CI tests, need to investigate why...

@ccawley2011 ccawley2011 force-pushed the may-alias branch 2 times, most recently from 8737e10 to 6adb2fc Compare July 28, 2023 08:32
zmemory.h Outdated

static inline uint32_t zng_memread_4(const void *ptr) {
#if defined(UNALIGNED_OK)
return *(const uint32_t *)ptr;
Copy link
Member

@nmoinvaz nmoinvaz Aug 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a ton of work to avoid this kind of thing...
Compiler will convert memcpy to unaligned access if it is supported.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These UNALIGNED_OK branches should be removed. So there is only HAVE_MAY_ALIAS and the memcpy path.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, UNALIGNED_OK branch should be removed.
We for sure don't want to default to that, if it was a new default-off setting like FORCE_UNALIGNED or something, then there would at least be room for discussion.

The HAVE_MAY_ALIAS branch is interesting, I wonder what kind of performance that gets compared to memcpy on the various platforms/compilers. If we do want to use that, detection would preferably be in configure/CMake for cleaner and wider compiler support, or at the very least moved to zbuild.h.

@nmoinvaz
Copy link
Member

nmoinvaz commented Aug 5, 2023

On platforms that don't allow unaligned memory access, calls to memcpy don't always get inlined in cases where they would on platforms with it.

Do you have a Compiler Explorer example where this is the case?

@phprus
Copy link
Contributor

phprus commented Aug 6, 2023

Does it revert PR #1309?
What modern and supported architectures and compilers are having this issue?

@ccawley2011
Copy link
Contributor Author

On platforms that don't allow unaligned memory access, calls to memcpy don't always get inlined in cases where they would on platforms with it.

Do you have a Compiler Explorer example where this is the case?

This example showcases the difference on ARM with modern GCC when unaligned memory access is and isn't available. It also covers SPARC as well, where memcpy is inlined for 4 byte reads but not 8 byte reads. It doesn't appear to be an issue on every architecture that requires aligned memory access, but it does occur on the architectures demonstrated. I also haven't investigated how compilers approach 16 and 32 byte reads, which would heavily impact chunkcopy_safe if memcpy doesn't get inlined.

Compiler Explorer link

zmemory.h Outdated

static inline uint32_t zng_memread_4(const void *ptr) {
#if defined(UNALIGNED_OK)
return *(const uint32_t *)ptr;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These UNALIGNED_OK branches should be removed. So there is only HAVE_MAY_ALIAS and the memcpy path.

zmemory.h Outdated
}

static inline int32_t zng_memcmp_4(const void *src0, const void *src1) {
#if defined(UNALIGNED_OK) || defined(HAVE_MAY_ALIAS)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should remove the defined(UNALIGNED_OK) from all these. Only HAVE_MAY_ALIAS. We error on the side of "the compiler knows best" about converting memcpy to unaligned access. This is better than undefined behavior that can occur. There are other GitHub PRs related to this including #1100 that might be worth reading.

@ccawley2011
Copy link
Contributor Author

OK, I've rebased the branch and removed the special cases for UNALIGNED_OK and UNALIGNED64_OK.

#if defined(HAVE_MAY_ALIAS)
return zng_memread_2(src0) != zng_memread_2(src1);
#else
return memcmp(src0, src1, 2);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original comment suggests it should be using memcpy and not memcmp.

Perhaps it should be: return zng_memread_2(src0) != zng_memread_2(src1); no matter what.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code now does return zng_memread_2(src0) != zng_memread_2(src1); for cases where UNALIGNED_OK is defined, which should cover MSVC.

I'm not sure about using it for all cases since the worst case scenario there would be to emit two memcpy calls instead of one memcmp call, but since this PR covers the most common compilers (GCC >= 4 (and compatible) and MSVC), this should be OK for the time being.

@Dead2 Dead2 removed the Rebase needed Please do a 'git rebase develop yourbranch' label Feb 21, 2024
zmemory.h Outdated Show resolved Hide resolved
@Dead2
Copy link
Member

Dead2 commented Feb 23, 2024

I'll have to do some in-depth testing of this, but I am leaving on vacation now with very limited access, so it'll unfortunately have to wait for a while.

@Dead2
Copy link
Member

Dead2 commented Mar 5, 2024

X86-64 i9900K, GCC 13.2

Develop Mar 5 af494fc

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     54.185%      0.104/0.108/0.111/0.002        0.033/0.036/0.037/0.001        8,526,745
 2     43.871%      0.197/0.200/0.202/0.001        0.033/0.036/0.038/0.001        6,903,702
 3     42.388%      0.245/0.248/0.250/0.001        0.032/0.035/0.036/0.002        6,670,239
 4     41.647%      0.272/0.277/0.280/0.002        0.032/0.034/0.035/0.001        6,553,746
 5     41.216%      0.299/0.306/0.309/0.002        0.030/0.034/0.035/0.001        6,485,936
 6     41.038%      0.347/0.354/0.357/0.002        0.030/0.034/0.036/0.002        6,457,827
 7     40.778%      0.493/0.501/0.505/0.003        0.030/0.034/0.035/0.001        6,416,941
 8     40.704%      0.610/0.615/0.620/0.003        0.032/0.034/0.035/0.001        6,405,249
 9     40.409%      0.899/0.907/0.912/0.003        0.030/0.033/0.035/0.002        6,358,951

 avg1  42.915%                        0.391                          0.034
 tot                                105.495                          9.250       60,779,336

   text    data     bss     dec     hex filename
 136798    1312       8  138118   21b86 libz-ng.so.2

PR 1548

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     54.185%      0.105/0.109/0.111/0.001        0.034/0.035/0.037/0.001        8,526,745
 2     43.871%      0.198/0.200/0.201/0.001        0.031/0.036/0.038/0.002        6,903,702
 3     42.388%      0.241/0.247/0.250/0.002        0.032/0.034/0.035/0.001        6,670,239
 4     41.647%      0.274/0.277/0.279/0.001        0.031/0.034/0.035/0.001        6,553,746
 5     41.216%      0.301/0.307/0.311/0.003        0.031/0.034/0.035/0.001        6,485,936
 6     41.038%      0.350/0.354/0.358/0.002        0.029/0.033/0.035/0.001        6,457,827
 7     40.778%      0.493/0.500/0.503/0.002        0.031/0.034/0.036/0.001        6,416,941
 8     40.704%      0.608/0.616/0.621/0.003        0.030/0.033/0.034/0.001        6,405,249
 9     40.409%      0.899/0.907/0.912/0.004        0.030/0.033/0.035/0.001        6,358,951

 avg1  42.915%                        0.391                          0.034
 tot                                105.539                          9.192       60,779,336

   text    data     bss     dec     hex filename
 136798    1312       8  138118   21b86 libz-ng.so.2

No changes noticeable.

Anyone able to do MSVC builds and benchmarks?

@nmoinvaz
Copy link
Member

nmoinvaz commented Mar 5, 2024

This PR must be rebased. There are some conflicts in the nmake files.

@nmoinvaz nmoinvaz added the Rebase needed Please do a 'git rebase develop yourbranch' label Mar 5, 2024
@nmoinvaz
Copy link
Member

nmoinvaz commented Mar 5, 2024

silesia.tar

MSVC 19.38.33134.0

DEVELOP af494fc

OS: Windows 10 10.0.22631 AMD64
CPU: Intel64 Family 6 Model 151 Stepping 2, GenuineIntel
Tool: ../zlib-ng/build-develop/Release/minigzip.exe Size: 105,472 B
Levels: 0-9
Runs: 70         Trim worst: 40

 Level   Comp   Comptime min/avg/max/stddev   Compressed size
 0    100.008%      0.086/0.087/0.088/0.001       211,973,953
 1     44.409%      0.696/0.699/0.700/0.001        94,127,497
 2     35.519%      1.101/1.105/1.108/0.002        75,286,322
 3     33.882%      1.472/1.477/1.481/0.003        71,815,206
 4     33.175%      1.705/1.711/1.714/0.002        70,317,229
 5     32.661%      1.946/1.949/1.952/0.002        69,228,146
 6     32.507%      2.325/2.333/2.337/0.003        68,902,143
 7     32.255%      3.091/3.098/3.106/0.005        68,366,759
 8     32.167%      4.974/5.027/5.094/0.030        68,180,762
 9     31.887%      5.402/5.447/5.522/0.032        67,586,442

 avg1  40.847%                        2.293
 avg2  45.386%                        2.548
 tot                                687.972       865,784,459

PR #1548 rebased on top of DEVELOP

OS: Windows 10 10.0.22631 AMD64
CPU: Intel64 Family 6 Model 151 Stepping 2, GenuineIntel
Tool: ../zlib-ng/build-1548/Release/minigzip.exe Size: 104,960 B
Levels: 0-9
Runs: 70         Trim worst: 40

 Level   Comp   Comptime min/avg/max/stddev   Compressed size
 0    100.008%      0.085/0.087/0.087/0.000       211,973,953
 1     44.409%      0.694/0.698/0.699/0.001        94,127,497
 2     35.519%      1.099/1.102/1.104/0.001        75,286,322
 3     33.882%      1.472/1.476/1.479/0.002        71,815,206
 4     33.175%      1.708/1.711/1.713/0.002        70,317,229
 5     32.661%      1.943/1.947/1.950/0.002        69,228,146
 6     32.507%      2.323/2.329/2.333/0.003        68,902,143
 7     32.255%      3.091/3.097/3.100/0.002        68,366,759
 8     32.167%      4.965/5.018/5.054/0.028        68,180,762
 9     31.887%      5.398/5.436/5.471/0.020        67,586,442

 avg1  40.847%                        2.290
 avg2  45.386%                        2.544
 tot                                686.977       865,784,459

I forgot to turn on decompression in the tests so I will check that out later. It is interesting to see a reduction in code size even though it should compile to the same thing.

@nmoinvaz
Copy link
Member

nmoinvaz commented Mar 5, 2024

With decompression

silesia.tar

MSVC 19.38.33134.0

DEVELOP af494fc

OS: Windows 10 10.0.22631 AMD64
CPU: Intel64 Family 6 Model 151 Stepping 2, GenuineIntel
Tool: ../zlib-ng/build-develop/Release/minigzip.exe Size: 105,472 B
Levels: 0-9
Runs: 70         Trim worst: 40

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 0    100.008%      0.087/0.089/0.089/0.001        0.084/0.085/0.086/0.001      211,973,953
 1     44.409%      0.702/0.706/0.709/0.002        0.341/0.345/0.346/0.001       94,127,497
 2     35.519%      1.111/1.116/1.120/0.002        0.336/0.339/0.341/0.001       75,286,322
 3     33.882%      1.486/1.492/1.500/0.004        0.320/0.324/0.325/0.001       71,815,206
 4     33.175%      1.724/1.733/1.754/0.007        0.314/0.317/0.318/0.001       70,317,229
 5     32.661%      1.957/1.973/1.992/0.009        0.310/0.313/0.314/0.001       69,228,146
 6     32.507%      2.346/2.367/2.405/0.018        0.308/0.312/0.314/0.001       68,902,143
 7     32.255%      3.115/3.142/3.182/0.020        0.307/0.311/0.313/0.001       68,366,759
 8     32.167%      5.091/5.179/5.234/0.041        0.309/0.312/0.313/0.001       68,180,762
 9     31.887%      5.488/5.575/5.633/0.031        0.304/0.307/0.309/0.001       67,586,442

 avg1  40.847%                        2.337                          0.296
 avg2  45.386%                        2.597                          0.329
 tot                                701.133                         88.939      865,784,459

PR #1548 rebased on top of DEVELOP

OS: Windows 10 10.0.22631 AMD64
CPU: Intel64 Family 6 Model 151 Stepping 2, GenuineIntel
Tool: ../zlib-ng/build-1548/Release/minigzip.exe Size: 104,960 B
Levels: 0-9
Runs: 70         Trim worst: 40

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 0    100.008%      0.087/0.089/0.089/0.001        0.084/0.086/0.086/0.001      211,973,953
 1     44.409%      0.697/0.705/0.709/0.004        0.343/0.346/0.348/0.001       94,127,497
 2     35.519%      1.104/1.121/1.139/0.009        0.334/0.339/0.342/0.002       75,286,322
 3     33.882%      1.479/1.498/1.520/0.012        0.321/0.325/0.327/0.002       71,815,206
 4     33.175%      1.726/1.733/1.743/0.005        0.315/0.317/0.319/0.001       70,317,229
 5     32.661%      1.959/1.970/1.985/0.006        0.309/0.313/0.315/0.001       69,228,146
 6     32.507%      2.345/2.362/2.392/0.012        0.309/0.311/0.313/0.001       68,902,143
 7     32.255%      3.117/3.138/3.180/0.018        0.309/0.311/0.313/0.001       68,366,759
 8     32.167%      5.063/5.174/5.256/0.056        0.310/0.311/0.313/0.001       68,180,762
 9     31.887%      5.523/5.589/5.643/0.034        0.304/0.308/0.309/0.001       67,586,442

 avg1  40.847%                        2.338                          0.297
 avg2  45.386%                        2.597                          0.330
 tot                                701.305                         88.968      865,784,459

@nmoinvaz nmoinvaz removed the Rebase needed Please do a 'git rebase develop yourbranch' label Mar 6, 2024
@Dead2
Copy link
Member

Dead2 commented Mar 6, 2024

So, we have established that new gcc and msvc (both on x86-64) do not get regressions with this.
Do we know what compiler versions (and arches) that benefit from this change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants