Use GCC's may_alias attribute for unaligned memory access #1548
base: develop
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           develop    #1548       +/-   ##
============================================
+ Coverage         0   67.37%   +67.37%
============================================
  Files            0      118      +118
  Lines            0     9316     +9316
  Branches         0     2338     +2338
============================================
+ Hits             0     6277     +6277
- Misses           0     1866     +1866
- Partials         0     1173     +1173
Ah cool, this might actually improve the purely scalar performance I saw testing this on a Sun Fire V240 (depressingly slower than stock zlib). I'll bet the extra function calls here per aliasing load were a big part of that.
Fails a lot of CI tests, need to investigate why...
Force-pushed from 8737e10 to 6adb2fc.
zmemory.h (outdated)
static inline uint32_t zng_memread_4(const void *ptr) {
#if defined(UNALIGNED_OK)
    return *(const uint32_t *)ptr;
I did a ton of work to avoid this kind of thing...
The compiler will convert `memcpy` to unaligned access if it is supported.
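For readers following along, here is a minimal sketch of the `memcpy` pattern this comment refers to. The helper name `read_u32_memcpy` is hypothetical and not taken from the project. Optimizing compilers such as GCC and Clang generally lower a fixed-size `memcpy` like this to a single load on targets that permit unaligned access, though that depends on the compiler and flags.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not the project's code): read a 32-bit value from a
 * possibly unaligned address without relying on a pointer cast, so neither
 * alignment nor strict-aliasing rules are violated. */
static inline uint32_t read_u32_memcpy(const void *ptr) {
    uint32_t val;
    /* Fixed-size copy; optimizing compilers usually fold this into a plain
     * load when the target supports unaligned access. */
    memcpy(&val, ptr, sizeof(val));
    return val;
}
```

By contrast, `*(const uint32_t *)ptr` on a misaligned pointer is undefined behavior in C, even on hardware that tolerates the access.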
These `UNALIGNED_OK` branches should be removed, so that there is only `HAVE_MAY_ALIAS` and the `memcpy` path.
I agree, the `UNALIGNED_OK` branch should be removed.
We certainly don't want to default to that. If it were a new default-off setting like `FORCE_UNALIGNED` or something, there would at least be room for discussion.
The `HAVE_MAY_ALIAS` branch is interesting; I wonder what kind of performance it gets compared to `memcpy` on the various platforms and compilers. If we do want to use it, detection would preferably be in configure/CMake for cleaner and wider compiler support, or at the very least moved to `zbuild.h`.
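As an illustration of what header-level detection could look like, here is a rough sketch. Only the macro name `HAVE_MAY_ALIAS` comes from this thread; the conditions used and the `have_may_alias()` helper are assumptions for the example, not the PR's actual logic. A configure/CMake check would instead try to compile a small test program that uses the attribute.

```c
/* Sketch only: the detection conditions below are assumptions. */
#ifndef HAVE_MAY_ALIAS
#  if defined(__has_attribute)
#    if __has_attribute(may_alias)
#      define HAVE_MAY_ALIAS
#    endif
#  elif defined(__GNUC__) && (__GNUC__ >= 4)
     /* Older GCC releases lack __has_attribute but support may_alias. */
#    define HAVE_MAY_ALIAS
#  endif
#endif

/* Hypothetical helper that reports the detection result at run time. */
static inline int have_may_alias(void) {
#ifdef HAVE_MAY_ALIAS
    return 1;
#else
    return 0;
#endif
}
```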
Do you have a Compiler Explorer example where this is the case?
Does it revert PR #1309?
This example showcases the difference on ARM with modern GCC when unaligned memory access is and isn't available. It also covers SPARC, where
zmemory.h (outdated)
}

static inline int32_t zng_memcmp_4(const void *src0, const void *src1) {
#if defined(UNALIGNED_OK) || defined(HAVE_MAY_ALIAS)
We should remove the `defined(UNALIGNED_OK)` from all of these and keep only `HAVE_MAY_ALIAS`. We err on the side of "the compiler knows best" about converting `memcpy` to unaligned access. This is better than the undefined behavior that can otherwise occur. There are other GitHub PRs related to this, including #1100, that might be worth reading.
Force-pushed from 6adb2fc to afdacc1.
OK, I've rebased the branch and removed the special cases for
Force-pushed from afdacc1 to 5a9d9e3.
Force-pushed from 5a9d9e3 to 28509bd.
#if defined(HAVE_MAY_ALIAS)
    return zng_memread_2(src0) != zng_memread_2(src1);
#else
    return memcmp(src0, src1, 2);
The original comment suggests it should be using `memcpy` and not `memcmp`.
Perhaps it should be `return zng_memread_2(src0) != zng_memread_2(src1);` no matter what.
The code now does `return zng_memread_2(src0) != zng_memread_2(src1);` for cases where `UNALIGNED_OK` is defined, which should cover MSVC.
I'm not sure about using it in all cases, since the worst case there would be emitting two `memcpy` calls instead of one `memcmp` call. But since this PR covers the most common compilers (GCC >= 4 and compatible, plus MSVC), this should be OK for the time being.
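To make the trade-off concrete, here is a small sketch of the "two reads and a compare" form discussed above. The names `read_u16` and `cmp_2` are hypothetical stand-ins, not the project's `zng_memread_2` / `zng_memcmp_2`. Note that this form only reports equal (0) versus not equal (nonzero); unlike `memcmp`, it says nothing about ordering.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 2-byte unaligned read. */
static inline uint16_t read_u16(const void *ptr) {
    uint16_t val;
    memcpy(&val, ptr, sizeof(val));
    return val;
}

/* Returns 0 when the two 2-byte sequences match, nonzero otherwise.
 * If the compiler does not inline the copies, this costs two memcpy
 * calls, versus a single call for memcmp(src0, src1, 2). */
static inline int32_t cmp_2(const void *src0, const void *src1) {
    return read_u16(src0) != read_u16(src1);
}
```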
Force-pushed from 28509bd to 5dc5794.
Force-pushed from 5dc5794 to d8da4c7.
I'll have to do some in-depth testing of this, but I am leaving on vacation now with very limited access, so it'll unfortunately have to wait for a while.
X86-64 i9900K, GCC 13.2
Develop Mar 5 af494fc
PR 1548
No changes noticeable. Anyone able to do MSVC builds and benchmarks?
This PR must be rebased. There are some conflicts in the nmake files.
silesia.tar, MSVC 19.38.33134.0
DEVELOP af494fc
PR #1548 rebased on top of DEVELOP
I forgot to turn on decompression in the tests, so I will check that out later. It is interesting to see a reduction in code size even though it should compile to the same thing.
With decompression: silesia.tar, MSVC 19.38.33134.0
DEVELOP af494fc
PR #1548 rebased on top of DEVELOP
So, we have established that recent GCC and MSVC (both on x86-64) do not regress with this.
On platforms that don't allow unaligned memory access, calls to `memcpy` don't always get inlined in cases where they would be on platforms that do allow it. Using the `may_alias` attribute ensures that the code for reading and writing one byte at a time is inlined, and should also handle cases where unaligned memory access is allowed (although checking `UNALIGNED_OK` works better in that regard).
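A rough sketch of the general technique described above, under stated assumptions: the `SKETCH_HAVE_MAY_ALIAS` macro, the `sketch_memread_4` / `sketch_memwrite_4` names, and the packed-struct formulation are all illustrative and may differ from what the PR actually does. On GCC-compatible compilers, `packed` drops the type's alignment requirement to 1 and `may_alias` exempts accesses through it from strict-aliasing assumptions, so the compiler emits the byte-wise or unaligned access inline. Other compilers fall back to `memcpy`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical detection macro for this sketch only. */
#if defined(__GNUC__) && (__GNUC__ >= 4)
#  define SKETCH_HAVE_MAY_ALIAS 1
#endif

#if defined(SKETCH_HAVE_MAY_ALIAS)
/* Alignment 1 (packed) and allowed to alias any other type (may_alias). */
typedef struct { uint32_t val; }
    __attribute__((__packed__, __may_alias__)) sketch_unaligned_u32;
#endif

/* Read 4 bytes from a possibly unaligned address. */
static inline uint32_t sketch_memread_4(const void *ptr) {
#if defined(SKETCH_HAVE_MAY_ALIAS)
    return ((const sketch_unaligned_u32 *)ptr)->val;
#else
    uint32_t val;
    memcpy(&val, ptr, sizeof(val));  /* portable fallback */
    return val;
#endif
}

/* Write 4 bytes to a possibly unaligned address. */
static inline void sketch_memwrite_4(void *ptr, uint32_t val) {
#if defined(SKETCH_HAVE_MAY_ALIAS)
    ((sketch_unaligned_u32 *)ptr)->val = val;
#else
    memcpy(ptr, &val, sizeof(val));  /* portable fallback */
#endif
}
```

Both paths should produce the same result; the difference is only in whether the compiler is guaranteed to expand the access inline rather than possibly emitting a call to `memcpy`.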