Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linux x64: segfault by Unity/Mono #1708

Open
jo-oe opened this issue Apr 8, 2024 · 20 comments · May be fixed by #1713
Open

Linux x64: segfault by Unity/Mono #1708

jo-oe opened this issue Apr 8, 2024 · 20 comments · May be fixed by #1713
Labels
Compatibility API/ABI Compatibility issue

Comments

@jo-oe
Copy link

jo-oe commented Apr 8, 2024

Replacing zlib with the zlib-ng compatibility library consistently causes a segfault in the game "Kerbal Space Program" (built on Unity 2019.4) in certain situations.

Coredump backtrace:

#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x00007fd9344ab163 in __pthread_kill_internal (threadid=<optimized out>, signo=6) at pthread_kill.c:78
#2  0x00007fd93445365e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007fd93443b902 in __GI_abort () at abort.c:79
#4  0x00007fd9350eef0f in HandleSignal(int, siginfo_t*, void*) () from /home/jonas/games/steam/steamapps/common/Kerbal Space Program/UnityPlayer.so
#5  0x00007fd8dcf51a42 in ?? () from /home/jonas/games/steam/steamapps/common/Kerbal Space Program/KSP_Data/MonoBleedingEdge/x86_64/libmonobdwgc-2.0.so
#6  0x00007fd8dce5b9f4 in ?? () from /home/jonas/games/steam/steamapps/common/Kerbal Space Program/KSP_Data/MonoBleedingEdge/x86_64/libmonobdwgc-2.0.so
#7  <signal handler called>
#8  0x00007fd9344b9d05 in __GI___libc_free (mem=0x10011) at malloc.c:3375
#9  0x00007fd9338b8964 in z_free_aligned (ptr=<optimized out>, opaque=<optimized out>, zfree=<optimized out>)
    at /usr/src/debug/zlib-ng-2.1.6-2.fc40.x86_64/zutil.c:158
#10 deflateEnd (strm=0x6fd5fa0) at /usr/src/debug/zlib-ng-2.1.6-2.fc40.x86_64/deflate.c:1018
#11 0x00007fd7bf61abfa in CloseZStream ()
   from /home/jonas/games/steam/steamapps/common/Kerbal Space Program/KSP_Data/MonoBleedingEdge/x86_64/libMonoPosixHelper.so
#12 0x00000000416daead in ?? ()
#13 0x0000000000000000 in ?? ()

(Sorry, this is a proprietary app, so several symbols do not have debug information.)

System:
GNU/Linux / Fedora 40 (Beta) / Kernel 6.8.4
CPU: AMD Ryzen 7 7840HS

Please tell me what further information I can provide.

@mtl1979
Copy link
Collaborator

mtl1979 commented Apr 8, 2024

Crash when calling z_free_aligned usually means that the freed memory was not allocated by zlib-ng... This is usually caused by changing memory allocation function after using function from zlib-ng that need to allocate memory.

@jo-oe
Copy link
Author

jo-oe commented Apr 8, 2024

I am not able to change the software in question as it is proprietary and not updated any more.
It might rely on some undocumented behaviour from zlib which does not expose the problem. But it works reliably with zlib and always crashes with zlib-ng. So I decided to report it here.

@mtl1979
Copy link
Collaborator

mtl1979 commented Apr 8, 2024

If you look at line #8 in the stack trace, the pointer passed to __GI___libc_free is invalid, the value is way too small... It should be close to the address at start of the same line. This is almost always caused by mixing memory allocation and deallocation functions from different libraries.

@jo-oe
Copy link
Author

jo-oe commented Apr 9, 2024

If you look at line #8 in the stack trace, the pointer passed to __GI___libc_free is invalid, the value is way too small... It should be close to the address at start of the same line. This is almost always caused by mixing memory allocation and deallocation functions from different libraries.

Thank you for the ideas on where to start looking. With lots of difficulties, I managed to compile a libMonoPosixHelper.so from the Unity Mono repository. And by changing a few lines, the segfault can be avoided. Alloc/free functions were indeed the culprit.

diff --git a/support/zlib-helper.c b/support/zlib-helper.c
index 9dcebc7e967..d77f2eafc16 100644
--- a/support/zlib-helper.c
+++ b/support/zlib-helper.c
@@ -82,6 +82,8 @@ CreateZStream (gint compress, guchar gzip, read_write_func func, void *gchandle)
 #endif
 
        z = z_new0 (z_stream, 1);
+       z->zalloc = z_alloc;
+       z->zfree = z_free;
        if (compress) {
                retval = deflateInit2 (z, Z_DEFAULT_COMPRESSION, Z_DEFLATED, gzip ? 31 : -15, 8, Z_DEFAULT_STRATEGY);
        } else {
@@ -92,8 +94,6 @@ CreateZStream (gint compress, guchar gzip, read_write_func func, void *gchandle)
                free (z);
                return NULL;
        }
-       z->zalloc = z_alloc;
-       z->zfree = z_free;
        result = z_new0 (ZStream, 1);
        result->stream = z;
        result->func = func;

I'd still like to point out though that I hacked a proprietary software (video game) that was working with an outdated (buggy) library.

zlib (not ng!) would not expose this bug, while zlib-ng does!
While I would understand if you'd consider this issue a WONTFIX, I'd like to point out the following:
For the average end user unable to replace a buggy 3rd party library, this will probably mean as soon as zlib-ng is adapted downstream (Fedora 40 does this!), they won't be able to use this software any more.

@jo-oe
Copy link
Author

jo-oe commented Apr 9, 2024

Apparently deflateInit2 from zlib and zlib-ng have a different defaults when 0 is passed for the alloc/free functions in the z_stream object. So when the alloc/free functions are only set later in the z_stream object, it does not matter with zlib, but crashes zlib-ng

@KungFuJesus
Copy link
Contributor

This is a bug in the proprietary library, even with zlib. Check out this entry in the zlib manual
https://www.zlib.net/manual.html#Usage

The fact that it doesn't cause an issue with zlib is simply luck. If you used allocators that were picky and needed matched frees, you'd be in a similar boat. It just so happens we have a custom aligned free in the event that you setup a custom allocator so that we can guarantee alignment even when your allocator won't. But, we need to know at library initialization time to handle it.

@KungFuJesus
Copy link
Contributor

KungFuJesus commented Apr 9, 2024

Do you know what z_alloc and z_free are defined as in this code? If it happens to be the zlib wrappers for alloc and free and it's a superfluous assignment, we might be able to get around this by comparing the function pointers.

Ah, found the repo, they are functions that wrap glib allocators. Yeah, this is...annoying. I don't know how backward to bend, here.

@mtl1979
Copy link
Collaborator

mtl1979 commented Apr 9, 2024

There has been a lot of discussion about differences between zlib and zlib-ng... Maybe we need to make it (even) more clear in the porting guide that some "broken" use of zlib API will need further fixing to be compatible with improvements in zlib-ng...

What comes to modern Linux distributions replacing zlib with zlib-ng, as long as "broken" libraries have source code available, it's only matter of packaging the fixed binary and setting minimum required version in the package metadata (either rpm or deb for most of Linux distributions). This obviously applies mostly to zlib-ng built in compatibility mode, as otherwise the whole executable needs to be rebuilt to use zlib-ng instead of "stock" zlib.

jo-oe added a commit to jo-oe/mono that referenced this issue Apr 9, 2024
@KungFuJesus
Copy link
Contributor

@jo-oe thanks for notifying mono so they can fix it on their end. I'm not sure what to do about the downstream consumers that are very unlikely to repackage fixed binaries such as those leveraging Unity for games.

@jo-oe
Copy link
Author

jo-oe commented Apr 9, 2024

I only realised a bit later that it is apparently the bundled mono library that is actually the culprit. Simply replacing the bundled libMonoPosixHelper.so with a fixed version (see my pr in the mono repo above) works. I will also try to notify the game's distributor, but I doubt they will update ...

@jo-oe
Copy link
Author

jo-oe commented Apr 9, 2024

Thanks everyone for your help! I'm just some amateur programmer so I'm quite proud I could find the problem and (apparently ...) fix it :)

@mtl1979
Copy link
Collaborator

mtl1979 commented Apr 9, 2024

We're here to help people use zlib-ng... Not all issues are bugs in zlib-ng itself, but we still try to help figuring out the root cause.

@mtl1979
Copy link
Collaborator

mtl1979 commented Apr 9, 2024

@jo-oe thanks for notifying mono so they can fix it on their end. I'm not sure what to do about the downstream consumers that are very unlikely to repackage fixed binaries such as those leveraging Unity for games.

One possible "workaround" is to compare distance between the two pointers inside z_free_aligned and if the distance is less than sizeof(void *), or more than 127, then call free hook with the original pointer... It should also work if free symbol is replaced by another function.

@nmoinvaz nmoinvaz added the Compatibility API/ABI Compatibility issue label Apr 9, 2024
@musicinmybrain
Copy link

One possible "workaround" is to compare distance between the two pointers inside z_free_aligned and if the distance is less than sizeof(void *), or more than 127, then call free hook with the original pointer... It should also work if free symbol is replaced by another function.

It sounds like you’re considering subtracting function pointer values.

The C standard only allows pointer arithmetic (-, +) and relational comparisons (<, <=, >, >=) between data pointers that point to the same underlying array (or one past the end).

For function pointers or for possibly-unrelated data pointers, all you are permitted to do is compare them for equality (==, !=). That’s assuming the pointers are of compatible types.

Anything else is undefined behavior.

You can avoid the undefined behavior by converting to intptr_t or uintptr_t; now you just have implementation-defined behavior, as all that is guaranteed is that the integer values round-trip back to the pointer values they were converted from. Plus, these types aren’t guaranteed to exist (some weird platform might have really wide pointers, I guess). On common platforms, you do simply get addresses into a flat memory address space that compare nicely, but there is no general guarantee about what these integer values actually are or whether arithmetic on them is meaningful.

@musicinmybrain
Copy link

I just saw #1710. At a glance, it does appear to avoid undefined behavior (as long as the conversion to ptrdiff_t produces nonnegative values, which I don’t think is guaranteed in general). However, it does certainly make assumptions about implementation-defined behavior as described in #1708 (comment) – plus the assumption that ptrdiff_t is big enough to store an integer value corresponding to a pointer, which isn’t guaranteed.

Maybe the zlib-ng codebase is already full of this kind of assumption that works in practice on typical platforms, and one more instance doesn’t matter. I don’t know.

@mtl1979
Copy link
Collaborator

mtl1979 commented Apr 10, 2024

I just saw #1710. At a glance, it does appear to avoid undefined behavior (as long as the conversion to ptrdiff_t produces nonnegative values, which I don’t think is guaranteed in general). However, it does certainly make assumptions about implementation-defined behavior as described in #1708 (comment) – plus the assumption that ptrdiff_t is big enough to store an integer value corresponding to a pointer, which isn’t guaranteed.

Maybe the zlib-ng codebase is already full of this kind of assumption that works in practice on typical platforms, and one more instance doesn’t matter. I don’t know.

If the result of conversion would be negative, it would mean that zlib-ng is used from kernel space... You can't cross boundary of user and kernel space in same allocation. The reason it needs to use signed type is because the original pointer can't be on higher address than the adjusted pointer after aligning the start of allocation. As mentioned in the pull request, modern compilers can optimize the two tests to compare only higher-most bits and discard the lowest 7 bits.

@Dead2 Dead2 linked a pull request Apr 23, 2024 that will close this issue
@fontivan
Copy link

fontivan commented Apr 30, 2024

Valheim is also affected by this, and is also a Unity game. I'm not sure what Unity version they use, but it seems like this could affect all Unity games?

I did try testing PR #1713 and that resolved the issue for me.

@iii-i
Copy link
Member

iii-i commented May 2, 2024

Re: casting to uintptr_t, I had a case recently where it did not help for GCC: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114404. Similar issue: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93105. So I would avoid going in this direction.

@phprus
Copy link
Contributor

phprus commented May 2, 2024

Does memcpy from pointer to uintptr_t also triggered this GCC error or not?

@iii-i
Copy link
Member

iii-i commented May 3, 2024

I haven't tried this, but I don't think it would help. IIUC the problem is that GCC does alias analysis on RTL level, where casts and memcpy()s may be optimized out, but C-like rules still apply, e.g., two symbols can't point to the same memory. I just checked a very simple example:

$ cat 1.c
extern char x[], y[];

void f(char *a, char *b) {
        char *px = x, *py = y;
        long lx, ly;
        __builtin_memcpy(&lx, &x, 8);
        __builtin_memcpy(&ly, &y, 8);
        *a = b[ly - lx];
}

$ gcc -O3 -c 1.c -fdump-rtl-all-all

and already in 1.c.255r.expand there are no traces of __builtin_memcpy().

The bug I linked is, of course, a very narrow problem related to global variables, but it shows how subtracting unrelated pointers can backfire, even if precautions are taken.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compatibility API/ABI Compatibility issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants