SPARC64 crash with Ubuntu 22.04 and gcc 11.3.0 #1379

Open
mtl1979 opened this issue Dec 8, 2022 · 19 comments

@mtl1979
Collaborator

mtl1979 commented Dec 8, 2022

All tests fail or crash in Ubuntu 22.04 when using gcc 11.3.0 and qemu.
When building with gcc 9.4.0, all tests pass.

@nmoinvaz
Member

I confirmed the issue. It appears to crash even with a simple main() that only does return 0;. I upgraded my WSL to Ubuntu 22 and tried GCC 11.3.0, 10.4.0, and 9.5.0, and they all segfault.

@nmoinvaz
Member

@mtl1979 do you want to close this issue?

@mtl1979
Collaborator Author

mtl1979 commented Jan 12, 2023

@nmoinvaz As with little-endian 64-bit PowerPC, it might be useful to convert this to a discussion, so we don't need to create another one when qemu is eventually fixed... I'm not sure yet whether the underlying issue is in exactly the same block of code for both architectures or whether there is just some overlap.

Like I said to @Dead2 elsewhere, downgrading to the "ubuntu-20.04" runner should be a temporary solution -- to avoid delaying the next stable version for too long -- until the real issue is fixed and new packages have been uploaded.

@KungFuJesus
Contributor

If there is a particular issue you want me to test or explore, I do have an UltraSPARC T4 with Solaris 11.2 (or maybe it was 11.3) that I can test this on. I'm fairly certain the T4 is EOL from Oracle at the moment, but it's probably a close enough approximation of newer variants.

@mtl1979
Collaborator Author

mtl1979 commented Jan 13, 2023

@KungFuJesus If it's a qemu bug or regression, testing on real hardware doesn't make sense. If it's a gcc issue, we need to know which flag is missing or incorrect.

@KungFuJesus
Contributor

FWIW gtests pass with flying colors on OpenIndiana on an ancient Sun Fire V240. Though, the symbol versioning doesn't quite seem to be supported with those arguments to the linker, even with GNU's ld. We should probably look into that.

@mtl1979
Collaborator Author

mtl1979 commented May 7, 2023

> FWIW gtests pass with flying colors on OpenIndiana on an ancient Sun Fire V240. Though, the symbol versioning doesn't quite seem to be supported with those arguments to the linker, even with GNU's ld. We should probably look into that.

You might want to create a separate issue about the symbol versioning problem, with all the relevant logs.

@KungFuJesus
Contributor

KungFuJesus commented Jul 4, 2023

So a somewhat interesting revelation: tests are passing, but when I build with -mcpu=native, I do get segfaults. To my surprise, the segfaults are in the benchmark library, in some C++ string allocations:

Loading modules: [ libc.so.1 ld.so.1 ]
> ::stack
libc.so.1`realfree+0x38(100287890, 0, 0, ffffffff7fffea68, 1, 0)
libc.so.1`cleanfree+0x5c(0, 0, 10028ad90, 10028adb0, 0, c00)
libc.so.1`_malloc_unlocked+0x80(1002874d0, 10028ad50, 10028ad70, 10028ad90, 10028adb0, 0)
libc.so.1`malloc+0x3c(1f, 10028ad30, 10028ad50, 10028ad70, 10028ad90, 0)
libstdc++.so.6.0.29`_Znwm+0x18(1f, ffffffff7fffd638, 1e, 10028ad50, 10028ad70, 1f)
libstdc++.so.6.0.29`_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE7reserveEm+0x30(ffffffff7fffd858, 1b, 10028ad10, 10028ad30, 10028ad50, ffffffff7fffd868)
_ZN9benchmark12_GLOBAL__N_14joinIJNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES7_S7_S7_S7_S7_S7_S7_EEES7_cDpRKT_+0x6c(ffffffff7fffd858, 2f, 10028acd0, 10028acf0, 10028ad10, 10028ad30)
_ZNK9benchmark13BenchmarkName3strB5cxx11Ev+0x74(ffffffff7fffd858, 10028acd0, 10028ad30, 10028ad10, 10028acf0, 10028acd0)
_ZN9benchmark8internal15BenchmarkRunner13DoNIterationsEv+0xc0(ffffffff7fffdc70, 100286820, 0, ffffffff7fffea68, 1, 10026ac10)
_ZN9benchmark8internal15BenchmarkRunner15DoOneRepetitionEv+0xd0(100286820, 12, ffffffff7fffea68, 3ce7f2, 0, 100287440)
_ZN9benchmark8internal12_GLOBAL__N_113RunBenchmarksERKSt6vectorINS0_17BenchmarkInstanceESaIS3_EEPNS_17BenchmarkReporterES9_+0xa08(ffffffff7ffff6e0, 100285410, 1000, 1000, 100285410, 100131290)
_ZN9benchmark22RunSpecifiedBenchmarksEPNS_17BenchmarkReporterES1_NSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x528(0, 0, ffffffff7ffffa40, 0, 0, 0)
_ZN9benchmark22RunSpecifiedBenchmarksEv+0x34(ffffffff7ffffb90, ffffffff7ffffc78, 1000c281c, 1000ba418, ffffffff6f745f28, ffffffff6f7460f8)
main+0x3c(0, ffffffff7ffffc78, ffffffff7ffffc88, ffffffff6f745660, 18, 145998)
_start_crt+0x7c(1, ffffffff7ffffc78, ffffffff6f61d960, 0, 0, 0)
_start+0x14(0, 0, 0, 0, 0, 0)

I should also mention this is on an illumos-derived OpenIndiana with a barely supported configuration at the moment, not Solaris proper. I do wonder what we'd see on Solaris 11 on the T4. It almost looks like a glibc C++ bug.

@mtl1979
Collaborator Author

mtl1979 commented Jul 4, 2023

@KungFuJesus I would assume heap corruption... I've seen that happen when an unknown or bad instruction doesn't trap and instead gets decoded as an unrelated instruction. If I'm correct, the real issue happens earlier than the function on the stack trace just below libstdc++.so.6.0.29.
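
To make "the function just below libstdc++.so.6.0.29" concrete, here is a minimal sketch that parses mdb `::stack` output and picks the first frame that does not belong to libc or libstdc++. The frame layout (optional ``module` `` prefix, `symbol+0xoffset(args)`) is inferred only from the trace posted above; the helper names `parse_frame` and `first_frame_outside` are made up for illustration, and the input is a shortened excerpt of that trace.

```python
import re

# Frame layout assumed from the trace in this thread:
#   [module`]symbol+0xOFFSET(arg, arg, ...)
FRAME_RE = re.compile(
    r"^(?:(?P<module>[^`]+)`)?"   # optional "libc.so.1`" style prefix
    r"(?P<symbol>[^+(]+)"         # symbol name, possibly mangled
    r"\+0x(?P<offset>[0-9a-f]+)"  # hex offset into the function
    r"\((?P<args>.*)\)$"          # raw argument list (ignored here)
)


def parse_frame(line):
    """Return a dict for one frame, or None if the line is not a frame."""
    m = FRAME_RE.match(line.strip())
    if m is None:
        return None
    return {
        # Frames without a module prefix come from the main executable.
        "module": m.group("module") or "<executable>",
        "symbol": m.group("symbol"),
        "offset": int(m.group("offset"), 16),
    }


def first_frame_outside(frames, modules):
    """First frame (innermost first) whose module is not in `modules`."""
    for frame in frames:
        if frame["module"] not in modules:
            return frame
    return None


# Shortened excerpt of the trace above (some frames omitted).
TRACE = """\
libc.so.1`realfree+0x38(100287890, 0, 0, ffffffff7fffea68, 1, 0)
libc.so.1`malloc+0x3c(1f, 10028ad30, 10028ad50, 10028ad70, 10028ad90, 0)
libstdc++.so.6.0.29`_Znwm+0x18(1f, ffffffff7fffd638, 1e, 10028ad50, 10028ad70, 1f)
_ZN9benchmark12_GLOBAL__N_14joinIJNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES7_S7_S7_S7_S7_S7_S7_EEES7_cDpRKT_+0x6c(ffffffff7fffd858, 2f, 10028acd0, 10028acf0, 10028ad10, 10028ad30)
main+0x3c(0, ffffffff7ffffc78, ffffffff7ffffc88, ffffffff6f745660, 18, 145998)
"""

frames = [f for f in map(parse_frame, TRACE.splitlines()) if f]
caller = first_frame_outside(frames, {"libc.so.1", "libstdc++.so.6.0.29"})
print(caller["symbol"], hex(caller["offset"]))
```

This only identifies where the allocator was entered from; if the heap was corrupted earlier, the actual culprit may not appear on this stack at all.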

@KungFuJesus
Contributor

And of course, when I use umem as the allocator to hunt for the issue, the crash fails to show up at all. Could it be an issue in glibc's allocator on SPARC?

@mtl1979
Collaborator Author

mtl1979 commented Jul 4, 2023

@KungFuJesus I would assume either gcc generated a bad instruction, or glibc was built targeting too new a processor and doesn't detect that the current processor lacks an instruction it assumes is available. It might be possible to force gcc to target older processors to see which ones still run.
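
One way to try the "target older processors" idea is to build the same trivial program once per `-mcpu` value and see which binaries still run. The sketch below only generates the command lines; it does not run anything. The `-mcpu` names are taken from GCC's SPARC options, while the compiler name `sparc64-linux-gnu-gcc`, the source file `hello.c`, and the output names are assumptions for illustration, not something taken from this project's CI.

```python
import shlex

# 64-bit capable -mcpu values from GCC's SPARC options, roughly oldest first.
SPARC64_CPUS = [
    "v9",
    "ultrasparc",
    "ultrasparc3",
    "niagara",
    "niagara2",
    "niagara3",
    "niagara4",
]


def build_commands(source="hello.c", compiler="sparc64-linux-gnu-gcc"):
    """Map each CPU target to the argv that would build `source` for it.

    `source` and `compiler` are placeholder names (assumptions).
    """
    commands = {}
    for cpu in SPARC64_CPUS:
        commands[cpu] = [
            compiler,
            "-m64",
            f"-mcpu={cpu}",
            "-O2",
            source,
            "-o",
            f"hello-{cpu}",
        ]
    return commands


for cpu, argv in build_commands().items():
    print(shlex.join(argv))
```

If the oldest target that crashes can be found this way, the difference between it and the newest working target narrows down which instructions are suspect.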

@KungFuJesus
Contributor

This is running on a sun4v, so I doubt we're seeing an illegal instruction. The sparcv9 ABI hasn't moved a ton, and it's certainly not generating VIS (which is what I set out to make this thing do, initially). I'm seeing some other evidence that it could be an issue somewhere in the allocator, but I have no smoking gun. I've emailed the distribution maintainer with a link to this thread; hopefully he can shed some light.

@mtl1979
Collaborator Author

mtl1979 commented Jul 4, 2023

I'm seeing heap corruption on PPC64LE too, so it might just be a buffer overrun or something similar. I tried switching to clang, but it still uses libraries from gcc 11 by default, unless I force it to use LLVM libc instead.

@thesamesam

LLVM libc is barely a thing yet; you probably mean libc++, which is an implementation of the C++ standard library. But there are a few other libraries Clang will try to use from GCC, like the runtime lib and unwinding.

@mtl1979
Collaborator Author

mtl1979 commented Jul 4, 2023

@thesamesam I noticed that when I tried installing clang on a clean system, it just wouldn't work... I had to install quite a few packages from gcc to get it to behave.

https://github.com/zlib-ng/zlib-ng/blob/develop/.github/workflows/cmake.yml#L256

@klausz65

klausz65 commented Jul 5, 2023

For what it's worth, OpenIndiana on SPARC is still built using gcc-4.4.4:
CFLAGS: -mcpu=ultrasparc -mvis plus the long list from Makefile.master, for both 32-bit and 64-bit code.
ASFLAGS (note: at present this must be the SunAS assembler configured with gcc):
32-bit: -xarch=v8plusa -xarch=sparcvis
64-bit: -xarch=v9 -xarch=sparcvis
All the oi-userland stuff is compiled using gcc-11.3.0 with GAS 2.39 or 2.40:
CFLAGS: -O3 -mcpu=ultrasparc -mvis -mfsmuld, and just recently again with -mno-app-regs to be on the safer side.
I didn't really notice any performance degradation by not using this option.

@mtl1979
Collaborator Author

mtl1979 commented Jul 5, 2023

@klausz65 We would need a "CI run" to test which CFLAGS and CXXFLAGS allow Ubuntu 22.04 to compile usable binaries on 64-bit SPARC... Then we can determine whether the issue is in gcc or in qemu's SPARC64 emulation. As it works on Ubuntu 20.04, and there are known issues with qemu versions from at least the 6.x series upward (the 4.x series is known to work), we already have some information to narrow the search.

Personally I've worked with some compiler bugs, but I'm not familiar enough with the compiler options for SPARC64 and I don't have SPARC64 hardware to test, so I can't help much further...
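
The narrowing-down described above amounts to varying one component while holding the other fixed. The following is a rough sketch of that reasoning, not a tool from this project: `blame` is a hypothetical helper, and the result table at the bottom is invented placeholder data, not measurements from this thread.

```python
def blame(results):
    """Guess which component correlates with failures.

    `results` maps (gcc_version, qemu_version) -> True if the tests passed.
    Returns "gcc", "qemu", "both or interaction", or "inconclusive".
    """
    gcc_versions = sorted({gcc for gcc, _ in results})
    qemu_versions = sorted({qemu for _, qemu in results})

    # Holding qemu fixed, does changing gcc ever change the outcome?
    gcc_matters = any(
        len({results[(g, q)] for g in gcc_versions if (g, q) in results}) > 1
        for q in qemu_versions
    )
    # Holding gcc fixed, does changing qemu ever change the outcome?
    qemu_matters = any(
        len({results[(g, q)] for q in qemu_versions if (g, q) in results}) > 1
        for g in gcc_versions
    )

    if gcc_matters and not qemu_matters:
        return "gcc"
    if qemu_matters and not gcc_matters:
        return "qemu"
    if gcc_matters and qemu_matters:
        return "both or interaction"
    return "inconclusive"


# Placeholder outcomes for illustration only (NOT real test results).
hypothetical = {
    ("9.4.0", "4.x"): True,
    ("11.3.0", "4.x"): True,
    ("9.4.0", "6.x"): False,
    ("11.3.0", "6.x"): False,
}
print(blame(hypothetical))
```

With only the diagonal of the matrix available (old gcc with old qemu, new gcc with new qemu), the function cannot separate the two, which is why runs with specific, mixed versions of both are needed.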

@glaubitz

glaubitz commented Jul 5, 2023 via email

@mtl1979
Collaborator Author

mtl1979 commented Jul 5, 2023

@glaubitz Compile farms are nice, but it's usually hard to get specific package versions on them. As I already said, we need specific versions of both gcc and qemu. This allows us to "emulate" older hardware than the compile farm may actually use.
