Panic on aarch64 #75
Here is a full backtrace from a libgcc compiled with
The signal frame has been resolved successfully, but the following code then goes wrong:
/* A signal frame will have a return address pointing to
__default_sa_restorer. This code is hardwired as:
0xd2801168 movz x8, #0x8b
0xd4000001 svc 0x0
*/
if (pc[0] != MOVZ_X8_8B || pc[1] != SVC_0)
{
return _URC_END_OF_STACK;
}
In this stack frame, the |
The corrupted frame located at
Then we should calculate |
Here is the last several
Related dwarf information:
|
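The libgcc check quoted above compares two instruction words at the return address against a hardwired trampoline. A minimal Rust sketch of the same comparison follows. The two constants are taken from the quoted comment; the function names `encode_movz_x` and `is_sigreturn_trampoline` are hypothetical and exist only for illustration. The immediate `0x8b` (139) is, as far as I know, the `rt_sigreturn` syscall number on aarch64 Linux.

```rust
// Instruction words from the libgcc comment quoted above.
const MOVZ_X8_8B: u32 = 0xd280_1168; // movz x8, #0x8b
const SVC_0: u32 = 0xd400_0001; // svc 0x0

/// Encode `movz x<rd>, #imm16` (64-bit variant, no shift).
/// Hypothetical helper, used here only to cross-check the constant.
fn encode_movz_x(rd: u32, imm16: u16) -> u32 {
    0xd280_0000 | (u32::from(imm16) << 5) | (rd & 0x1f)
}

/// Returns true when the first two words of `code` match the trampoline.
/// Anything else makes the quoted libgcc code return `_URC_END_OF_STACK`.
fn is_sigreturn_trampoline(code: &[u32]) -> bool {
    code.len() >= 2 && code[0] == MOVZ_X8_8B && code[1] == SVC_0
}

fn main() {
    println!("{:#010x}", encode_movz_x(8, 0x8b));
    println!("{}", is_sigreturn_trampoline(&[MOVZ_X8_8B, SVC_0]));
    println!("{}", is_sigreturn_trampoline(&[SVC_0, MOVZ_X8_8B]));
}
```

If the restorer in use differs from this exact two-instruction sequence, a check of this shape rejects the frame and unwinding stops there.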
Thanks to @SchrodingerZhu. I have tried to use nongnu libunwind and the The modification on |
Great progress! I will also continue to help test |
Hi @YangKeao, what is your test environment? Have you tested it on darwin+aarch64 (M1)? If not, I can cover that part. |
glad to see this |
No. I have only tested it on a Raspberry Pi. |
nongnu-libunwind seems to be a good solution then. It also will not break TiFlash (where Rust and C++ frames are interleaved), while llvm-libunwind has problems handling that scenario. cc @YangKeao |
Cool. Then I could move on fixing this issue in these steps:
BTW, some magic (like rewriting the symbol name with What's your opinion on linking nongnu-libunwind with |
Why did Rust choose LLVM libunwind in the first place? |
Default Rust unwind implementation is |
Is it possible to patch the nongnu libunwind code? For example, to avoid overriding |
I could try. |
@sticnarf @mornyx Building The libunwind-sys needs to be modified to support static link. I will try to modify it and submit a PR. |
|
Additional info: I previously wrote a minimal Rust demo to test |
The debug vs release mode difference may be related to inlining.. |
libunwind-rs inits its But from the nongnu libunwind doc, we need to use |
I use Here is my demo:
use nix::libc;
use nix::sys::signal;
use rand::Rng;
use parking_lot::RwLock;
use smallvec::SmallVec;
// same as pprof-rs/src/timer.rs
mod timer;
const MAX_DEPTH: usize = 32;
lazy_static::lazy_static! {
static ref PROFILER: RwLock<()> = RwLock::new(());
}
fn main() {
// Register signal handler.
let h = signal::SigHandler::SigAction(perf_signal_handler);
let a = signal::SigAction::new(h, signal::SaFlags::SA_SIGINFO, signal::SigSet::empty());
unsafe {
signal::sigaction(signal::SIGPROF, &a).unwrap();
}
// Register SIGPROF.
let _t = timer::Timer::new(99);
// Run some workloads.
loop {
let mut rng = rand::thread_rng();
let mut vec: Vec<i32> = vec![];
for _ in 0..1000000 {
vec.push(rng.gen())
}
vec.sort();
}
}
#[no_mangle]
pub extern "C" fn perf_signal_handler(_: libc::c_int, _: *mut libc::siginfo_t, _: *mut libc::c_void) {
// `try_write` never blocks; if the lock is already held, this sample is skipped.
if let Some(_guard) = PROFILER.try_write() {
let mut bt: SmallVec<[_; MAX_DEPTH]> = SmallVec::with_capacity(MAX_DEPTH);
let mut index = 0;
unsafe {
// in Cargo.toml:
// backtrace = { git = "https://github.com/YangKeao/backtrace-rs.git", features = ["nongnu-unwind"], branch = "master" }
backtrace::trace_unsynchronized_external_api(|frame| {
if index < MAX_DEPTH {
bt.push(frame.clone());
index += 1;
true
} else {
false
}
}, true); // signal_frame = true
}
}
}
Crash stack:
|
@sticnarf Can you help me check whether the released pprof-rs triggers the bug on macOS? It seems that macOS has its own unwind library, which "may" be better than `libgcc`. @siddontang told me he cannot reproduce this issue (with the examples in |
Actually tikv/tikv#9957 (comment) still segfaults on latest macOS :(
Maybe he was using macOS 10.x. As I recall, the issue appeared after macOS 11. |
This problem can be reproduced by executing the demo I posted above on my m1 mac.
|
On ARM+macOS, we can be confident that the frame pointer exists, and we can take advantage of this. And because our stack tracing only ever runs inside a signal handler, we can use the ucontext provided by the kernel to skip the signal frame. Then I introduced this implementation in |
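The idea described above (start from the interrupted registers, then follow the frame-pointer chain) can be sketched against a simulated stack, so that the example runs anywhere without touching real memory. `FakeStack`, `Regs`, and `walk` are hypothetical names for illustration; a real implementation would fill `Regs` from the `ucontext` passed to the signal handler and read real stack memory instead.

```rust
/// A fake stack: `words[i]` lives at address `base + 8 * i`.
struct FakeStack {
    base: u64,
    words: Vec<u64>,
}

impl FakeStack {
    /// Bounds-checked 8-byte load; returns None instead of faulting.
    fn load(&self, addr: u64) -> Option<u64> {
        if addr < self.base || addr % 8 != 0 {
            return None;
        }
        self.words.get(((addr - self.base) / 8) as usize).copied()
    }
}

/// Registers of the interrupted code (assumed to come from the ucontext).
struct Regs {
    pc: u64,
    fp: u64,
}

/// Each frame record is `[caller's fp, return address]`.
fn walk(stack: &FakeStack, regs: &Regs, max_depth: usize) -> Vec<u64> {
    let mut pcs = vec![regs.pc];
    let mut fp = regs.fp;
    while pcs.len() < max_depth && fp != 0 {
        let next = stack.load(fp);
        let ret = fp.checked_add(8).and_then(|a| stack.load(a));
        let (Some(next_fp), Some(ret)) = (next, ret) else { break };
        pcs.push(ret);
        // Frame records must move towards higher addresses; stop otherwise.
        if next_fp != 0 && next_fp <= fp {
            break;
        }
        fp = next_fp;
    }
    pcs
}

/// Three chained frame records starting at address 0x1000.
fn demo_stack() -> FakeStack {
    FakeStack {
        base: 0x1000,
        words: vec![0x1010, 0xaaa0, 0x1020, 0xbbb0, 0, 0xccc0],
    }
}

fn main() {
    let pcs = walk(&demo_stack(), &Regs { pc: 0x9990, fp: 0x1000 }, 32);
    for pc in pcs {
        println!("{:#x}", pc);
    }
}
```

Because the walk starts from the interrupted `pc` and `fp` rather than from the handler's own frame, the kernel's signal frame never has to be parsed.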
In fact I also tried parsing |
On macOS, I think libunwind refers to LLVM's libunwind. As I mentioned earlier in the internal chat, at least on Linux, LLVM's libunwind parses Rust's debug CFI incorrectly. This actually prevents the TiFlash team from using the library when migrating to the LLVM toolchain.
On Feb 25, 2022, 1:28 PM, Yilin Chen wrote:
> @sticnarf Can you help me to check whether the released pprof-rs will cause the bug in macOS? It seems that the macos has its own unwind library, which "may" be better than `libgcc`. If the released pprof-rs and tikv-server can work properly on M1 Mac, then we only need to handle this issue on linux.
Actually tikv/tikv#9957 (comment) still segfaults on latest macOS :(
|
From the graph you posted, the result looks distorted. For example, I never see I guess many frame pointers are not actually emitted or they are not properly parsed. |
I suppose frame pointers cannot be used for inlined functions, while the DWARF sections can still be used to locate inlined functions. |
I tested it on newer versions of macOS, and in fact the binary only contains the The demo below confirms this point:
use unwind::{init_unwind_context, UnwindContext};
fn main() {
func1_inlined();
}
#[inline(always)]
fn func1_inlined() {
func2();
}
fn func2() {
unsafe {
let mut context = UnwindContext::default();
init_unwind_context(&mut context as _);
show(context.pc);
// jump to next frame
context.pc = *std::mem::transmute::<_, *const u64>(context.fp + 8);
context.fp = *std::mem::transmute::<_, *const u64>(context.fp);
show(context.pc);
}
}
unsafe fn show(pc: u64) {
println!("{:#x}", pc);
backtrace::resolve(std::mem::transmute(pc), |s| {
println!("{:?}", s.name());
});
}
output:
Looking at |
I am not familiar with TiKV's hotspot distribution. What about this: I limited the sampled addresses to the address space of the tikv executable itself, excluding any dynamically linked libraries. I guess the sample set from the previous image is the "real" one. |
Maybe you can run a standard workload (like sysbench or TPC-C) against TiKV on Linux. Then you can compare the result with the output on macOS. |
Update from TiFlash: after a very long run of continuous profiling, TiFlash's profiling also crashed. Note that this is on a normal x86_64 platform. The scenario of unwinding across languages is a little bit complicated, but it shows that currently |
After some observation, I found that tracing the stack directly from the signal handler always leads to strange problems. This is because there is always an opaque signal frame placed by the kernel between the signal handler's stack frame and the previous user-mode stack frame. My solution is to use the Also, for any non-legal range address taken from I've added Linux + x86_64 support to my unwind implementation, and I'll make a PR to |
@mornyx Sorry, I don't understand how you implement it. It sounds like magic. I don't think it's possible to do offline dwarf unwind without saving the stack (e.g. the As far as I know, DWARF can only help you find the addresses of the callee-saved registers on the stack; while the program keeps running, the stack changes, and I don't know how you could recover the earlier state. Oops! I understand! You were using the |
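The comment above mentions rejecting addresses outside a legal range before they are used. One common way to do this for frame pointers is sketched below, assuming the bounds of the current thread's stack are already known; `StackBounds` and `is_plausible_fp` are hypothetical names, and the exact checks in the author's implementation are not shown in this thread.

```rust
/// Address range of the current thread's stack (assumed to be known).
struct StackBounds {
    low: u64,
    high: u64,
}

/// A candidate frame pointer is accepted only if the whole 16-byte frame
/// record lies inside the stack, it is 8-byte aligned, and it is strictly
/// above the previous frame pointer (the stack grows downwards, so callers
/// live at higher addresses).
fn is_plausible_fp(fp: u64, prev_fp: u64, bounds: &StackBounds) -> bool {
    let Some(end) = fp.checked_add(16) else {
        return false;
    };
    fp % 8 == 0 && fp >= bounds.low && end <= bounds.high && fp > prev_fp
}

fn main() {
    let bounds = StackBounds { low: 0x7000, high: 0x8000 };
    for fp in [0x7010u64, 0x7004, 0x7ff8, 0x6ff0] {
        println!("{:#x}: {}", fp, is_plausible_fp(fp, 0x7000, &bounds));
    }
}
```

A walker would call a check like this before every load and stop unwinding at the first candidate that fails, which trades a possibly truncated stack for not faulting inside the signal handler.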
Right~ This also seems to be possible by modifying But this only solved half the problem for us (as you said, and also doesn't support macOS). So I reimplemented a minimal subset of libunwind, to try to avoid the problems you said. |
Of course, it also needs to be fully tested to verify that it works correctly... |
Don't worry. We have testing environments to run different complicated user payloads. It can be pretty helpful to discover problems and make it close to "battle-tested". |
The
It's always good to see another implementation of |
I think we can mimic what is going on here but use ucontext to init the UnwindContext. |
There is some confusion in various comments on this issue so I'll try to clear up some of it.
The difference between stack walking and symbolication, and how it relates to inline functions
The resolution of inline functions happens during symbolication: A single address can resolve to one or more functions. If the address is inside code which the compiler inlined into another function, then you get both the inlined function name (or even multiple inline function names, if the compiler inlined multiple levels deep), and then the outer function name.
If you want to do stack walking "offline", you need to capture the entire stack bytes.
A slightly confusing part here is that both stack walking and symbolication can make use of DWARF information. However, they're different subsets of DWARF information. DWARF stack walking information is stored in the
Unwinding / stack walking on macOS
On macOS x86_64 and arm64, all system libraries are compiled with frame pointers enabled. And having frame pointers enabled is also the default for clang and Xcode, unless you manually set -fomit-frame-pointer. So frame pointer stack walking mostly works fine, unless you're profiling a program that has been compiled with -fomit-frame-pointer.
There is one exception: On arm64, leaf functions don't have frame pointers even if you compile with frame pointers enabled. This means that on arm64, if you just use frame pointer unwinding, you will be missing the second frame in the stack if you're currently inside the leaf function: The first frame will be correct (it comes straight from the instruction pointer), the immediate caller is missing, and the frame pointer gives you the caller's caller, i.e. the third frame. From that point on the rest of stack unwinding works fine.
To unwind leaf functions correctly, you need to look at the compact unwind info in the
Unwind information sections
On macOS, most binaries will have an
Here's how it breaks down. On x86_64:
On arm64:
I've written a crate called |
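To illustrate the split between stack walking and symbolication described above, here is a small sketch in which the "debug info" is a hand-written table. The stack walker only produces addresses; symbolication then expands each address into one or more names. The table, the addresses, and the names `Range`, `symbolicate`, and `expand` are all made up for illustration; a real symbolizer reads this from DWARF rather than a static table.

```rust
/// One address range and the functions it resolves to, innermost first.
struct Range {
    start: u64,
    end: u64,
    frames: &'static [&'static str],
}

/// Hypothetical debug info: `func1_inlined` was inlined into `main`,
/// so addresses in 0x1020..0x1030 resolve to two names.
fn demo_table() -> Vec<Range> {
    vec![
        Range { start: 0x1000, end: 0x1020, frames: &["main"] },
        Range { start: 0x1020, end: 0x1030, frames: &["func1_inlined", "main"] },
        Range { start: 0x2000, end: 0x2040, frames: &["func2"] },
    ]
}

/// Resolve a single address to its (possibly inlined) function names.
fn symbolicate(table: &[Range], addr: u64) -> &'static [&'static str] {
    table
        .iter()
        .find(|r| r.start <= addr && addr < r.end)
        .map(|r| r.frames)
        .unwrap_or(&[])
}

/// Expand the addresses found by stack walking into a full list of names.
fn expand(table: &[Range], pcs: &[u64]) -> Vec<&'static str> {
    pcs.iter()
        .flat_map(|&pc| symbolicate(table, pc).iter().copied())
        .collect()
}

fn main() {
    // The stack walker found two addresses: one in func2, one in main.
    let names = expand(&demo_table(), &[0x2010, 0x1024]);
    println!("{:?}", names);
}
```

Two walked addresses turn into three names here, which is why a frame-pointer walker does not need to know anything about inlining.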
@mstange Thanks for the clarification 🍻 . As far as I know, @mornyx is also creating a new dwarf unwind library. I also tried to unwind through frame pointer (for Linux) in #116 . I still have some confusion:
For non-leaf frame pointer, will the frame pointer register ( I will try https://github.com/mstange/framehop/ these days 😄 , as #116 has already given us a chance to provide more options of unwinding method for the user. |
Just to clarify: I am talking about the subset of functions which, on arm64, will not create a "frame record" for themselves even if you compile with frame pointers enabled. This happens for functions which do not call other functions (i.e. which are "leaf" functions) and which also don't need to save and restore any registers. These functions leave x29 ("fp") unchanged. And because they don't call any other functions, the lr register also stays unchanged. So unwinding from these functions only means "get the return address from lr and leave all other registers unchanged".
If it did that then yes, frame pointer unwinding would not work at all and there would be no way to get the rest of the stack. You would need to use some kind of unwind information to recover a usable frame pointer value. But luckily these functions leave the frame pointer from the parent frame intact.
Yes, in code compiled with -fomit-frame-pointer, the easy solution fails. Luckily macOS has established a culture of always enabling frame pointers.
Great, please file issues if you run into any trouble. Framehop only solves a subset of the problem; it's still up to the user to find where libraries are mapped in memory, to get their unwind section data, and to read the stack memory in a way that doesn't cause segfaults. Framehop is mostly about speed, caching, handling multiple types of unwind data, and supporting the offline use case on different machines and architectures. |
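The leaf-function rule described above ("get the return address from lr and leave all other registers unchanged") can be sketched on simulated memory as follows. `Arm64Regs`, `unwind`, and the `in_frameless_leaf` flag are hypothetical; in a real unwinder that flag would have to come from unwind information for the current `pc`, which this sketch does not model.

```rust
use std::collections::HashMap;

/// Registers of the interrupted code (assumed to come from the ucontext).
struct Arm64Regs {
    pc: u64,
    lr: u64,
    fp: u64,
}

/// Simulated memory: func2's frame record at 0x7000, func3's at 0x7010.
/// Each record is `[caller's fp, return address]`.
fn demo_memory() -> HashMap<u64, u64> {
    HashMap::from([
        (0x7000, 0x7010),
        (0x7008, 0x3000), // return address into func3
        (0x7010, 0),
        (0x7018, 0x4000), // return address into main
    ])
}

fn unwind(mem: &HashMap<u64, u64>, regs: &Arm64Regs, in_frameless_leaf: bool) -> Vec<u64> {
    let mut pcs = vec![regs.pc];
    if in_frameless_leaf {
        // The leaf pushed no frame record, so its caller's return address
        // is still in lr, and fp still belongs to the caller.
        pcs.push(regs.lr);
    }
    let mut fp = regs.fp;
    while fp != 0 {
        let (Some(&next_fp), Some(&ret)) = (mem.get(&fp), mem.get(&fp.wrapping_add(8))) else {
            break;
        };
        pcs.push(ret);
        if next_fp != 0 && next_fp <= fp {
            break;
        }
        fp = next_fp;
    }
    pcs
}

fn main() {
    // Interrupted inside leaf func1, called from func2 (lr = 0x2000).
    let regs = Arm64Regs { pc: 0x1000, lr: 0x2000, fp: 0x7000 };
    println!("{:x?}", unwind(&demo_memory(), &regs, false));
    println!("{:x?}", unwind(&demo_memory(), &regs, true));
}
```

Without the flag, the walk skips the immediate caller and jumps straight to the caller's caller; with it, the frame taken from `lr` fills that gap and the rest of the chain is unchanged.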
I did a test, and it behaves exactly as @mstange says. Here is a simple demo:
int func1() {
int x = 1;
int y = 2;
return x + y;
}
int func2() {
return func1() + 1;
}
int func3() {
return func2() + 1;
}
On my macOS, the ARM64 asm code generated by clang is as follows:
.section __TEXT,__text,regular,pure_instructions
.build_version macos, 12, 0 sdk_version 12, 3
.globl _func1 ; -- Begin function func1
.p2align 2
_func1: ; @func1
.cfi_startproc
; %bb.0:
sub sp, sp, #16
.cfi_def_cfa_offset 16
mov w8, #1
str w8, [sp, #12]
mov w8, #2
str w8, [sp, #8]
ldr w8, [sp, #12]
ldr w9, [sp, #8]
add w0, w8, w9
add sp, sp, #16
ret
.cfi_endproc
; -- End function
.globl _func2 ; -- Begin function func2
.p2align 2
_func2: ; @func2
.cfi_startproc
; %bb.0:
stp x29, x30, [sp, #-16]! ; 16-byte Folded Spill
mov x29, sp
.cfi_def_cfa w29, 16
.cfi_offset w30, -8
.cfi_offset w29, -16
bl _func1
add w0, w0, #1
ldp x29, x30, [sp], #16 ; 16-byte Folded Reload
ret
.cfi_endproc
; -- End function
.globl _func3 ; -- Begin function func3
.p2align 2
_func3: ; @func3
.cfi_startproc
; %bb.0:
stp x29, x30, [sp, #-16]! ; 16-byte Folded Spill
mov x29, sp
.cfi_def_cfa w29, 16
.cfi_offset w30, -8
.cfi_offset w29, -16
bl _func2
add w0, w0, #1
ldp x29, x30, [sp], #16 ; 16-byte Folded Reload
ret
.cfi_endproc
; -- End function
.subsections_via_symbols
Assuming we are executing But in a regular backtracking scenario, this problem can be easily avoided. We usually wrap the stack backtrace as a function like The only thing that needs to be done is to ensure that the The Rust demo below proves this conclusion:
use unwind::{unwind_init_registers, Registers};
#[inline(never)]
fn main() {
func1();
}
#[inline(never)]
fn func1() -> i32 {
func2() + 1
}
#[inline(never)]
fn func2() -> i32 {
func3() + 1
}
#[inline(never)]
fn func3() -> i32 {
let x = 1;
let y = 2;
backtrace();
x + y
}
#[inline(never)]
fn backtrace() {
let mut registers = Registers::default();
unsafe {
// Similar to `unw_getcontext()`.
unwind_init_registers(&mut registers as _);
}
// Do stack backtrace.
while registers[29] != 0 {
let pc = load::<u64>(registers[29] + 8); // x29 + 8 points to `Return Address`
registers[29] = load::<u64>(registers[29]);
// Show function name.
println!("{:#x}", pc);
backtrace::resolve(pc as _, |s| {
println!(" {:?}", s.name());
});
}
}
#[inline]
fn load<T: Copy>(address: u64) -> T {
unsafe { *(address as *const T) }
}
The output is (on my ARM64 macOS):
This is exactly what we expected. However, when it comes to CPU profiling, if the leaf function is interrupted by the |
tikv/tikv#10658
TiKV on HUAWEI Kunpeng 920 failed to profile and panicked.