Fail to profile TiKV in aarch64 #10658

Closed
youjiali1995 opened this issue Aug 3, 2021 · 8 comments · Fixed by #12480
Labels
severity/minor, sig/diagnosis, type/bug

Comments

@youjiali1995
Contributor

youjiali1995 commented Aug 3, 2021

Bug Report

What version of TiKV are you using?

TiKV v5.1.0
tiup v1.5.2
libgcc_s-7.3.0-20190804.so.1

What operating system and CPU are you using?

Linux localhost.localdomain 4.19.90-23.8.v2101.ky10.aarch64 #1 SMP Mon May 17 17:07:38 CST 2021 aarch64 aarch64 aarch64 GNU/Linux
processor       : 63
model name      : HUAWEI,Kunpeng 920
BogoMIPS        : 200.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm
CPU implementer : 0x48
CPU architecture: 8
CPU variant     : 0x1
CPU part        : 0xd01
CPU revision    : 0

Steps to reproduce

Run a TiKV instance and use the HTTP API to profile it.

What did you expect?

Profiling succeeds.

What happened?

TiKV crashes and dumps core.

#0  0x0000fffd7b6aceb4 in ?? () from /lib64/libgcc_s.so.1
#1  0x0000fffd7b6ae534 in _Unwind_Backtrace () from /lib64/libgcc_s.so.1
#2  0x0000aaac01eedb58 in backtrace::backtrace::libunwind::trace (cb=...) at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.37/src/backtrace/libunwind.rs:88
#3  backtrace::backtrace::trace_unsynchronized (cb=...) at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.37/src/backtrace/mod.rs:66
#4  pprof::profiler::perf_signal_handler (_signal=<optimized out>) at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/pprof-0.4.2/src/profiler.rs:128
#5  <signal handler called>
@youjiali1995
Contributor Author

cc @YangKeao

youjiali1995 added the type/bug label Aug 3, 2021
github-actions bot added this to Need Triage in Question and Bug Reports Aug 3, 2021
@Lily2025

Lily2025 commented Aug 3, 2021

/severity major

@sticnarf
Contributor

sticnarf commented Aug 5, 2021

I can reproduce it on my own ARM server:

Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `bin/tikv-server --addr 0.0.0.0:20160 --advertise-addr 172.31.59.15:20160 --stat'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000ffffb6315670 in aarch64_fallback_frame_state (context=0xffffab18a090, context=0xffffab18a090, fs=0xffffab18a450) at ./md-unwind-support.h:74
74        if (pc[0] != MOVZ_X8_8B || pc[1] != SVC_0)
[Current thread is 1 (Thread 0xffffab18f670 (LWP 2081))]
warning: Missing auto-load script at offset 0 in section .debug_gdb_scripts
of file /data/deploy/tikv-20160/bin/tikv-server.
Use `info auto-load python-scripts [REGEXP]' to list them.
(gdb) bt
#0  0x0000ffffb6315670 in aarch64_fallback_frame_state (context=0xffffab18a090, context=0xffffab18a090, fs=0xffffab18a450) at ./md-unwind-support.h:74
#1  uw_frame_state_for (context=context@entry=0xffffab18a090, fs=fs@entry=0xffffab18a450) at ../../../libgcc/unwind-dw2.c:1257
#2  0x0000ffffb6316d48 in _Unwind_Backtrace (trace=0xaaaae073b594<error reading variable>, trace_argument=0xffffab18b770) at ../../../libgcc/unwind.inc:290
#3  0x0000aaaae0a98e4c in backtrace::backtrace::libunwind::trace (cb=...) at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.37/src/backtrace/libunwind.rs:88
#4  backtrace::backtrace::trace_unsynchronized (cb=...) at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.37/src/backtrace/mod.rs:66
#5  pprof::profiler::perf_signal_handler (_signal=<optimized out>) at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/pprof-0.4.2/src/profiler.rs:128
#6  <signal handler called>
Dwarf Error: Cannot find DIE at 0xae370 referenced from DIE at 0x15942c [in module /data/deploy/tikv-20160/bin/tikv-server]
(gdb) p pc
$1 = (unsigned int *) 0x97700
(gdb) p pc[0]
Cannot access memory at address 0x97700

Hopefully this provides some more debug information.

It seems to take the fallback path for frames without DWARF info at https://github.com/gcc-mirror/gcc/blob/releases/gcc-7.3.0/libgcc/unwind-dw2.c#L1257, and the return address is not correct: https://github.com/gcc-mirror/gcc/blob/releases/gcc-7.3.0/libgcc/config/aarch64/linux-unwind.h#L63

This looks much like #9765, but it happens on aarch64.
The root cause should be different, because stack probing is not implemented for aarch64.
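
To make the failing check concrete, here is a minimal Rust sketch of the test at md-unwind-support.h:74, not the libgcc code itself. It assumes that `MOVZ_X8_8B` and `SVC_0` are the AArch64 encodings of `mov x8, #0x8b` and `svc #0` (the names suggest this, and 0x8b is the arm64 `rt_sigreturn` syscall number). The function names and the `Option<&[u32]>` stand-in for "memory readable at pc" are hypothetical, introduced only for illustration. libgcc dereferences `pc` directly, which is why pc = 0x97700 ends in SIGSEGV; the sketch shows the same comparison with the unreadable case handled explicitly.

```rust
/// Syscall number of rt_sigreturn on arm64 Linux.
const NR_RT_SIGRETURN: u32 = 0x8b;

/// Encode `movz x<rd>, #imm16` (64-bit form, no shift).
fn movz_x(rd: u32, imm16: u32) -> u32 {
    0xd280_0000 | ((imm16 & 0xffff) << 5) | (rd & 0x1f)
}

/// Encode `svc #imm16`.
fn svc(imm16: u32) -> u32 {
    0xd400_0001 | ((imm16 & 0xffff) << 5)
}

/// Checked variant of `pc[0] != MOVZ_X8_8B || pc[1] != SVC_0`.
/// `code` models the instruction words readable at the return address;
/// `None` models an unmapped address such as 0x97700.
fn is_sigreturn_trampoline(code: Option<&[u32]>) -> bool {
    match code {
        Some([first, second, ..]) => {
            *first == movz_x(8, NR_RT_SIGRETURN) && *second == svc(0)
        }
        // Unreadable or too short: not a signal frame, and no fault.
        _ => false,
    }
}
```

Calling `is_sigreturn_trampoline(None)` returns `false` instead of faulting, whereas the unchecked read in the backtrace above is what terminates the process.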

@YangKeao
Member

YangKeao commented Aug 9, 2021

I guess the problem is that backtrace-rs cannot handle an ARM signal frame. Handling a signal frame on ARM requires special treatment (e.g. in rustc).
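
For illustration only, one way to avoid depending on the unwinder's signal-frame handling is to walk frame-pointer records with every read bounds-checked. This is a sketch of that general technique under stated assumptions, not something this thread confirms as the fix and not backtrace-rs or pprof-rs code. It assumes the standard AArch64 frame record layout (`[fp]` holds the caller's frame pointer, `[fp + 8]` holds the return address) and that the binary is built with frame pointers. `StackMemory` and `walk_frames` are hypothetical names, and the stack is modelled as a plain vector so the example can run anywhere.

```rust
/// Read-only model of one thread's stack: `words[i]` lives at `base + 8 * i`.
struct StackMemory {
    base: u64,
    words: Vec<u64>,
}

impl StackMemory {
    /// Read one 8-byte word. Returns `None` instead of faulting when the
    /// address is misaligned or outside the modelled region.
    fn read(&self, addr: u64) -> Option<u64> {
        if addr % 8 != 0 {
            return None;
        }
        let index = addr.checked_sub(self.base)? / 8;
        self.words.get(index as usize).copied()
    }
}

/// Collect return addresses by following frame records from `fp`.
/// Stops at the first record that cannot be read or that does not
/// point towards higher addresses.
fn walk_frames(stack: &StackMemory, mut fp: u64, max_depth: usize) -> Vec<u64> {
    let mut return_addresses = Vec::new();
    while return_addresses.len() < max_depth {
        let next_fp = match stack.read(fp) {
            Some(value) => value,
            None => break,
        };
        let lr = match fp.checked_add(8).and_then(|addr| stack.read(addr)) {
            Some(value) => value,
            None => break,
        };
        return_addresses.push(lr);
        // The stack grows downwards, so a caller's record sits at a higher address.
        if next_fp <= fp {
            break;
        }
        fp = next_fp;
    }
    return_addresses
}
```

With a bogus starting value such as 0x97700, `walk_frames` returns an empty vector rather than touching unmapped memory. A real signal handler would additionally need an async-signal-safe way to validate addresses, which this model leaves out.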

zhangjinpeng87 added the sig/diagnosis label Nov 23, 2021
@tonyxuqqi
Contributor

/assign tonyxuqqi

@YangKeao
Member

YangKeao commented Jan 4, 2022

@tonyxuqqi Why libgcc (occasionally) cannot handle the ARM signal frame is still unclear. Some earlier digging into libgcc is recorded here: tikv/pprof-rs#75.

@breezewish
Member

@mornyx is also working on this.

@mornyx
Contributor

mornyx commented Jan 28, 2022

/assign

mornyx added a commit to mornyx/tikv that referenced this issue May 10, 2022
Close tikv#10658

Signed-off-by: mornyx <mornyx.z@gmail.com>
Question and Bug Reports automation moved this from Need Triage to Closed(This Week) May 19, 2022
ti-chi-bot added a commit that referenced this issue May 19, 2022
ref #9957, close #10658, ref #10658, ref #11964

Signed-off-by: mornyx <mornyx.z@gmail.com>

Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>
fengou1 pushed a commit to fengou1/tikv that referenced this issue May 26, 2022
ref tikv#9957, close tikv#10658, ref tikv#10658, ref tikv#11964

Signed-off-by: mornyx <mornyx.z@gmail.com>

Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>