Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stack quality: investigate user stack unwinding issues #439

Closed
javierhonduco opened this issue Jun 8, 2022 · 7 comments
Closed

stack quality: investigate user stack unwinding issues #439

javierhonduco opened this issue Jun 8, 2022 · 7 comments
Labels
area/stack-unwinding Something on stack unwinding feature/profiler This feature describes support for a new type of profiler. P0 The most important and urgent

Comments

@javierhonduco
Copy link
Contributor

(Opening in parca-agent but this issue might be on parca)

I noticed that some Kernel stacks aren't correct with the current master of both parca and parca-agent. The test program and correct result that we used to get can be found in this PR (#395).

Note the Kernel stack for the open() codepath which is not in the right merged stacktrace:

image

I am currently bisecting both projects, will report back once I have more information

@javierhonduco javierhonduco changed the title stack quality: stack quality: some Kernel stacks aren't merged properly Jun 8, 2022
@javierhonduco
Copy link
Contributor Author

javierhonduco commented Jun 8, 2022

Bisecting parca and parca-agent did not help, even when running the well-known versions in the PR mentioned above.

As the results did not make sense, I manually bisected this once again but ran make clean in between attempts, in case there was some code in parca that was not being re-compiled, which could break symbolication.

Identical configuration across machines:

  • Kernel 5.17.11-300.fc36
  • Repo versions for both parca and parca-agent (whist bisecting)
  • Kernel configuration seems the same for all the variables that we care about (perf subsystem, BPF, etc)
  • The test program in CPP is the same in both machines, compiled with the same flags and compiler version

I couldn't spot any meaningful differences or bugs in our code, so I ran perf to see whether it could produce correct stack traces, and...

image

It can't! This is odd!

Thinking it could be skid somehow being the problem, I tried with SAMPLE_IP=3 to have 0 skid with perf record -a -e cpu-cycles:ppp, as well as other events, such as instructions, but they all produce the flamegraph above.

@javierhonduco
Copy link
Contributor Author

javierhonduco commented Jun 8, 2022

Unwinding the stack with DWARF instead of frame pointers (perf's default) works fine.

As @brancz pointed out, there's probably something off with the frame pointers in some of libc's frames that preventing the right stack walking.

$ sudo perf record -g --call-graph=dwarf -p `pidof parca-demo`

image

Something to check out is what's the difference between hosts then, is it that some libc functions in one of the hosts are inlined or don't have the correct function preamble, or something else?

@javierhonduco
Copy link
Contributor Author

Oddly enough, even though we continue following the stack when walking with FPs from __close, it doesn't seem to set up a frame:

Breakpoint 1, 0x00007fc3b8f286b0 in close () from target:/lib/x86_64-linux-gnu/libc.so.6
(gdb) disassemble 0x00007fc3b8f286b0
Dump of assembler code for function close:
=> 0x00007fc3b8f286b0 <+0>:	mov    %fs:0x18,%eax
   0x00007fc3b8f286b8 <+8>:	test   %eax,%eax
   0x00007fc3b8f286ba <+10>:	jne    0x7fc3b8f286d0 <close+32>
   0x00007fc3b8f286bc <+12>:	mov    $0x3,%eax
   0x00007fc3b8f286c1 <+17>:	syscall

How does it work? What am I missing? 🤔

@brancz
Copy link
Member

brancz commented Jun 8, 2022

When using dwarf to unwind it doesn't use frame pointers at all, it uses unwind tables (.eh_frame etc.) is my understanding.

@javierhonduco
Copy link
Contributor Author

@brancz sorry if my last comment was confusing. The thing that I find odd is how perf record -g --call-graph=fp (so, no dwarf unwinding) is able to Unwind the close() stack, and generate the correct profile for it, even though there's no setting up of the stack frame.

The open() stack is broken and this makes sense, as it's also missing the set up of the stack frame

@brancz
Copy link
Member

brancz commented Jun 8, 2022

Gotcha.

@kakkoyun kakkoyun added feature/profiler This feature describes support for a new type of profiler. P0 The most important and urgent area/stack-unwinding Something on stack unwinding labels Jun 20, 2022
@javierhonduco javierhonduco changed the title stack quality: some Kernel stacks aren't merged properly stack quality: investigate user stack unwinding issues Jun 23, 2022
@javierhonduco
Copy link
Contributor Author

TL;DR: The issue, as we suspected, lies in the lack of frame pointers in libc, which results in broken stack traces. The reason why close(2) looked correct (it isn't, more details below) it's due to pure chance as the compiler did not generate code that uses the rbp register, so the previous value was never overridden.

Context

There are a couple of things that @Sylfrena, @kakkoyun and I wanted to understand:

  1. Is the frame pointer set up correctly in every function?
  2. Why close(2) (__close) stacks look correct while open(2) (open64) does not? As they are both from libc, they should have the same behaviour, right?

Frame pointer setup

All the functions in our program do set up the frame pointer correctly:

(gdb) disassemble main
Dump of assembler code for function main():
   0x0000000000401300 <+0>:     push   %rbp
   0x0000000000401301 <+1>:     mov    %rsp,%rbp
[...]

(gdb) disassemble a1
Dump of assembler code for function _Z2a1v:
   0x0000000000401280 <+0>:     push   %rbp
   0x0000000000401281 <+1>:     mov    %rsp,%rbp
[...]

(gdb) disassemble a2
Dump of assembler code for function _Z2a2v:
   0x00000000004012b0 <+0>:     push   %rbp
   0x00000000004012b1 <+1>:     mov    %rsp,%rbp
[...]

(gdb) disassemble a2
Dump of assembler code for function _Z2a2v:
   0x00000000004012b0 <+0>:     push   %rbp
   0x00000000004012b1 <+1>:     mov    %rsp,%rbp
[...]

(gdb) disassemble top
Dump of assembler code for function _Z3topv:
   0x0000000000401220 <+0>:     push   %rbp
   0x0000000000401221 <+1>:     mov    %rsp,%rbp
[...]

However, libc functions lack the frame pointer:

(gdb) disassemble __close
Dump of assembler code for function close:
   0x00007fc2b9c356b0 <+0>:     mov    %fs:0x18,%eax
   0x00007fc2b9c356b8 <+8>:     test   %eax,%eax
   0x00007fc2b9c356ba <+10>:    jne    0x7fc2b9c356d0 <close+32>
   0x00007fc2b9c356bc <+12>:    mov    $0x3,%eax
   0x00007fc2b9c356c1 <+17>:    syscall 
[...]

(gdb) disassemble open64
Dump of assembler code for function open64:
   0x00007fc2b9c34b90 <+0>:     push   %r12
   0x00007fc2b9c34b92 <+2>:     mov    %esi,%r10d
   0x00007fc2b9c34b95 <+5>:     mov    %esi,%r12d
   0x00007fc2b9c34b98 <+8>:     push   %rbp
   0x00007fc2b9c34b99 <+9>:     mov    %rdi,%rbp
   0x00007fc2b9c34b9c <+12>:    sub    $0x68,%rsp
[...]

So how come close(2) managed to continue walking the stack??

Unwinding the stack by hand (using frame pointers)

Let's see the return address of the caller of c1 and see which function it is:

(gdb) b c1
Breakpoint 1 at 0x401264: file main.cpp, line 14.
(gdb) c
Continuing.

Breakpoint 1, c1 () at main.cpp:14
14      in main.cpp
(gdb) info symbol *(uint64_t*)($rbp + sizeof(uint64_t))
b1() + 9 in section .text of target:/app/parca-demo
(gdb) 

It makes sense that we see b1, and not c1, because we haven't executed the stack preamble that sets the frame pointer, yet! Note that we are reading $rbp + $word_size because we want to just fetch the return address of the call-site. If we wanted to continue walking the stack, we would have to dereference $rbp and continue the process until we hit the bottom frame.

If we execute a few more instructions, then we see c1:

(gdb) info symbol *(uint64_t*)($rbp + sizeof(uint64_t))
c1() + 9 in section .text of target:/app/parca-demo

So far, so good, as we expected, everything works fine in the functions from our binary!

Let's take a look at the libc functions:

Libc functions

(gdb) b __close
Breakpoint 1 at 0x7f90433816b0
(gdb) c
Continuing.
Breakpoint 1, 0x00007f90433816b0 in close () from target:/lib/x86_64-linux-gnu/libc.so.6
(gdb) info symbol *(uint64_t*)($rbp + sizeof(uint64_t))
c1() + 9 in section .text of target:/app/parca-demo

Here we are seeing c1, which is the same frame we saw in the profiles we recorded with parca-agent and perf. Unfortunately this not correct, and I originally missed this, but the top frame should be the one showing up here instead!!

So back to the original question, why did close(2) "skipped a frame" and why was open(2) unable to continue walking the user stack at all?? We already know the answer to the more general question of, are these profiles correct, and if not, why? The answer is that the profiles when walking with frame pointers are not correct and that this is due to the lack of frame pointers in libc.

Why was __close able to generate a broken stack that continued walking the user stack?

When compiling without frame pointers, the compiler can use rbpas a general-purpose register, but if it happens not to use it in the leaf function, it will look like a frame was skipped (as we saw in the example above where we set up a breakpoint in a function with frame pointers, just before the frame was set).

(gdb) pipe disassemble __close | grep rbp
(gdb) 
(gdb) pipe disassemble open64 | grep rbp
   0x00007f9043380b98 <+8>:     push   %rbp
   0x00007f9043380b99 <+9>:     mov    %rdi,%rbp
   0x00007f9043380bd8 <+72>:    mov    %rbp,%rsi
   0x00007f9043380c0b <+123>:   pop    %rbp
   0x00007f9043380c52 <+194>:   mov    %rbp,%rsi
(gdb) 

As we can see in the disassembly, this is exactly what's happening here!! The compiler didn't emit any instruction that writes over the previously written value in rbp.

@parca-dev parca-dev locked and limited conversation to collaborators Jun 29, 2022
@kakkoyun kakkoyun converted this issue into discussion #522 Jun 29, 2022

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
area/stack-unwinding Something on stack unwinding feature/profiler This feature describes support for a new type of profiler. P0 The most important and urgent
Projects
None yet
Development

No branches or pull requests

3 participants