Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] fail to unwind stack trace, lose data in one thread #2060

Open
joeyleeeeeee97 opened this issue Sep 25, 2023 · 20 comments
Open

[BUG] fail to unwind stack trace, lose data in one thread #2060

joeyleeeeeee97 opened this issue Sep 25, 2023 · 20 comments
Labels
bug Something isn't working

Comments

@joeyleeeeeee97
Copy link

joeyleeeeeee97 commented Sep 25, 2023

Describe the bug

Run a simple cpp demo, but failing to parse symbols and failing to get profiling info from one thread.

I am build the agent and parca server from the latest main branch.

To Reproduce

#include <iostream>
#include <thread>
#include <chrono>
#include <cstring>
#include <unistd.h>
#include <sys/epoll.h>

void epollf() {
    int epollFd = epoll_create1(0);
    if (epollFd == -1) {
        std::cerr << "Failed to create epoll instance: " << std::strerror(errno) << std::endl;
        return;
    }

    struct epoll_event event;
    event.events = EPOLLIN;

    while (true) {
        struct epoll_event events[10];
        int numEvents = epoll_wait(epollFd, events, 1000, 0);

        if (numEvents == -1) {
            std::cerr << "epoll_wait error: " << std::strerror(errno) << std::endl;
            break;
        }

    }

    close(epollFd);
}

int main() {
    std::thread epollThread(epollf);

    while (true) {

    }

    epollThread.join();

    return 0;
}

clang++ -g -o jjjttt main.cpp -lpthread
./jjjttt

Expected behavior

Profiling with all symobls.

Screenshots or Profiles (using https://pprof.me)

image

Only one thread is proflied...

image

The other thread has

Logs

sudo cat /sys/kernel/debug/tracing/trace_pipe

          jjjttt-2177205 [005] d.h.. 5107453.799553: bpf_trace_printk: ~about to check shards found=1
          jjjttt-2177205 [005] d.h.. 5107453.799553: bpf_trace_printk: ~checking shards now
          jjjttt-2177205 [005] d.h.. 5107453.799554: bpf_trace_printk: [info] found chunk
          jjjttt-2177205 [005] d.h.. 5107453.799554: bpf_trace_printk: pid 2177205 tgid 2177205
          jjjttt-2177205 [005] d.h.. 5107453.799555: bpf_trace_printk: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          jjjttt-2177205 [005] d.h.. 5107453.799555: bpf_trace_printk: traversing stack using .eh_frame information!!
          jjjttt-2177205 [005] d.h.. 5107453.799556: bpf_trace_printk: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          jjjttt-2177205 [005] d.h.. 5107453.799556: bpf_trace_printk: [debug] Within unwinding machinery loop
          jjjttt-2177205 [005] d.h.. 5107453.799557: bpf_trace_printk: ## frame: 0
          jjjttt-2177205 [005] d.h.. 5107453.799557: bpf_trace_printk: 	current pc: 559071098394
          jjjttt-2177205 [005] d.h.. 5107453.799557: bpf_trace_printk: 	current sp: 7fff9cabaa10
          jjjttt-2177205 [005] d.h.. 5107453.799558: bpf_trace_printk: 	current bp: 7fff9cabaa20
          jjjttt-2177205 [005] d.h.. 5107453.799558: bpf_trace_printk: ~about to check shards found=1
          jjjttt-2177205 [005] d.h.. 5107453.799558: bpf_trace_printk: ~checking shards now
          jjjttt-2177205 [005] d.h.. 5107453.799559: bpf_trace_printk: [info] found chunk
          jjjttt-2177205 [005] d.h.. 5107453.799559: bpf_trace_printk: le offset: 559071097000
          jjjttt-2177205 [005] d.h.. 5107453.799560: bpf_trace_printk: ========== left 74625 right 74883
          jjjttt-2177205 [005] d.h.. 5107453.799560: bpf_trace_printk: 	.done
          jjjttt-2177205 [005] d.h.. 5107453.799560: bpf_trace_printk: 	=> table_index: 74651
          jjjttt-2177205 [005] d.h.. 5107453.799560: bpf_trace_printk: 	=> adjusted pc: 1394
          jjjttt-2177205 [005] d.h.. 5107453.799561: bpf_trace_printk: 	cfa type: 1, offset: 16 (row pc: 1374)
          jjjttt-2177205 [005] d.h.. 5107453.799563: bpf_trace_printk: [error] previous_rip should not be zero. This can mean that the read failed, ret=-14 while reading @ 7fff9cabaa28.

agent log no obvious error.

parca server log(some symbols are missing as well)

Has
storage symbolization request failed" err="fetching metadata: debuginfo metadata not found"
and
ErrUnknownPC

level=debug name=parca ts=2023-09-25T08:57:31.519863966Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=69389d485a9793dbe873f0ea2c93e02efaa9aa3d pc_range_start=0x28700 pc_range_end=0x1bc4c1
level=debug name=parca ts=2023-09-25T08:57:31.519933345Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=3146e5a5e66e1fd6ab59a39e486a9d246bc675c0 pc_range_start=0x2090 pc_range_end=0x2b375
level=debug name=parca ts=2023-09-25T08:57:31.519963794Z caller=symbolizer.go:299 component=symbolizer buildid=unknown msg="storage symbolization request failed" err="fetching metadata: debuginfo metadata not found"
level=debug name=parca ts=2023-09-25T08:57:31.520031807Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=e22ba7829a55a0dec2201a0b6dac7ba236118561 pc_range_start=0xe030 pc_range_end=0x1e19b
level=debug name=parca ts=2023-09-25T08:57:31.520112609Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=f90a926c28da963c0888cca1180ef28bb88f059b pc_range_start=0x2030 pc_range_end=0x2488
level=debug name=parca ts=2023-09-25T08:57:31.520184515Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=d7435f9680bf6c38f11856638bbbc8a2b755e054 pc_range_start=0x2030 pc_range_end=0x554d0
level=debug name=parca ts=2023-09-25T08:57:31.520219354Z caller=symbolizer.go:299 component=symbolizer buildid=unknown msg="storage symbolization request failed" err="fetching metadata: debuginfo metadata not found"
level=debug name=parca ts=2023-09-25T08:57:31.520289037Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=927b5ce7c32980646eaf7527a8b3d68c3da5ffea pc_range_start=0x10a0 pc_range_end=0x12d8
level=debug name=parca ts=2023-09-25T08:57:31.520358214Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=5ea13eaa57911db0 pc_range_start=0xafb0 pc_range_end=0x4c023
level=debug name=parca ts=2023-09-25T08:57:31.520386623Z caller=symbolizer.go:299 component=symbolizer buildid=unknown msg="storage symbolization request failed" err="fetching metadata: debuginfo metadata not found"
level=debug name=parca ts=2023-09-25T08:57:31.520453982Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=4b60568bc25641b07e93f84f98c77093f3b279f7 pc_range_start=0x6030 pc_range_end=0x1833e
level=debug name=parca ts=2023-09-25T08:57:31.520527563Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=7ab150049702ea494c7c9b681d86dc2d59c89ee1 pc_range_start=0xb030 pc_range_end=0x4ebcd
level=debug name=parca ts=2023-09-25T08:57:31.520600582Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=0665411b8b4fc787710290aacecac4b069bf89fe pc_range_start=0x4030 pc_range_end=0x4718
level=debug name=parca ts=2023-09-25T08:57:31.520681385Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=55447a37514c4a317439786251326b5f762d31392f6f6b7835704d635279724779346e4d6e78414a6a2f6d647a5247354a536e6e38616e6c31636c5954612f38496435624e72466c744770475332794967704c pc_range_start=0x46aaa0 pc_range_end=0xd53aa8
level=debug name=parca ts=2023-09-25T08:57:31.520760407Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=7bf66e3d4dd23edad286982c17e7df08d1ab6fb2 pc_range_start=0x10a0 pc_range_end=0x12e2
level=debug name=parca ts=2023-09-25T08:57:31.520839323Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=7d78d330de33842118d405b4afe4d58878d71b41 pc_range_start=0xb030 pc_range_end=0x73617
**level=debug name=parca ts=2023-09-25T08:57:31.520871855Z caller=symbolizer.go:299 component=symbolizer buildid=unknown msg="storage symbolization request failed" err="fetching metadata: debuginfo metadata not found"**
level=debug name=parca ts=2023-09-25T08:57:31.520939133Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=cc278076d1c0ecf82b5248dfffe7e2d43feafbb7 pc_range_start=0x7030 pc_range_end=0x7ee8
level=debug name=parca ts=2023-09-25T08:57:31.521048469Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=cf6dd909194b0a9dd65eb30f7ec27c4002d0329d pc_range_start=0x3f030 pc_range_end=0x1fdb34
level=debug name=parca ts=2023-09-25T08:57:31.521126984Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=3192c14e4fe2624ff4675ae5a5edfa8a599aa9db pc_range_start=0x2030 pc_range_end=0x2578
level=debug name=parca ts=2023-09-25T08:57:31.521195289Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=d11886324c0b0501c46b595089d667e6894943af pc_range_start=0x3030 pc_range_end=0xa8e7
level=debug name=parca ts=2023-09-25T08:57:31.521273608Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=6673306d4d734c383466637842373044627642442f5a4b76785f546b716f435f2d506957696b447a492f4270345f73557834666d616445414b796b64555f2f4e524d525f67667a4c365a554361577041657232 pc_range_start=0x401e40 pc_range_end=0x49b0a85
level=debug name=parca ts=2023-09-25T08:57:31.521462171Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=229b7dc509053fe4df5e29e8629911f0c3bc66dd pc_range_start=0x28700 pc_range_end=0x1bc501
level=debug name=parca ts=2023-09-25T08:57:31.521507517Z caller=symbolizer.go:299 component=symbolizer buildid=036dde3b7b7d0159 msg="storage symbolization request failed" err="fetching metadata: debuginfo metadata not found"
level=debug name=parca ts=2023-09-25T08:57:31.521520533Z caller=symbolizer.go:289 component=symbolizer msg="mapping of location is empty, skipping"
level=debug name=parca ts=2023-09-25T08:57:31.521590103Z caller=symbolizer.go:516 component=symbolizer msg="symbolizing locations" build_id=e37fe1a879783838de78cbc8c80621fa685d58a2 count=2
level=debug name=parca ts=2023-09-25T08:57:31.522622451Z caller=symbolizer.go:590 component=symbolizer **msg="symbolized location" build_id=e37fe1a879783838de78cbc8c80621fa685d58a2 address=651472 lines_count=0 err="seek to PC: ErrUnknownPC" liner_type=*addr2line.DwarfLiner
level=debug name=parca ts=2023-09-25T08:57:31.523523449Z caller=symbolizer.go:590 component=symbolizer msg="symbolized location" build_id=e37fe1a879783838de78cbc8c80621fa685d58a2 address=654128 lines_count=0 err="seek to PC: ErrUnknownPC" liner_type=*addr2line.DwarfLiner**
level=debug name=parca ts=2023-09-25T08:57:31.52356687Z caller=symbolizer.go:191 component=symbolizer msg="symbolization loop completed"
level=debug name=parca ts=2023-09-25T08:57:37.245414226Z caller=badger.go:452 msg=GetOrCreateStacktraces stacktrace_keys_len=49
level=debug name=parca ts=2023-09-25T08:57:37.246795774Z caller=badger.go:58 msg="writeRequests called. Writing to value log"
level=debug name=parca ts=2023-09-25T08:57:37.246872034Z caller=badger.go:58 msg="Writing to memtable"
level=debug name=parca ts=2023-09-25T08:57:37.246937884Z caller=badger.go:58 msg="Sending updates to subscribers"
level=debug name=parca ts=2023-09-25T08:57:37.246969165Z caller=badger.go:58 msg="2 entries written"
level=debug name=parca ts=2023-09-25T08:57:37.266391517Z caller=server.go:314 msg="finished call" protocol=grpc grpc.component=server grpc.service=parca.profilestore.v1alpha1.ProfileStoreService grpc.method=WriteRaw grpc.method_type=unary peer.address=127.0.0.1:59764 grpc.start_time=2023-09-25T08:57:37Z grpc.request.deadline=2023-09-25T08:59:37Z grpc.code=OK grpc.time_ms=24.529

Software (please complete the following information):

  • Parca Agent Version: [parca-agent --version]
  • Parca Server Version (if applicable): [parca --version]

Workload (please complete the following information):

  • Runtime (if applicable):
  • Compiler (if applicable):

Environment (please complete the following information):

Additional context

@joeyleeeeeee97
Copy link
Author

joeyleeeeeee97 commented Sep 25, 2023

@kakkoyun @javierhonduco Hi could you take a look at this? Thanks

There might be two issues:

  1. failed to symbolize stack
  2. missing profiling from some thread

@joeyleeeeeee97
Copy link
Author

joeyleeeeeee97 commented Sep 25, 2023

We could simplify the reproduction by one thread only.

#include <iostream>
#include <thread>
#include <chrono>
#include <cstring>
#include <unistd.h>
#include <sys/epoll.h>

void epollf() {
    int epollFd = epoll_create1(0);
    if (epollFd == -1) {
        std::cerr << "Failed to create epoll instance: " << std::strerror(errno) << std::endl;
        return;
    }

    struct epoll_event event;
    event.events = EPOLLIN;

    while (true) {
        struct epoll_event events[10];
        int numEvents = epoll_wait(epollFd, events, 1000, 0);

        if (numEvents == -1) {
            std::cerr << "epoll_wait error: " << std::strerror(errno) << std::endl;
            break;
        }

    }

    close(epollFd);
}

int main() {
// not run the epoll thread
    while (true) {

    }


    return 0;
}

and we got this in sudo cat /sys/kernel/debug/tracing/trace_pipe

          jjjttt-2186346 [032] d.h.. 5108702.762783: bpf_trace_printk: ## frame: 2
          jjjttt-2186346 [032] d.h.. 5108702.762783: bpf_trace_printk: 	current pc: 7fed17ba7e40
          jjjttt-2186346 [032] d.h.. 5108702.762783: bpf_trace_printk: 	current sp: 7fff0f88d630
          jjjttt-2186346 [032] d.h.. 5108702.762783: bpf_trace_printk: 	current bp: 1
          jjjttt-2186346 [032] d.h.. 5108702.762784: bpf_trace_printk: ~about to check shards found=1
          jjjttt-2186346 [032] d.h.. 5108702.762784: bpf_trace_printk: ~checking shards now
          jjjttt-2186346 [032] d.h.. 5108702.762784: bpf_trace_printk: [info] found chunk
          jjjttt-2186346 [032] d.h.. 5108702.762785: bpf_trace_printk: le offset: 7fed17b7e000
          jjjttt-2186346 [032] d.h.. 5108702.762785: bpf_trace_printk: ========== left 29 right 29075
          jjjttt-2186346 [032] d.h.. 5108702.762785: bpf_trace_printk: 	.done
          jjjttt-2186346 [032] d.h.. 5108702.762786: bpf_trace_printk: 	=> table_index: 266
          jjjttt-2186346 [032] d.h.. 5108702.762786: bpf_trace_printk: 	=> adjusted pc: 29e40
          jjjttt-2186346 [032] d.h.. 5108702.762786: bpf_trace_printk: 	cfa type: 2, offset: 80 (row pc: 29ddd)
          jjjttt-2186346 [032] d.h.. 5108702.762787: bpf_trace_printk: 	(bp_offset: -48, bp value stored at 7fff0f88d650)
          jjjttt-2186346 [032] d.h.. 5108702.762788: bpf_trace_printk: 	previous ip: 55a475d9a135 (@ 7fff0f88d678)
          jjjttt-2186346 [032] d.h.. 5108702.762788: bpf_trace_printk: 	previous sp: 7fff0f88d680
          jjjttt-2186346 [032] d.h.. 5108702.762788: bpf_trace_printk: 	previous bp: 0
          jjjttt-2186346 [032] d.h.. 5108702.762788: bpf_trace_printk: [debug] Within unwinding machinery loop
          jjjttt-2186346 [032] d.h.. 5108702.762789: bpf_trace_printk: ## frame: 3
          jjjttt-2186346 [032] d.h.. 5108702.762789: bpf_trace_printk: 	current pc: 55a475d9a135
          jjjttt-2186346 [032] d.h.. 5108702.762789: bpf_trace_printk: 	current sp: 7fff0f88d680
          jjjttt-2186346 [032] d.h.. 5108702.762790: bpf_trace_printk: 	current bp: 0
          jjjttt-2186346 [032] d.h.. 5108702.762790: bpf_trace_printk: ~about to check shards found=1
          jjjttt-2186346 [032] d.h.. 5108702.762790: bpf_trace_printk: ~checking shards now
          jjjttt-2186346 [032] d.h.. 5108702.762790: bpf_trace_printk: [info] found chunk
          jjjttt-2186346 [032] d.h.. 5108702.762791: bpf_trace_printk: le offset: 55a475d99000
          jjjttt-2186346 [032] d.h.. 5108702.762791: bpf_trace_printk: ========== left 74596 right 74624
          jjjttt-2186346 [032] d.h.. 5108702.762791: bpf_trace_printk: 	.done
          jjjttt-2186346 [032] d.h.. 5108702.762792: bpf_trace_printk: 	=> table_index: 74613
          jjjttt-2186346 [032] d.h.. 5108702.762792: bpf_trace_printk: 	=> adjusted pc: 1135
          jjjttt-2186346 [032] d.h.. 5108702.762793: bpf_trace_printk: 	cfa type: 2, offset: 8 (row pc: 1114)
          jjjttt-2186346 [032] d.h.. 5108702.762793: bpf_trace_printk: [info] null return address, end of stack
          jjjttt-2186346 [032] d.h.. 5108702.762793: bpf_trace_printk: ======= reached main! =======
          jjjttt-2186346 [032] d.h.. 5108702.762794: bpf_trace_printk: native stack hash 2006046535
          jjjttt-2186346 [032] d.h.. 5108702.762795: bpf_trace_printk: [debug] not an interpreter
          jjjttt-2186346 [032] d.h.. 5108702.762795: bpf_trace_printk: [MMR] aggregate stack count for pid: 2186346
          jjjttt-2186346 [032] d.h.. 5108702.815397: bpf_trace_printk: ~about to check shards found=1
          jjjttt-2186346 [032] d.h.. 5108702.815398: bpf_trace_printk: ~checking shards now
          jjjttt-2186346 [032] d.h.. 5108702.815399: bpf_trace_printk: [info] found chunk
          jjjttt-2186346 [032] d.h.. 5108702.815399: bpf_trace_printk: pid 2186346 tgid 2186346
          jjjttt-2186346 [032] d.h.. 5108702.815399: bpf_trace_printk: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          jjjttt-2186346 [032] d.h.. 5108702.815400: bpf_trace_printk: traversing stack using .eh_frame information!!
          jjjttt-2186346 [032] d.h.. 5108702.815400: bpf_trace_printk: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          jjjttt-2186346 [032] d.h.. 5108702.815401: bpf_trace_printk: [debug] Within unwinding machinery loop
          jjjttt-2186346 [032] d.h.. 5108702.815401: bpf_trace_printk: ## frame: 0
          jjjttt-2186346 [032] d.h.. 5108702.815401: bpf_trace_printk: 	current pc: 55a475d9a30b
          jjjttt-2186346 [032] d.h.. 5108702.815402: bpf_trace_printk: 	current sp: 7fff0f88d580
          jjjttt-2186346 [032] d.h.. 5108702.815402: bpf_trace_printk: 	current bp: 7fff0f88d580
          jjjttt-2186346 [032] d.h.. 5108702.815402: bpf_trace_printk: ~about to check shards found=1
          jjjttt-2186346 [032] d.h.. 5108702.815403: bpf_trace_printk: ~checking shards now
          jjjttt-2186346 [032] d.h.. 5108702.815403: bpf_trace_printk: [info] found chunk
          jjjttt-2186346 [032] d.h.. 5108702.815403: bpf_trace_printk: le offset: 55a475d99000
          jjjttt-2186346 [032] d.h.. 5108702.815404: bpf_trace_printk: ========== left 74596 right 74624
          jjjttt-2186346 [032] d.h.. 5108702.815404: bpf_trace_printk: 	.done
          jjjttt-2186346 [032] d.h.. 5108702.815404: bpf_trace_printk: 	=> table_index: 74622
          jjjttt-2186346 [032] d.h.. 5108702.815405: bpf_trace_printk: 	=> adjusted pc: 130b
          jjjttt-2186346 [032] d.h.. 5108702.815405: bpf_trace_printk: 	cfa type: 1, offset: 16 (row pc: 1304)
          jjjttt-2186346 [032] d.h.. 5108702.815406: bpf_trace_printk: 	(bp_offset: -16, bp value stored at 7fff0f88d580)
          jjjttt-2186346 [032] d.h.. 5108702.815407: bpf_trace_printk: 	previous ip: 7fed17ba7d90 (@ 7fff0f88d588)
          jjjttt-2186346 [032] d.h.. 5108702.815407: bpf_trace_printk: 	previous sp: 7fff0f88d590
          jjjttt-2186346 [032] d.h.. 5108702.815407: bpf_trace_printk: 	previous bp: 1
          jjjttt-2186346 [032] d.h.. 5108702.815408: bpf_trace_printk: [debug] Within unwinding machinery loop
          jjjttt-2186346 [032] d.h.. 5108702.815408: bpf_trace_printk: ## frame: 1
          jjjttt-2186346 [032] d.h.. 5108702.815408: bpf_trace_printk: 	current pc: 7fed17ba7d90
          jjjttt-2186346 [032] d.h.. 5108702.815409: bpf_trace_printk: 	current sp: 7fff0f88d590
          jjjttt-2186346 [032] d.h.. 5108702.815409: bpf_trace_printk: 	current bp: 1
          jjjttt-2186346 [032] d.h.. 5108702.815409: bpf_trace_printk: ~about to check shards found=1
          jjjttt-2186346 [032] d.h.. 5108702.815410: bpf_trace_printk: ~checking shards now
          jjjttt-2186346 [032] d.h.. 5108702.815410: bpf_trace_printk: [info] found chunk
          jjjttt-2186346 [032] d.h.. 5108702.815410: bpf_trace_printk: le offset: 7fed17b7e000
          jjjttt-2186346 [032] d.h.. 5108702.815411: bpf_trace_printk: ========== left 29 right 29075
          jjjttt-2186346 [032] d.h.. 5108702.815411: bpf_trace_printk: 	.done
          jjjttt-2186346 [032] d.h.. 5108702.815411: bpf_trace_printk: 	=> table_index: 257
          jjjttt-2186346 [032] d.h.. 5108702.815412: bpf_trace_printk: 	=> adjusted pc: 29d90
          jjjttt-2186346 [032] d.h.. 5108702.815412: bpf_trace_printk: 	cfa type: 2, offset: 160 (row pc: 29d19)
          jjjttt-2186346 [032] d.h.. 5108702.815413: bpf_trace_printk: 	previous ip: 7fed17ba7e40 (@ 7fff0f88d628)
          jjjttt-2186346 [032] d.h.. 5108702.815413: bpf_trace_printk: 	previous sp: 7fff0f88d630
          jjjttt-2186346 [032] d.h.. 5108702.815414: bpf_trace_printk: 	previous bp: 1
          jjjttt-2186346 [032] d.h.. 5108702.815414: bpf_trace_printk: [debug] Within unwinding machinery loop
          jjjttt-2186346 [032] d.h.. 5108702.815414: bpf_trace_printk: ## frame: 2
          jjjttt-2186346 [032] d.h.. 5108702.815414: bpf_trace_printk: 	current pc: 7fed17ba7e40
          jjjttt-2186346 [032] d.h.. 5108702.815415: bpf_trace_printk: 	current sp: 7fff0f88d630
          jjjttt-2186346 [032] d.h.. 5108702.815415: bpf_trace_printk: 	current bp: 1
          jjjttt-2186346 [032] d.h.. 5108702.815415: bpf_trace_printk: ~about to check shards found=1
          jjjttt-2186346 [032] d.h.. 5108702.815416: bpf_trace_printk: ~checking shards now
          jjjttt-2186346 [032] d.h.. 5108702.815416: bpf_trace_printk: [info] found chunk
          jjjttt-2186346 [032] d.h.. 5108702.815416: bpf_trace_printk: le offset: 7fed17b7e000
          jjjttt-2186346 [032] d.h.. 5108702.815417: bpf_trace_printk: ========== left 29 right 29075
          jjjttt-2186346 [032] d.h.. 5108702.815417: bpf_trace_printk: 	.done
          jjjttt-2186346 [032] d.h.. 5108702.815417: bpf_trace_printk: 	=> table_index: 266
          jjjttt-2186346 [032] d.h.. 5108702.815417: bpf_trace_printk: 	=> adjusted pc: 29e40
          jjjttt-2186346 [032] d.h.. 5108702.815418: bpf_trace_printk: 	cfa type: 2, offset: 80 (row pc: 29ddd)
          jjjttt-2186346 [032] d.h.. 5108702.815419: bpf_trace_printk: 	(bp_offset: -48, bp value stored at 7fff0f88d650)
          jjjttt-2186346 [032] d.h.. 5108702.815419: bpf_trace_printk: 	previous ip: 55a475d9a135 (@ 7fff0f88d678)
          jjjttt-2186346 [032] d.h.. 5108702.815419: bpf_trace_printk: 	previous sp: 7fff0f88d680
          jjjttt-2186346 [032] d.h.. 5108702.815420: bpf_trace_printk: 	previous bp: 0
          jjjttt-2186346 [032] d.h.. 5108702.815420: bpf_trace_printk: [debug] Within unwinding machinery loop
          jjjttt-2186346 [032] d.h.. 5108702.815420: bpf_trace_printk: ## frame: 3
          jjjttt-2186346 [032] d.h.. 5108702.815421: bpf_trace_printk: 	current pc: 55a475d9a135
          jjjttt-2186346 [032] d.h.. 5108702.815421: bpf_trace_printk: 	current sp: 7fff0f88d680
          jjjttt-2186346 [032] d.h.. 5108702.815421: bpf_trace_printk: 	current bp: 0
          jjjttt-2186346 [032] d.h.. 5108702.815421: bpf_trace_printk: ~about to check shards found=1
          jjjttt-2186346 [032] d.h.. 5108702.815422: bpf_trace_printk: ~checking shards now
          jjjttt-2186346 [032] d.h.. 5108702.815422: bpf_trace_printk: [info] found chunk
          jjjttt-2186346 [032] d.h.. 5108702.815422: bpf_trace_printk: le offset: 55a475d99000
          jjjttt-2186346 [032] d.h.. 5108702.815423: bpf_trace_printk: ========== left 74596 right 74624
          jjjttt-2186346 [032] d.h.. 5108702.815423: bpf_trace_printk: 	.done
          jjjttt-2186346 [032] d.h.. 5108702.815423: bpf_trace_printk: 	=> table_index: 74613
          jjjttt-2186346 [032] d.h.. 5108702.815423: bpf_trace_printk: 	=> adjusted pc: 1135
          jjjttt-2186346 [032] d.h.. 5108702.815424: bpf_trace_printk: 	cfa type: 2, offset: 8 (row pc: 1114)
          jjjttt-2186346 [032] d.h.. 5108702.815424: bpf_trace_printk: [info] null return address, end of stack
          jjjttt-2186346 [032] d.h.. 5108702.815425: bpf_trace_printk: ======= reached main! =======
          jjjttt-2186346 [032] d.h.. 5108702.815425: bpf_trace_printk: native stack hash 2006046535
          jjjttt-2186346 [032] d.h.. 5108702.815426: bpf_trace_printk: [debug] not an interpreter
          jjjttt-2186346 [032] d.h.. 5108702.815427: bpf_trace_printk: [MMR] aggregate stack count for pid: 2186346
          jjjttt-2186346 [032] d.h.. 5108702.868029: bpf_trace_printk: ~about to check shards found=1
          jjjttt-2186346 [032] d.h.. 5108702.868031: bpf_trace_printk: ~checking shards now
          jjjttt-2186346 [032] d.h.. 5108702.868031: bpf_trace_printk: [info] found chunk
          jjjttt-2186346 [032] d.h.. 5108702.868031: bpf_trace_printk: pid 2186346 tgid 2186346
          jjjttt-2186346 [032] d.h.. 5108702.868032: bpf_trace_printk: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          jjjttt-2186346 [032] d.h.. 5108702.868032: bpf_trace_printk: traversing stack using .eh_frame information!!
          jjjttt-2186346 [032] d.h.. 5108702.868033: bpf_trace_printk: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          jjjttt-2186346 [032] d.h.. 5108702.868033: bpf_trace_printk: [debug] Within unwinding machinery loop
          jjjttt-2186346 [032] d.h.. 5108702.868034: bpf_trace_printk: ## frame: 0
          jjjttt-2186346 [032] d.h.. 5108702.868034: bpf_trace_printk: 	current pc: 55a475d9a30b
          jjjttt-2186346 [032] d.h.. 5108702.868034: bpf_trace_printk: 	current sp: 7fff0f88d580
          jjjttt-2186346 [032] d.h.. 5108702.868035: bpf_trace_printk: 	current bp: 7fff0f88d580
          jjjttt-2186346 [032] d.h.. 5108702.868035: bpf_trace_printk: ~about to check shards found=1
          jjjttt-2186346 [032] d.h.. 5108702.868035: bpf_trace_printk: ~checking shards now
          jjjttt-2186346 [032] d.h.. 5108702.868036: bpf_trace_printk: [info] found chunk
          jjjttt-2186346 [032] d.h.. 5108702.868036: bpf_trace_printk: le offset: 55a475d99000
          jjjttt-2186346 [032] d.h.. 5108702.868037: bpf_trace_printk: ========== left 74596 right 74624
          jjjttt-2186346 [032] d.h.. 5108702.868037: bpf_trace_printk: 	.done
          jjjttt-2186346 [032] d.h.. 5108702.868037: bpf_trace_printk: 	=> table_index: 74622
          jjjttt-2186346 [032] d.h.. 5108702.868038: bpf_trace_printk: 	=> adjusted pc: 130b
          jjjttt-2186346 [032] d.h.. 5108702.868038: bpf_trace_printk: 	cfa type: 1, offset: 16 (row pc: 1304)
          jjjttt-2186346 [032] d.h.. 5108702.868043: bpf_trace_printk: [error] previous_rip should not be zero. This can mean that the read failed, ret=-14 while reading @ 7fff0f88d588.
          jjjttt-2186346 [032] d.h.. 5108702.920660: bpf_trace_printk: ~about to check shards found=1
          jjjttt-2186346 [032] d.h.. 5108702.920661: bpf_trace_printk: ~checking shards now
          jjjttt-2186346 [032] d.h.. 5108702.920661: bpf_trace_printk: [info] found chunk
          jjjttt-2186346 [032] d.h.. 5108702.920661: bpf_trace_printk: pid 2186346 tgid 2186346
          jjjttt-2186346 [032] d.h.. 5108702.920662: bpf_trace_printk: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          jjjttt-2186346 [032] d.h.. 5108702.920662: bpf_trace_printk: traversing stack using .eh_frame information!!
          jjjttt-2186346 [032] d.h.. 5108702.920663: bpf_trace_printk: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          jjjttt-2186346 [032] d.h.. 5108702.920663: bpf_trace_printk: [debug] Within unwinding machinery loop
          jjjttt-2186346 [032] d.h.. 5108702.920663: bpf_trace_printk: ## frame: 0
          jjjttt-2186346 [032] d.h.. 5108702.920664: bpf_trace_printk: 	current pc: 55a475d9a30b
          jjjttt-2186346 [032] d.h.. 5108702.920664: bpf_trace_printk: 	current sp: 7fff0f88d580
          jjjttt-2186346 [032] d.h.. 5108702.920665: bpf_trace_printk: 	current bp: 7fff0f88d580
          jjjttt-2186346 [032] d.h.. 5108702.920665: bpf_trace_printk: ~about to check shards found=1
          jjjttt-2186346 [032] d.h.. 5108702.920665: bpf_trace_printk: ~checking shards now
          jjjttt-2186346 [032] d.h.. 5108702.920666: bpf_trace_printk: [info] found chunk
          jjjttt-2186346 [032] d.h.. 5108702.920666: bpf_trace_printk: le offset: 55a475d99000
          jjjttt-2186346 [032] d.h.. 5108702.920667: bpf_trace_printk: ========== left 74596 right 74624
          jjjttt-2186346 [032] d.h.. 5108702.920667: bpf_trace_printk: 	.done
          jjjttt-2186346 [032] d.h.. 5108702.920667: bpf_trace_printk: 	=> table_index: 74622
          jjjttt-2186346 [032] d.h.. 5108702.920668: bpf_trace_printk: 	=> adjusted pc: 130b
          jjjttt-2186346 [032] d.h.. 5108702.920668: bpf_trace_printk: 	cfa type: 1, offset: 16 (row pc: 1304)
          jjjttt-2186346 [032] d.h.. 5108702.920671: bpf_trace_printk: [error] previous_rip should not be zero. This can mean that the read failed, ret=-14 while reading @ 7fff0f88d588.
          jjjttt-2186346 [032] d.h.. 5108702.973291: bpf_trace_printk: ~about to check shards found=1
          jjjttt-2186346 [032] d.h.. 5108702.973291: bpf_trace_printk: ~checking shards now
          jjjttt-2186346 [032] d.h.. 5108702.973292: bpf_trace_printk: [info] found chunk
          jjjttt-2186346 [032] d.h.. 5108702.973292: bpf_trace_printk: pid 2186346 tgid 2186346
          jjjttt-2186346 [032] d.h.. 5108702.973293: bpf_trace_printk: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          jjjttt-2186346 [032] d.h.. 5108702.973293: bpf_trace_printk: traversing stack using .eh_frame information!!
          jjjttt-2186346 [032] d.h.. 5108702.973294: bpf_trace_printk: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          jjjttt-2186346 [032] d.h.. 5108702.973294: bpf_trace_printk: [debug] Within unwinding machinery loop
          jjjttt-2186346 [032] d.h.. 5108702.973294: bpf_trace_printk: ## frame: 0
          jjjttt-2186346 [032] d.h.. 5108702.973295: bpf_trace_printk: 	current pc: 55a475d9a30b
          jjjttt-2186346 [032] d.h.. 5108702.973295: bpf_trace_printk: 	current sp: 7fff0f88d580
          jjjttt-2186346 [032] d.h.. 5108702.973295: bpf_trace_printk: 	current bp: 7fff0f88d580
          jjjttt-2186346 [032] d.h.. 5108702.973296: bpf_trace_printk: ~about to check shards found=1
          jjjttt-2186346 [032] d.h.. 5108702.973296: bpf_trace_printk: ~checking shards now
          jjjttt-2186346 [032] d.h.. 5108702.973296: bpf_trace_printk: [info] found chunk
          jjjttt-2186346 [032] d.h.. 5108702.973297: bpf_trace_printk: le offset: 55a475d99000
          jjjttt-2186346 [032] d.h.. 5108702.973297: bpf_trace_printk: ========== left 74596 right 74624
          jjjttt-2186346 [032] d.h.. 5108702.973297: bpf_trace_printk: 	.done
          jjjttt-2186346 [032] d.h.. 5108702.973298: bpf_trace_printk: 	=> table_index: 74622
          jjjttt-2186346 [032] d.h.. 5108702.973298: bpf_trace_printk: 	=> adjusted pc: 130b
          jjjttt-2186346 [032] d.h.. 5108702.973298: bpf_trace_printk: 	cfa type: 1, offset: 16 (row pc: 1304)
          jjjttt-2186346 [032] d.h.. 5108702.973301: bpf_trace_printk: [error] previous_rip should not be zero. This can mean that the read failed, ret=-14 while reading @ 7fff0f88d588.
          jjjttt-2186346 [032] d.h.. 5108703.025922: bpf_trace_printk: ~about to check shards found=1
          jjjttt-2186346 [032] d.h.. 5108703.025923: bpf_trace_printk: ~checking shards now
          jjjttt-2186346 [032] d.h.. 5108703.025923: bpf_trace_printk: [info] found chunk
          jjjttt-2186346 [032] d.h.. 5108703.025923: bpf_trace_printk: pid 2186346 tgid 2186346
          jjjttt-2186346 [032] d.h.. 5108703.025924: bpf_trace_printk: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          jjjttt-2186346 [032] d.h.. 5108703.025924: bpf_trace_printk: traversing stack using .eh_frame information!!
          jjjttt-2186346 [032] d.h.. 5108703.025924: bpf_trace_printk: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          jjjttt-2186346 [032] d.h.. 5108703.025925: bpf_trace_printk: [debug] Within unwinding machinery loop
          jjjttt-2186346 [032] d.h.. 5108703.025925: bpf_trace_printk: ## frame: 0
          jjjttt-2186346 [032] d.h.. 5108703.025925: bpf_trace_printk: 	current pc: 55a475d9a30b
          jjjttt-2186346 [032] d.h.. 5108703.025926: bpf_trace_printk: 	current sp: 7fff0f88d580
          jjjttt-2186346 [032] d.h.. 5108703.025926: bpf_trace_printk: 	current bp: 7fff0f88d580
          jjjttt-2186346 [032] d.h.. 5108703.025927: bpf_trace_printk: ~about to check shards found=1
          jjjttt-2186346 [032] d.h.. 5108703.025927: bpf_trace_printk: ~checking shards now
          jjjttt-2186346 [032] d.h.. 5108703.025927: bpf_trace_printk: [info] found chunk
          jjjttt-2186346 [032] d.h.. 5108703.025927: bpf_trace_printk: le offset: 55a475d99000
          jjjttt-2186346 [032] d.h.. 5108703.025928: bpf_trace_printk: ========== left 74596 right 74624
          jjjttt-2186346 [032] d.h.. 5108703.025928: bpf_trace_printk: 	.done
          jjjttt-2186346 [032] d.h.. 5108703.025929: bpf_trace_printk: 	=> table_index: 74622
          jjjttt-2186346 [032] d.h.. 5108703.025929: bpf_trace_printk: 	=> adjusted pc: 130b
          jjjttt-2186346 [032] d.h.. 5108703.025929: bpf_trace_printk: 	cfa type: 1, offset: 16 (row pc: 1304)
          jjjttt-2186346 [032] d.h.. 5108703.025932: bpf_trace_printk: [error] previous_rip should not be zero. This can mean that the read failed, ret=-14 while reading @ 7fff0f88d588.
          jjjttt-2186346 [032] d.h.. 5108703.078553: bpf_trace_printk: ~about to check shards found=1
          jjjttt-2186346 [032] d.h.. 5108703.078554: bpf_trace_printk: ~checking shards now
          jjjttt-2186346 [032] d.h.. 5108703.078554: bpf_trace_printk: [info] found chunk
          jjjttt-2186346 [032] d.h.. 5108703.078554: bpf_trace_printk: pid 2186346 tgid 2186346
          jjjttt-2186346 [032] d.h.. 5108703.078555: bpf_trace_printk: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          jjjttt-2186346 [032] d.h.. 5108703.078555: bpf_trace_printk: traversing stack using .eh_frame information!!
          jjjttt-2186346 [032] d.h.. 5108703.078556: bpf_trace_printk: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          jjjttt-2186346 [032] d.h.. 5108703.078556: bpf_trace_printk: [debug] Within unwinding machinery loop
          jjjttt-2186346 [032] d.h.. 5108703.078556: bpf_trace_printk: ## frame: 0
          jjjttt-2186346 [032] d.h.. 5108703.078557: bpf_trace_printk: 	current pc: 55a475d9a30b
          jjjttt-2186346 [032] d.h.. 5108703.078557: bpf_trace_printk: 	current sp: 7fff0f88d580
          jjjttt-2186346 [032] d.h.. 5108703.078557: bpf_trace_printk: 	current bp: 7fff0f88d580
          jjjttt-2186346 [032] d.h.. 5108703.078558: bpf_trace_printk: ~about to check shards found=1
          jjjttt-2186346 [032] d.h.. 5108703.078558: bpf_trace_printk: ~checking shards now
          jjjttt-2186346 [032] d.h.. 5108703.078558: bpf_trace_printk: [info] found chunk
          jjjttt-2186346 [032] d.h.. 5108703.078558: bpf_trace_printk: le offset: 55a475d99000
          jjjttt-2186346 [032] d.h.. 5108703.078559: bpf_trace_printk: ========== left 74596 right 74624
          jjjttt-2186346 [032] d.h.. 5108703.078559: bpf_trace_printk: 	.done
          jjjttt-2186346 [032] d.h.. 5108703.078559: bpf_trace_printk: 	=> table_index: 74622
          jjjttt-2186346 [032] d.h.. 5108703.078560: bpf_trace_printk: 	=> adjusted pc: 130b
          jjjttt-2186346 [032] d.h.. 5108703.078560: bpf_trace_printk: 	cfa type: 1, offset: 16 (row pc: 1304)
          jjjttt-2186346 [032] d.h.. 5108703.078562: bpf_trace_printk: [error] previous_rip should not be zero. This can mean that the read failed, ret=-14 while reading @ 7fff0f88d588.
         

@joeyleeeeeee97
Copy link
Author

normal unwind vs error unwind
image
image

@javierhonduco
Copy link
Contributor

Thanks for opening this issue. Will take a look over the next few days.

@javierhonduco
Copy link
Contributor

Something that stands out is this error image

errno -14 is EFAULT which means that the data isn't paged in, that can be for a variety of reasons, such as that area paged out to SWAP. Some BPF programs, such as the tracing ones we use here can't bring pages back to memory.

Could you try running this program with swap disabled to see if this is what could be happening?

@dreamerlzl
Copy link
Contributor

dreamerlzl commented Sep 26, 2023

I meet a similar problem in ubuntu 22.04 as well. I try to disable swap for a process with cgroup v2 like the following:

cgcreate -g memory:parca
cd /sys/fs/cgroup/parca/ && echo 0 > memory.swap.max
cgexec -g memory:parca <comm>

more symbols are resolved after disabling swap, but I can still see errors for the above process

bpf_trace_printk: [error] previous_rip should not be zero. This can mean that the read failed, ret=-14 while reading @ 7ffde661c8f8.

@javierhonduco
Copy link
Contributor

more symbols are resolved after disabling swap

More symbols are resolved or more stacks are unwound and sent to the server?

Thanks for trying disabling SWAP! Could you try running https://github.com/javierhonduco/bookmark on your machine while your applications is running and send the statistics belonging to your process' main executable and dynamic libraries? Sorry that I don't provide good instructions on how to install it or run it yet, but cargo run should do

@dreamerlzl
Copy link
Contributor

dreamerlzl commented Sep 26, 2023

more symbols are resolved after disabling swap

More symbols are resolved or more stacks are unwound and sent to the server?

Thanks for trying disabling SWAP! Could you try running https://github.com/javierhonduco/bookmark on your machine while your applications is running and send the statistics belonging to your process' main executable and dynamic libraries? Sorry that I don't provide good instructions on how to install it or run it yet, but cargo run should do

it's weird that the symbol resolution becomes as poor as it used to be with another a few runs...

/path/to/my/program PageStats { swapped: 0, present: 4, unmapped: 5, total: 5 }
/usr/lib/x86_64-linux-gnu/libc.so.6 PageStats { swapped: 0, present: 231, unmapped: 539, total: 539 }
[vvar] PageStats { swapped: 0, present: 0, unmapped: 4, total: 4 }
[vdso] PageStats { swapped: 0, present: 1, unmapped: 2, total: 2 }
/usr/lib/x86_64-linux-gnu/libm.so.6 PageStats { swapped: 0, present: 80, unmapped: 231, total: 231 }
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30 PageStats { swapped: 0, present: 349, unmapped: 553, total: 553 }
anon PageStats { swapped: 0, present: 16, unmapped: 2072, total: 2072 }
[vsyscall] PageStats { swapped: 0, present: 0, unmapped: 1, total: 1 }
/usr/lib/x86_64-linux-gnu/libgcc_s.so.1 PageStats { swapped: 0, present: 25, unmapped: 32, total: 32 }
/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 PageStats { swapped: 0, present: 59, unmapped: 59, total: 59 }
[stack] PageStats { swapped: 0, present: 3, unmapped: 33, total: 33 }
[heap] PageStats { swapped: 0, present: 2, unmapped: 33, total: 33 }

@javierhonduco
Copy link
Contributor

javierhonduco commented Sep 26, 2023

Two thoughts, your program is totally reasonable and on my system it has perfectly valid DWARF information. I run Fedora 38 with clang++ version 16.0.6 (Fedora 16.0.6-2.fc38) on x86_64.

$ clang++ -fomit-frame-pointer first_repro.c -o repro

https://pprof.me/1e5151a

Also got no errors, and the profile looks reasonable to me (minus symbolization). Of course, this depends on a system-to-systems basis depending on what pages might be already present in memory or not

@javierhonduco
Copy link
Contributor

@dreamerlzl will take a closer look at these metrics next week, thanks for gathering them :)

@dreamerlzl
Copy link
Contributor

dreamerlzl commented Sep 28, 2023

update: with another environment, the symbol resolution looks fine here and no "rip should not be zero" for the simple example.

and the bookmark looks like this(without disabling swap):

[vdso] PageStats { swapped: 0, present: 1, unmapped: 2, total: 2 }
[vsyscall] PageStats { swapped: 0, present: 0, unmapped: 1, total: 1 }
[vvar] PageStats { swapped: 0, present: 0, unmapped: 4, total: 4 }
[stack] PageStats { swapped: 1, present: 2, unmapped: 33, total: 33 }
/path/to/the/example PageStats { swapped: 1, present: 2, unmapped: 5, total: 5 }
/usr/lib/libgcc_s.so.1 PageStats { swapped: 2, present: 24, unmapped: 37, total: 37 }
/usr/lib/libm.so.6 PageStats { swapped: 2, present: 72, unmapped: 237, total: 237 }
[heap] PageStats { swapped: 2, present: 0, unmapped: 33, total: 33 }
/usr/lib/ld-linux-x86-64.so.2 PageStats { swapped: 4, present: 49, unmapped: 53, total: 53 }
/usr/lib/libc.so.6 PageStats { swapped: 7, present: 259, unmapped: 576, total: 576 }
anon PageStats { swapped: 12, present: 514, unmapped: 2072, total: 2072 }
/usr/lib/libstdc++.so.6.0.32 PageStats { swapped: 14, present: 348, unmapped: 631, total: 631 }

environment:

Linux ***** 6.4.12-arch1-1 #1 SMP PREEMPT_DYNAMIC Thu, 24 Aug 2023 00:38:14 +0000 x86_64 GNU/Linux
parca: 0.19
parca-agent: 0.25.1

@javierhonduco
Copy link
Contributor

javierhonduco commented Sep 28, 2023

Could you also provide the info per physical page just for the stack memory mapping for the process that fails with rip should not be zero?

@dreamerlzl
Copy link
Contributor

Could you also provide the info per physical page just for the stack memory mapping for the process that fails with rip should not be zero?

physical_with_swap.txt
physical_without_swap.txt

@dreamerlzl
Copy link
Contributor

dreamerlzl commented Oct 13, 2023

It's strange that the problem can always be reproduced on an AWS bare metal with ubuntu 22.04.3 but not in a VM(with the same bare metal). Could it be the case that hardware security makes a difference?

@joeyleeeeeee97
Copy link
Author

@javierhonduco .

The latest update is this issue seems to be related to some security enhancements we patched on ubuntu. Still trying to identify the specific change.

@brancz
Copy link
Member

brancz commented Oct 13, 2023

Oh interesting, let us know as you learn more! (FYI Javier is on PTO this and next week)

@javierhonduco
Copy link
Contributor

@joeyleeeeeee97 Let us know what you find out. One more thing that could potentially be useful here would be the output of sudo bpftool prog dump xlated id $ID | grep "call bpf_probe" where $ID is the program id that can be found with sudo bpftool prog

@kakkoyun kakkoyun added the bug Something isn't working label Nov 22, 2023
@dreamerlzl
Copy link
Contributor

dreamerlzl commented Feb 1, 2024

@joeyleeeeeee97 Let us know what you find out. One more thing that could potentially be useful here would be the output of sudo bpftool prog dump xlated id $ID | grep "call bpf_probe" where $ID is the program id that can be found with sudo bpftool prog

@javierhonduco On my side it's

266: (85) call bpf_probe_read_user#-69216
287: (85) call bpf_probe_read_user#-69216
698: (85) call bpf_probe_read_user#-69216
802: (85) call bpf_probe_read_user#-69216

for parca-agent commit: 1a09d01

BTW, I still suspect the hardware plays a role here. This problem only occurs on physical host but not VM of the same OS

@javierhonduco
Copy link
Contributor

Hi @dreamerlzl, I no longer work at Polar Signals and I am no longer contributing to this project. Feel free to contact another maintainer

@dreamerlzl
Copy link
Contributor

Hi @dreamerlzl, I no longer work at Polar Signals and I am no longer contributing to this project. Feel free to contact another maintainer

Wish you all the best! Thank you for all your work to open-source world.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

5 participants