Support kernel stack map #2671

i-Pear · 2024-04-01T16:17:35Z

This includes some hacks and is for testing purposes only.

i-Pear · 2024-04-01T16:22:01Z

Currently, I am unsure about who should hold the stackIDMap (perhaps the front end which prints the stack? Is it localManager or IgManager?), nor do I know how the stack should be printed. I need further understanding of the project's structure.

i-Pear · 2024-04-01T16:26:40Z

The method mentioned in #2553 seems overly complicated; I am unsure why it requires mapping fd:

struct gadget_stack_ref {
    int stack_map; // fd of the stack map
    int stack_id; // value returned by bpf_get_stackid()
};

Furthermore, its purpose differs from that of MntNsFilter. I think we cannot simply copy the code from MntNsFilter because MntNsFilter needs to be accessed by facilities like localManager responsible for filtering container access, whereas stackIdMap is only used by the frontend and does not need to be accessed in various parts of the project.

alban · 2024-04-01T17:53:31Z

I had not thought of defining the ebpf map in a header (include/gadget/stack_map.h). I was thinking it should be the responsibility of the gadget itself to define its stack map.

With your method, there can be only one stack map per gadget, and it has to be named "gadget_stack_trace_map"; that name becomes part of the ABI between IG and the gadget. That ABI would need to be documented in gadget-helper-api.md.

The headers in include/gadget are meant to be helpers but are not mandatory. It should be possible for gadget authors to write their gadgets in a different way, for example in Rust and compiled into eBPF and packaged in the OCI image. IG should be agnostic about that.

I am unsure why it requires mapping fd:

struct gadget_stack_ref {
    int stack_map; // fd of the stack map
    int stack_id; // value returned by bpf_get_stackid()
};

I added stack_map as a way to write a reference to the stack map because I didn't want to limit this to one stack map per gadget.

Also, note that a uprobe program such as:

SEC("uprobe/libc:free")

could be attached to several containers with different versions of libc. So the address of a function on the stack has to be interpreted differently depending on the libc version. This can be resolved by using a different stack map for each uprobe attachment. In this case, it is useful to have the field stack_map to distinguish which stack map it is referring to.
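To illustrate why a reference carrying both fields can be useful, here is a minimal Go sketch. It is not code from this PR: the `StackRef` type, the in-memory `stackMap` stand-in, and the `resolve` helper are all hypothetical names. It only shows that the same stack id can exist in two maps and that the map identifier disambiguates them.

```go
package main

import (
	"errors"
	"fmt"
)

// StackRef mirrors the gadget_stack_ref idea from the thread: which stack map
// to look in, and which stack id inside that map. Names are illustrative.
type StackRef struct {
	StackMap int32 // identifies the stack map (an fd in the original proposal)
	StackID  int32 // value returned by bpf_get_stackid()
}

// stackMap is an in-memory stand-in for one BPF stack trace map.
type stackMap map[int32][]uint64

var (
	errUnknownMap   = errors.New("unknown stack map")
	errUnknownStack = errors.New("unknown stack id")
)

// resolve finds the addresses for ref among several stack maps.
func resolve(maps map[int32]stackMap, ref StackRef) ([]uint64, error) {
	m, ok := maps[ref.StackMap]
	if !ok {
		return nil, errUnknownMap
	}
	addrs, ok := m[ref.StackID]
	if !ok {
		return nil, errUnknownStack
	}
	return addrs, nil
}

func main() {
	// Two maps (for example one per uprobe attachment) both contain id 42.
	maps := map[int32]stackMap{
		3: {42: {0x123, 0x456}},
		4: {42: {0xabc}},
	}
	a, _ := resolve(maps, StackRef{StackMap: 3, StackID: 42})
	b, _ := resolve(maps, StackRef{StackMap: 4, StackID: 42})
	fmt.Printf("%#x %#x\n", a, b)
}
```

With a single shared map, the `StackMap` field would be unnecessary, which is the trade-off discussed in the following comments.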

If you look at pkg/networktracer/tracer.go as an example, it has its own listen(), eventHandler() and SetEventHandler() functions. If pkg/uprobetracer/tracer.go does the same, then it has access to the []byte from the ring buffer and also to the ebpf stack maps. But that's probably not the right place to add this because stack maps should work for different kinds of ebpf programs...

@flyth I don't know where to add the code after the refactoring. I see field accessors have a method Set() which takes []byte as input, but in the case of a stack passed in a ring buffer, the serialized bytes are not enough because we need to have access to the stack map and do a bpf(BPF_MAP_LOOKUP_ELEM). Could you shed some light on this?

i-Pear · 2024-04-02T05:45:56Z

containers with different versions of libc. So the address of a function on the stack has to be interpreted differently depending on the libc version. This can be resolved by using a different stack map for each uprobe attachment.

I may lack the necessary knowledge, but why can't stacks from different libraries & different gadgets be mixed within the same stack map? As I understand, the stack map simply associates a stackID with a stack (a set of addresses). In the example eBPF program attached to this PR, stackIDs are recorded alongside PIDs, and then the frontend uses the stackID to find the corresponding stack data, interpreting it using symbols associated with the PID. This doesn't seem to cause any confusion.

alban · 2024-04-02T07:31:26Z

containers with different versions of libc. So the address of a function on the stack has to be interpreted differently depending on the libc version. This can be resolved by using a different stack map for each uprobe attachment.

I may lack the necessary knowledge, but why can't stacks from different libraries & different gadgets be mixed within the same stack map? As I understand, the stack map simply associates a stackID with a stack (a set of addresses).

Correct.

In the example eBPF program attached to this PR, stackIDs are recorded alongside PIDs, and then the frontend uses the stackID to find the corresponding stack data, interpreting it using symbols associated with the PID. This doesn't seem to cause any confusion.

Do we have to use the PID? I am concerned that if the target process terminates before ig can open /proc/$pid/maps, symbolization cannot work. I thought that if we know the event comes from /bin/bash and /lib64/libc.so.6, we could resolve the addresses to symbols even after the process has terminated. In this case, we need to know whether a probe comes from /lib64/libc.so.6 or from another version of libc. But I guess I didn't account for memory relocations of dynamic libraries, so maybe my idea does not work.
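The relocation concern can be made concrete with a small Go sketch, assuming the standard /proc/<pid>/maps line layout (address range, permissions, file offset, device, inode, path). The `mapping` type and the `parseMapsLine` and `locate` helpers are hypothetical names, not part of IG. The sketch turns a runtime address into a backing file plus an offset in that file; a symbolizer would still have to map that offset to a symbol.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// mapping is one parsed line of /proc/<pid>/maps.
type mapping struct {
	start, end uint64 // virtual address range [start, end)
	offset     uint64 // offset of the range within the backing file
	path       string // backing file, empty for anonymous mappings
}

// parseMapsLine parses a line such as:
//   7f1c2a400000-7f1c2a5b0000 r-xp 00028000 08:01 1234 /lib64/libc.so.6
func parseMapsLine(line string) (mapping, error) {
	fields := strings.Fields(line)
	if len(fields) < 5 {
		return mapping{}, fmt.Errorf("malformed maps line: %q", line)
	}
	bounds := strings.SplitN(fields[0], "-", 2)
	if len(bounds) != 2 {
		return mapping{}, fmt.Errorf("malformed address range: %q", fields[0])
	}
	var m mapping
	var err error
	if m.start, err = strconv.ParseUint(bounds[0], 16, 64); err != nil {
		return mapping{}, err
	}
	if m.end, err = strconv.ParseUint(bounds[1], 16, 64); err != nil {
		return mapping{}, err
	}
	if m.offset, err = strconv.ParseUint(fields[2], 16, 64); err != nil {
		return mapping{}, err
	}
	// The path is optional and may contain spaces.
	m.path = strings.Join(fields[5:], " ")
	return m, nil
}

// locate translates a runtime address into (file, offset in file).
func locate(maps []mapping, addr uint64) (string, uint64, bool) {
	for _, m := range maps {
		if addr >= m.start && addr < m.end {
			return m.path, addr - m.start + m.offset, true
		}
	}
	return "", 0, false
}

func main() {
	m, err := parseMapsLine("7f1c2a400000-7f1c2a5b0000 r-xp 00028000 08:01 1234 /lib64/libc.so.6")
	if err != nil {
		panic(err)
	}
	path, off, ok := locate([]mapping{m}, 0x7f1c2a400123)
	fmt.Println(path, fmt.Sprintf("%#x", off), ok)
}
```

Because the load address differs per process, the raw addresses in a stack only become comparable across processes after this translation, which is why the mappings have to be read while the process is still alive.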

flyth · 2024-04-02T09:24:56Z

@flyth I don't know where to add the code after the refactoring. I see field accessors have a method Set() which takes []byte as input, but in the case of a stack passed in a ring buffer, the serialized bytes are not enough because we need to have access to the stack map and do a bpf(BPF_MAP_LOOKUP_ELEM). Could you shed some light on this?

I think I need more info here - but it sounds like you would want to receive the stackID from the ring buffer, do the lookup in userspace and then send whatever you receive through the DataSource (and not (just) the stackID).

alban · 2024-04-02T10:39:54Z

@flyth I don't know where to add the code after the refactoring. I see field accessors have a method Set() which takes []byte as input, but in the case of a stack passed in a ring buffer, the serialized bytes are not enough because we need to have access to the stack map and do a bpf(BPF_MAP_LOOKUP_ELEM). Could you shed some light on this?

I think I need more info here - but it sounds like you would want to receive the stackID from the ring buffer, do the lookup in userspace and then send whatever you receive through the DataSource (and not (just) the stackID).

Yes.

In the ring buffer, we get e.g. stack_id = 42.
We look up stack_id = 42 in the stack map and we get the value []uintptr{0x123, 0x456, 0x789} (if the stack has a depth of 3).
We look at the target process to resolve those 3 addresses and return the stack []string{"getchar", "readline", "main"} to the user.
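The three steps above can be sketched in Go as follows. This is only an illustration under stated assumptions: the stack map is replaced by a plain Go map, and the symbol table is a hypothetical sorted list of function start addresses chosen so that the example addresses resolve to the names mentioned above. `symbolize` and `resolveStack` are made-up helper names.

```go
package main

import (
	"fmt"
	"sort"
)

// symbol is a function start address and its name. The table must be sorted
// by start address. All values here are invented for the example.
type symbol struct {
	start uint64
	name  string
}

// symbolize returns the name of the last symbol starting at or before addr,
// or the raw address in hex when no symbol precedes it.
func symbolize(syms []symbol, addr uint64) string {
	i := sort.Search(len(syms), func(i int) bool { return syms[i].start > addr })
	if i == 0 {
		return fmt.Sprintf("0x%x", addr)
	}
	return syms[i-1].name
}

// resolveStack looks up the stack id (step 2) and symbolizes each address
// (step 3). The stack id itself comes from the ring buffer event (step 1).
func resolveStack(stackMap map[uint32][]uint64, syms []symbol, stackID uint32) ([]string, bool) {
	addrs, ok := stackMap[stackID]
	if !ok {
		return nil, false
	}
	frames := make([]string, 0, len(addrs))
	for _, a := range addrs {
		frames = append(frames, symbolize(syms, a))
	}
	return frames, true
}

func main() {
	stackMap := map[uint32][]uint64{42: {0x123, 0x456, 0x789}}
	syms := []symbol{{0x100, "getchar"}, {0x400, "readline"}, {0x700, "main"}}
	frames, ok := resolveStack(stackMap, syms, 42)
	fmt.Println(frames, ok)
}
```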

flyth · 2024-04-02T15:57:44Z

@flyth I don't know where to add the code after the refactoring. I see field accessors have a method Set() which takes []byte as input, but in the case of a stack passed in a ring buffer, the serialized bytes are not enough because we need to have access to the stack map and do a bpf(BPF_MAP_LOOKUP_ELEM). Could you shed some light on this?

I think I need more info here - but it sounds like you would want to receive the stackID from the ring buffer, do the lookup in userspace and then send whatever you receive through the DataSource (and not (just) the stackID).

Yes.

In the ring buffer, we get e.g. stack_id = 42.

We look up stack_id = 42 in the stack map and we get the value []uintptr{0x123, 0x456, 0x789} (if the stack has a depth of 3).

We look at the target process to resolve those 3 addresses and return the stack []string{"getchar", "readline", "main"} to the user.

What should the UX be for now, then? Just a list of function names returned as string in a "stack" field - in JSON + columns?

Here's how I'd approach it (afaiu the PR):

prep: this needs to be extended to support maps in a generic way (right now we do it for hardcoded names, but we should also populate maps with a given prefix, IMHO); for a quick test, just add another exception a couple of lines below
create a new operator named "StackOperator" or something like it; make sure the operator registers itself on init() of the file like most other operators do and include it where other operators are included (for testing, could manually add in both occurrences in cmd/common/oci.go)
need to implement DataOperator interface and additionally the DataOperatorInstance interface on another type that can hold the stack map and is returned by the InstantiateDataOperator() func
in the InstantiateDataOperator(), check for the stack-map reference (and create/set/cache it) by using GetVar("mapname") & SetVar(); if it's there, also check DataSources in gadgetCtx for stackId (type:gadget_stack_id); if found, add a new field called "stack" to the DataSource and cache the accessor.
in PreStart(), subscribe to DataSources with the stackId; in the callbacks, extract stackId using the accessor, do the map lookup + extract info from the target process and set info using the accessor for the "stack" field (for initial version, I'd suggest just a concatenated, comma-separated string)
in Stop(), destroy the map.
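The lifecycle described in these steps could look roughly like the Go sketch below. It deliberately does not use IG's real DataOperator or DataSource interfaces: `record`, `dataSource`, `stackOperatorInstance`, and `instantiate` are simplified, hypothetical stand-ins, and the stack map plus symbolization is reduced to a Go map of pre-resolved frames.

```go
package main

import (
	"fmt"
	"strings"
)

// record is a hypothetical stand-in for one DataSource entry.
type record struct {
	StackID int32  // field written by the eBPF program
	Stack   string // field added and filled by the operator
}

// dataSource is a hypothetical stand-in that only supports subscriptions.
type dataSource struct{ callbacks []func(*record) }

func (d *dataSource) Subscribe(cb func(*record)) { d.callbacks = append(d.callbacks, cb) }

func (d *dataSource) Emit(r *record) {
	for _, cb := range d.callbacks {
		cb(r)
	}
}

// stackOperatorInstance holds the per-run state, here a map from stack id to
// already symbolized frames.
type stackOperatorInstance struct{ stacks map[int32][]string }

// instantiate plays the role of InstantiateDataOperator(): it caches the map.
func instantiate(stacks map[int32][]string) *stackOperatorInstance {
	return &stackOperatorInstance{stacks: stacks}
}

// PreStart subscribes to the data source and fills the "stack" field with a
// comma-separated string, as suggested for an initial version.
func (o *stackOperatorInstance) PreStart(ds *dataSource) {
	ds.Subscribe(func(r *record) {
		frames, ok := o.stacks[r.StackID]
		if !ok {
			return
		}
		r.Stack = strings.Join(frames, ",")
	})
}

// Stop releases the map.
func (o *stackOperatorInstance) Stop() { o.stacks = nil }

func main() {
	ds := &dataSource{}
	op := instantiate(map[int32][]string{42: {"getchar", "readline", "main"}})
	op.PreStart(ds)
	r := &record{StackID: 42}
	ds.Emit(r)
	fmt.Println(r.Stack)
	op.Stop()
}
```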

See other operators like OciHandler, pkg/datasource/compat (used by KubeManager + LocalManager) and pkg/operators/formatters for general info on how operators are used.

I just talked to @alban and he said it might be much more complex than what I proposed since there could be multiple stack maps per gadget run (per container + per target lib version...). So if you prefer, feel free to go ahead and do a PoC that just prints the results to stdout and we'll find a way to integrate it properly afterwards.

alban · 2024-04-03T16:52:48Z

What should the UX be for now, then? Just a list of function names returned as string in a "stack" field - in JSON + columns?

I think yes. There could be an option for a multiline output, such as bcc's tcpdrop tool:
https://github.com/iovisor/bcc/blob/6a5602cef2ebd97c351554d53a4f95532db6a568/tools/tcpdrop_example.txt#L7-L38
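As a rough illustration of the two output styles being discussed, the Go sketch below formats the same frames either as one comma-separated string or as one frame per line. The `formatStack` helper and its tab indentation are assumptions made for this example, not the format chosen in the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// formatStack renders frames as a single comma-separated string, or as one
// tab-indented frame per line when multiline is true.
func formatStack(frames []string, multiline bool) string {
	if !multiline {
		return strings.Join(frames, ",")
	}
	var b strings.Builder
	for _, f := range frames {
		b.WriteString("\t")
		b.WriteString(f)
		b.WriteString("\n")
	}
	return b.String()
}

func main() {
	frames := []string{"getchar", "readline", "main"}
	fmt.Println(formatStack(frames, false))
	fmt.Print(formatStack(frames, true))
}
```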

i-Pear · 2024-04-25T08:06:28Z

I've pushed a demo version where IG can currently read the stack and print it to stdout. The architectural design of this version is likely to be unreasonable; it's just my initial idea, I'll later look again into the comments above. Currently, I'm using a "Converter" to translate stackID into stack data.

pkg/operators/ebpf/converters.go

pkg/datasource/compat/wrapper.go

i-Pear · 2024-05-03T14:36:05Z

Also added trace_capabilities gadget, see #173 and #1319. But I don't know how to test it.

Update: tested with --host, and it looks good.

pkg/operators/ebpf/converters.go

alban · 2024-05-12T20:28:40Z

pkg/operators/ebpf/ebpf.go

+			ValueSize:  8 * PerfMaxStackDepth,
+			MaxEntries: 10000,


PerfMaxStackDepth and MAX_ENTRIES should be kept in sync between the Go source and the header file.

Add a comment // Keep in sync with ....

I am also concerned with the growing API surface between ig and the gadget. Since third-party gadgets and ig are not to be released in lock-steps, we could have a gadget compiled with stack_map.h from an older version of ig, and then run it with a newer version of ig.

It might not matter for a map of type StackTrace, but for other kinds of maps, this can cause problems. So the code pattern makes me uneasy.

I think we can accept it for now, but I am hoping we can use ebpf extensions later on.

cc @mauriciovasquezbernal

I agree. Since I have no experience with ebpf extensions, I will try them with USDT arguments first.

i-Pear · 2024-05-13T14:15:14Z

TODO: use ebpf extension to refactor this

i-Pear · 2024-05-16T16:52:04Z

The failure in the documentation checks can be ignored; once this is merged, the link will become available.

i-Pear · 2024-05-18T15:12:57Z

As extension: https://github.com/i-Pear/inspektor-gadget/tree/stack_map_as_extension

i-Pear · 2024-05-23T17:42:26Z

Changed kernel stack map name to ig_kstack.

alban

Thanks!

gadgets/trace_capabilities/gadget.yaml

gadgets/trace_capabilities/program.bpf.c

include/gadget/kernel_stack_map.h

pkg/operators/ebpf/converters.go

alban · 2024-05-28T13:06:35Z

gadgets/trace_capabilities/program.bpf.c

+		bpf_map_update_elem(&current_syscall, &pid_tgid, &sc_ctx,
+				    BPF_ANY);


Does it work with multithreaded applications running execve?

See:

container-hook: fix sys_exit_execve from thread #2804

trace exec: fix sys_exit_execve from thread #2454

image-based trace exec gadget: fix sys_exit_execve from thread #2475

Could we put the code in #2475 in a common header? For example, we can add fix_execve.h and provide two helper functions hook_execve_enter and hook_execve_exit. Also, it seems the execveat syscall needs the same fix.

Unfortunately the implementation is slightly different for container-hook vs trace-exec. I didn't find a way to have common code... Maybe it's easier to fix it separately in this trace-capabilities gadget, and do the refactoring in a separate PR.

About execveat: it seems that the trace-exec gadgets (builtin and image-based) miss events from execveat, but that's a separate bug. The ebpf maps are not getting full in that case.

Shouldn't the implementations in trace-exec and trace_capabilities be the same? Container-hook might have different requirements, but I just want to provide a common implementation for gadgets.

Opened #2965

alban · 2024-05-28T13:16:59Z

gadgets/trace_capabilities/program.bpf.c

+	if (LINUX_KERNEL_VERSION >= KERNEL_VERSION(5, 1, 0)) {
+		event->audit = (ap->cap_opt & CAP_OPT_NOAUDIT) == 0;
+		event->insetid = (ap->cap_opt & CAP_OPT_INSETID) != 0;
+	} else {
+		event->audit = ap->cap_opt;
+		event->insetid = -1;
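For readers unfamiliar with the flag layout, the branch above can be mirrored in userspace Go as a sketch. The constants follow the kernel's definitions of CAP_OPT_NOAUDIT (bit 1) and CAP_OPT_INSETID (bit 2). The `decodeCapOpt` and `b2i` helpers and the boolean parameter standing in for the kernel version check are hypothetical.

```go
package main

import "fmt"

const (
	capOptNoAudit = 1 << 1 // CAP_OPT_NOAUDIT
	capOptInSetID = 1 << 2 // CAP_OPT_INSETID
)

// b2i converts a bool to 0 or 1, like the C comparison results.
func b2i(b bool) int {
	if b {
		return 1
	}
	return 0
}

// decodeCapOpt mirrors the eBPF branch: with the newer signature the value is
// a bit mask, with the older one it is the audit flag itself and insetid is
// unknown (-1).
func decodeCapOpt(capOpt uint32, newSignature bool) (audit, insetid int) {
	if newSignature {
		return b2i(capOpt&capOptNoAudit == 0), b2i(capOpt&capOptInSetID != 0)
	}
	return int(capOpt), -1
}

func main() {
	fmt.Println(decodeCapOpt(capOptNoAudit, true))
	fmt.Println(decodeCapOpt(capOptInSetID, true))
	fmt.Println(decodeCapOpt(1, false))
}
```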


For a future PR:

It might be possible to use bpf_core_type_matches instead of LINUX_KERNEL_VERSION:

This comes from: torvalds/linux@c1a85a0

$ sudo bpftool btf dump id 1 format c

union security_list_options {
-	int (*capable)(const struct cred *, struct user_namespace *, int, int);
+	int (*capable)(const struct cred *, struct user_namespace *, int, unsigned int);
}

I don't recommend using bpf_core_type_matches here, because struct cred also changed in Linux 6.7-rc6 [1].

Because the vmlinux.h in IG is from version 6.6 and my local environment is 6.10, I spent a lot of time working out why bpf_core_type_matches returns false. In such cases, it seems we need to trace all the nested structures and provide headers for each version. What do you think?

[1] torvalds/linux@f8fa5d7

I guess your motivation for using bpf_core_type_matches is that some distributions may pick patches or do certain backports, making it better to judge the structure than the version number.

However, we cannot predict which of the two patches (1: updating struct cred, 2: updating union security_list_options) is included in the user's Linux distribution. This leads to four scenarios, with potentially more combinations in the future.

docs/reference/gadget-helper-api.md

Co-authored-by: Alban Crequy <albancrequy@linux.microsoft.com> Signed-off-by: Tianyi Liu <i.pear@outlook.com>

i-Pear force-pushed the stack_map branch 3 times, most recently from d9819ef to 1f27ea4 Compare April 25, 2024 07:55

i-Pear force-pushed the stack_map branch 3 times, most recently from 4bd9e2a to 5a0f4a7 Compare April 28, 2024 15:53

i-Pear changed the title from "[WIP] Support stack map" to "[WIP] Support kernel stack map" Apr 29, 2024

flyth reviewed Apr 29, 2024

pkg/operators/ebpf/converters.go (outdated)

flyth reviewed Apr 29, 2024

pkg/datasource/compat/wrapper.go (outdated)

i-Pear force-pushed the stack_map branch 2 times, most recently from 489974a to 29ad10a Compare April 30, 2024 12:15

i-Pear changed the title from "[WIP] Support kernel stack map" to "Support kernel stack map" Apr 30, 2024

i-Pear marked this pull request as ready for review April 30, 2024 12:15

i-Pear requested review from mauriciovasquezbernal and alban as code owners April 30, 2024 12:15

i-Pear requested a review from flyth April 30, 2024 12:15

i-Pear force-pushed the stack_map branch from 29ad10a to 321cc67 Compare May 3, 2024 10:39

i-Pear force-pushed the stack_map branch from 5d0a72b to 344ce7e Compare May 12, 2024 16:30

alban reviewed May 12, 2024


i-Pear force-pushed the stack_map branch from 344ce7e to 7837eb3 Compare May 13, 2024 14:14

i-Pear requested a review from alban May 13, 2024 14:15

i-Pear force-pushed the stack_map branch from 7837eb3 to 7993603 Compare May 14, 2024 02:46

i-Pear force-pushed the stack_map branch from 7993603 to 9194fcd Compare May 18, 2024 15:12

i-Pear force-pushed the stack_map branch 3 times, most recently from 6857285 to 77e7552 Compare May 23, 2024 17:27

i-Pear force-pushed the stack_map branch 2 times, most recently from b99d6b5 to 1d1e7e6 Compare May 27, 2024 02:20

alban reviewed May 28, 2024


alban reviewed May 29, 2024

docs/reference/gadget-helper-api.md (resolved)

i-Pear force-pushed the stack_map branch from 1d1e7e6 to bfe331b Compare May 29, 2024 18:27

i-Pear and others added 3 commits May 31, 2024 00:53

pkg/operators/ebpf: Support kernel stack map

e84cde1

Co-authored-by: Alban Crequy <albancrequy@linux.microsoft.com> Signed-off-by: Tianyi Liu <i.pear@outlook.com>

gadgets/trace_tcpdrop: Add kernel stack field

259cd0e

Co-authored-by: Alban Crequy <albancrequy@linux.microsoft.com> Signed-off-by: Tianyi Liu <i.pear@outlook.com>

gadgets: Add trace_capabilities

a63d66e

Co-authored-by: Alban Crequy <albancrequy@linux.microsoft.com> Signed-off-by: Tianyi Liu <i.pear@outlook.com>

i-Pear force-pushed the stack_map branch from bfe331b to a63d66e Compare May 30, 2024 16:56


alban May 12, 2024

i-Pear May 13, 2024

i-Pear commented May 13, 2024

i-Pear commented May 16, 2024

i-Pear commented May 18, 2024

i-Pear commented May 23, 2024

alban left a comment

alban May 28, 2024

i-Pear May 30, 2024

alban Jun 3, 2024

i-Pear Jun 4, 2024

i-Pear Jun 4, 2024

alban May 28, 2024

i-Pear Jun 7, 2024

i-Pear Jun 7, 2024
