-
Notifications
You must be signed in to change notification settings - Fork 187
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support kernel stack map #2671
base: main
Are you sure you want to change the base?
Support kernel stack map #2671
Conversation
Currently, I am unsure about who should hold the |
The method mentioned in #2553 seems overly complicated; I am unsure why it requires mapping fd:
Furthermore, its purpose differs from that of |
I have not thought of defining the ebpf map in a header ( With your method, there can be only one stack map per gadget, it has to be named "gadget_stack_trace_map" and that's part of the ABI between IG and the gadget. That ABI would need to be documented on gadget-helper-api.md. The headers in
I added Also, note that a uprobe program such as:
could be attached to several containers with different versions of libc. So the address of a function on the stack has to be interpreted differently depending on the libc version. This can be resolved by using a different stack map for each uprobe attachment. In this case, it is useful to have the field If you look at the pkg/networktracer/tracer.go as example, it has its own @flyth I don't know where to add the code after the refactoring. I see field accessors have a method |
I may lack the necessary knowledge, but why can't stacks from different libraries & different gadgets be mixed within the same stack map? As I understand, the stack map simply associates a stackID with a stack (a set of addresses). In the example eBPF program attached to this PR, stackIDs are recorded alongside PIDs, and then the frontend uses the stackID to find the corresponding stack data, interpreting it using symbols associated with the PID. This doesn't seem to cause any confusion. |
Correct.
Do we have to use the PID? I am concerned that if the target process terminates before ig could open |
I think I need more info here - but it sounds like you would want to receive the stackID from the ring buffer, do the lookup in userspace and then send whatever you receive through the DataSource (and not (just) the stackID). |
Yes.
|
What should the UX be for now, then? Just a list of function names returned as string in a "stack" field - in JSON + columns? Here's how I'd approach it (afaiu the PR):
See other operators like I just talked to @alban and he said it might be much more complex than what I proposed since there could be multiple stack maps per gadget run (per container + per target lib version...). So if you prefer, feel free to go ahead and do a PoC that just prints the results to stdout and we'll find a way to integrate it properly afterwards. |
I think yes. There could be an option for a multiline output, such as bcc's tcpdrop tool: |
d9819ef
to
1f27ea4
Compare
I've pushed a demo version where IG can currently read the stack and print it to |
4bd9e2a
to
5a0f4a7
Compare
489974a
to
29ad10a
Compare
pkg/operators/ebpf/ebpf.go
Outdated
ValueSize: 8 * PerfMaxStackDepth, | ||
MaxEntries: 10000, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PerfMaxStackDepth and MAX_ENTRIES should be kept in sync between the Go source and the header file.
Add a comment // Keep in sync with ...
.
I am also concerned with the growing API surface between ig and the gadget. Since third-party gadgets and ig are not to be released in lock-steps, we could have a gadget compiled with stack_map.h
from an older version of ig, and then run it with a newer version of ig.
It might not matter for a map of type StackTrace, but for other kind of maps, this can cause problems. So the code pattern makes me uneasy.
I think we can accept it for now, but I am hoping we can use ebpf extensions later on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, since I have no experience in ebpf extensions, I will try it with USDT arguments first.
TODO: use ebpf extension to refactor this |
The failure in documentation checks could be ignored, once this got merged, the link will be available. |
6857285
to
77e7552
Compare
Changed kernel stack map name to |
b99d6b5
to
1d1e7e6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
bpf_map_update_elem(¤t_syscall, &pid_tgid, &sc_ctx, | ||
BPF_ANY); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it work with multithreaded applications running execve?
See:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we put the code in #2475 in a common header? For example, we can add fix_execve.h
and provide two helper functions hook_execve_enter
and hook_execve_exit
. Also, seems the execveat
syscall needs the same fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately the implementation is slightly different for container-hook vs trace-exec. I didn't find a way to have common code... Maybe it's easier to fix it separately in this trace-capabilities gadget, and do the refactoring in a separate PR.
About execveat: it seems that the trace-exec gadgets (builtin and image-based) miss events from execveat, but that's a separate bug. The ebpf maps are not getting full in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementations in trace-exec
and trace_capabilities
should be the same? Container-hook
might have different requirements, but I just want to provide a common implementation for gadgets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened #2965
if (LINUX_KERNEL_VERSION >= KERNEL_VERSION(5, 1, 0)) { | ||
event->audit = (ap->cap_opt & CAP_OPT_NOAUDIT) == 0; | ||
event->insetid = (ap->cap_opt & CAP_OPT_INSETID) != 0; | ||
} else { | ||
event->audit = ap->cap_opt; | ||
event->insetid = -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a future PR:
It might be possible to use bpf_core_type_matches
instead of LINUX_KERNEL_VERSION
:
This comes from: torvalds/linux@c1a85a0
$ sudo bpftool btf dump id 1 format c
union security_list_options {
- int (*capable)(const struct cred *, struct user_namespace *, int, int);
+ int (*capable)(const struct cred *, struct user_namespace *, int, unsigned int);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't recommend to use bpf_core_type_matches
here, because struct cred
also changed in Linux 6.7-rc6 [1].
Because the vmlinux.h
in IG is version 6.6
, and my local environment is 6.10
, I spent a lot of time looking for why the bpf_core_type_matches
returns false. In such cases, seems we need to trace all the nested structures, and provide headers for each version. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess your motivation for using bpf_core_type_matches
is that some distributions may pick patches or do certain backports, making it better to judge the structure than the version number.
However, we cannot predict which of the two patches (1: updating struct cred
, 2: updating union security_list_options
) is included in the user's Linux distribution. This leads to four scenarios, with potentially more combinations in the future.
Co-authored-by: Alban Crequy <albancrequy@linux.microsoft.com> Signed-off-by: Tianyi Liu <i.pear@outlook.com>
Co-authored-by: Alban Crequy <albancrequy@linux.microsoft.com> Signed-off-by: Tianyi Liu <i.pear@outlook.com>
Co-authored-by: Alban Crequy <albancrequy@linux.microsoft.com> Signed-off-by: Tianyi Liu <i.pear@outlook.com>
Solves #2553
It's with some hacking, and only for test purpose.