features: add HaveProgramHelper API #375

rgo3 · 2021-08-13T09:21:13Z

HaveProgHelper(pt ebpf.ProgramType, helper asm.BuiltinFunc) error
probes whether a BPF helper is available to a given BPF program
type. Probe results are cached, so each probe runs at most once.

Signed-off-by: Robin Gögge r.goegge@gmail.com
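
The "cached and run at most once" behaviour described above can be sketched roughly as follows. This is a minimal, stdlib-only illustration and not the PR's actual code: `probeKey`, `probeCache` and the plain `uint32` fields are hypothetical stand-ins for `ebpf.ProgramType` and `asm.BuiltinFunc`.

```go
package main

import (
	"fmt"
	"sync"
)

// probeKey identifies one (program type, helper) combination.
// Both fields are hypothetical stand-ins for ebpf.ProgramType and
// asm.BuiltinFunc.
type probeKey struct {
	progType uint32
	helper   uint32
}

// probeCache remembers the outcome of each probe so that the expensive
// kernel round trip happens at most once per key.
type probeCache struct {
	mu      sync.Mutex
	results map[probeKey]error
	probe   func(probeKey) error // the actual (expensive) probe
}

func newProbeCache(probe func(probeKey) error) *probeCache {
	return &probeCache{results: make(map[probeKey]error), probe: probe}
}

// result returns the cached outcome for key, running the probe on first use.
// A nil error is cached too, which is why the map lookup checks "ok".
func (c *probeCache) result(key probeKey) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if err, ok := c.results[key]; ok {
		return err
	}
	err := c.probe(key)
	c.results[key] = err
	return err
}

func main() {
	runs := 0
	cache := newProbeCache(func(probeKey) error { runs++; return nil })
	key := probeKey{progType: 1, helper: 2}
	cache.result(key)
	cache.result(key)
	fmt.Println("probe executions:", runs)
}
```

Holding the mutex while the probe runs keeps the sketch short, at the cost of serializing concurrent probes for different keys.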

rgo3 · 2021-08-13T09:27:45Z

Compared to bpftool, I just check for EACCES and EINVAL for now instead of checking the verifier log. I'm not sure whether that suffices, or whether there is a (future) edge case that could break this and result in a false negative/positive. cc @ti-mo @qmonnet
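
For readers following along, the errno-only check could look something like the sketch below. The mapping is an assumption made for illustration: EACCES is read as "the verifier knows the helper but rejected the probe's arguments" (so the helper exists), and EINVAL as "helper unknown for this program type". `errNotSupported` and `interpretProbeErrno` are hypothetical names, not the library's API.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// errNotSupported is a hypothetical sentinel meaning "helper not available".
var errNotSupported = errors.New("not supported")

// interpretProbeErrno maps the errno returned when loading a probe program
// to a probe result. The interpretation of each code is an assumption.
func interpretProbeErrno(errno syscall.Errno) error {
	switch errno {
	case 0:
		// The probe program loaded: the helper is available.
		return nil
	case syscall.EACCES:
		// Assumed: the verifier recognised the helper and only rejected
		// how the probe called it, so the helper counts as available.
		return nil
	case syscall.EINVAL:
		// Assumed: the verifier does not know this helper for this
		// program type.
		return errNotSupported
	default:
		// Anything else is passed through as an unexpected failure.
		return errno
	}
}

func main() {
	for _, e := range []syscall.Errno{0, syscall.EACCES, syscall.EINVAL, syscall.EPERM} {
		fmt.Printf("errno %d -> %v\n", int(e), interpretProbeErrno(e))
	}
}
```

As the rest of the thread shows, EINVAL can also come from other verifier paths (bpf_get_current_task_btf on a kernel without BTF is the example discussed), so a mapping like this can produce false negatives.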

qmonnet · 2021-08-13T11:07:52Z

Did you compare the results with bpftool's (minus the ones for the few program types that aren't working properly on bpftool)? Do you get the same thing?

rgo3 · 2021-08-13T15:43:48Z

Initially I just spot-checked a few helpers for equality, but I have now done the full comparison and I actually come up short for about 5 helper calls, all regarding the bpf_get_current_task_btf helper. So I guess just checking for EINVAL isn't enough :(...

qmonnet · 2021-08-13T15:49:53Z

So there's just the bpf_get_current_task_btf() that is not detected as available with your PR, is this correct? I suppose your kernel is >= 5.11? (Asking because it would be nice to be sure where this EINVAL comes from)

rgo3 · 2021-08-13T15:54:09Z

Yes, just bpf_get_current_task_btf(). I think the EINVAL comes from the verifier here. I modified bpftool so it outputs the verifier log and the error message for bpf_get_current_task_btf() is: invalid return type 11 of func bpf_get_current_task_btf#158

qmonnet · 2021-08-13T16:05:09Z

A few lines above then, because (again) there's no BTF. invalid return type is not the same as unknown return type. The unknown return type case would mean the helper would never pass the verifier. Anyway :).

Yeah, in that case I'm not sure there's a way other than parsing the log. We could start making special cases for individual helpers too, but I'm afraid this will be harder to maintain.

rgo3 · 2021-08-13T16:12:54Z

Oh yeah I selected the wrong line 🙈.
Timo and I talked about this yesterday: how likely are the verifier messages to change, i.e. how stable is it to use them as an API?

qmonnet · 2021-08-13T16:17:01Z

Tricky question. But then the error codes from the verifier are hardly a stable API either 🤷.

lmb · 2021-08-23T15:34:26Z

Drive by observation: at least for error codes developers tend to try to preserve them. The same can't be said for the verifier log, and I think we shouldn't rely on it.

rgo3 · 2021-10-19T17:50:19Z

Hey! 👋 Apologies for letting this PR go stale a bit, but I would like to pick it up again so that I eventually can make use of it in cilium.

To sum up the current status: relying on error codes alone can result in false negatives, e.g. for bpf_get_current_task_btf(). We could rely on verifier log output, but I agree with Lorenz that verifier output is more likely to change (but I am new, so that is just a feeling, to be honest).

To get this PR merged I can think of two possible solutions:

1. If we want to rely on error codes, we would have to make sure EINVAL is only ever returned when the helper isn't available. We can do this by adding exception cases to various helper probes, similar to what we have with the other probe APIs that were merged. As there are so many helpers, the maintenance burden of these exception cases might not actually be lower than that of relying on the verifier output, because we would have to cover more code paths of the verifier.
2. Go with the verifier log, accepting the risk that the messages might change. The API then only codes against a few specific code paths in the verifier, which reduces the complexity of the probes.

I hope @ti-mo and @lmb can give their opinions on the two solutions I proposed or maybe come up with an even better option.

lmb · 2021-10-20T15:06:02Z

If at all possible we should avoid having to maintain "quirks" for individual helpers. Some ideas:

* Parse kernel BTF and look for the appropriate function signatures.
  * Needs extra code for kernels without BTF (but something like btfhub could help)
  * Can't tell you whether a helper is supported for a specific program type
* Parse /proc/kallsyms and look for function / variable names.
  * Dead simple, works on old kernels
  * Breaks due to function renaming

$ grep map_lookup_elem /proc/kallsyms 
0000000000000000 t map_lookup_elem
0000000000000000 T bpf_map_lookup_elem
0000000000000000 t __htab_map_lookup_elem
0000000000000000 t htab_percpu_map_lookup_elem
0000000000000000 t htab_lru_map_lookup_elem_sys
0000000000000000 t htab_map_lookup_elem
0000000000000000 t htab_of_map_lookup_elem
0000000000000000 t htab_lru_percpu_map_lookup_elem
0000000000000000 t htab_lru_map_lookup_elem
0000000000000000 T bpf_fd_htab_map_lookup_elem
0000000000000000 t fd_array_map_lookup_elem
0000000000000000 t array_map_lookup_elem
0000000000000000 t array_of_map_lookup_elem
0000000000000000 t percpu_array_map_lookup_elem
0000000000000000 T bpf_fd_array_map_lookup_elem
0000000000000000 t queue_stack_map_lookup_elem
0000000000000000 t ringbuf_map_lookup_elem
0000000000000000 t dev_map_lookup_elem
0000000000000000 T __dev_map_lookup_elem
0000000000000000 t cpu_map_lookup_elem
0000000000000000 T __cpu_map_lookup_elem
0000000000000000 t stack_map_lookup_elem
0000000000000000 t bpf_struct_ops_map_lookup_elem
0000000000000000 t xsk_map_lookup_elem
0000000000000000 t xsk_map_lookup_elem_sys_only
0000000000000000 D bpf_map_lookup_elem_proto

Maybe a combination of "stupid" BPF program probes and either one of these approaches works?
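
The /proc/kallsyms idea above could be sketched as follows. This is only an illustration under stated assumptions: `hasSymbol` is a hypothetical helper, and it assumes each line has the form `<address> <type> <name> [module]`, as in the grep output shown.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// hasSymbol reports whether kallsyms-formatted text contains a symbol whose
// name matches exactly. Substring matches (as plain grep would give) are
// deliberately not counted.
func hasSymbol(kallsyms, name string) bool {
	sc := bufio.NewScanner(strings.NewReader(kallsyms))
	for sc.Scan() {
		// Fields: address, symbol type, symbol name, optional module.
		fields := strings.Fields(sc.Text())
		if len(fields) >= 3 && fields[2] == name {
			return true
		}
	}
	return false
}

func main() {
	data, err := os.ReadFile("/proc/kallsyms")
	if err != nil {
		fmt.Println("cannot read /proc/kallsyms:", err)
		return
	}
	fmt.Println("bpf_map_lookup_elem present:", hasSymbol(string(data), "bpf_map_lookup_elem"))
}
```

A lookup like this can only say whether the symbol exists in the running kernel. It says nothing about which program types may call the helper.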

ti-mo · 2021-10-25T13:06:12Z

@rgo3 Welcome back!

> If at all possible we should avoid having to maintain "quirks" for individual helpers.

This should indeed be the general approach, but there will be exceptions. :) (as there were with the other probe types)

> Parse kernel BTF and look for the appropriate function signatures.
> * Needs extra code for kernels without BTF (but something like btfhub could help)

Requiring an external service/db just to get probes to work doesn't sound very appealing IMO.

Some helpers (like bpf_snprintf_btf, to name one) rely on BTF, so absence of BTF support in the kernel is already a signal that this helper won't be available. This is case-by-case, though.
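
A cheap way to get that signal might be to check for the kernel's own BTF blob before probing BTF-dependent helpers. The sketch below assumes the usual sysfs location exposed by kernels built with CONFIG_DEBUG_INFO_BTF; `fileExists` and `vmlinuxBTFPath` are names made up for this example.

```go
package main

import (
	"fmt"
	"os"
)

// vmlinuxBTFPath is where kernels built with CONFIG_DEBUG_INFO_BTF expose
// their type information (assumed location for this sketch).
const vmlinuxBTFPath = "/sys/kernel/btf/vmlinux"

// fileExists reports whether path can be stat'ed.
func fileExists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	if fileExists(vmlinuxBTFPath) {
		fmt.Println("kernel BTF present: BTF-dependent helpers may be available")
	} else {
		fmt.Println("no kernel BTF: BTF-dependent helpers can be ruled out early")
	}
}
```

As noted above, this is case-by-case: the check only helps for helpers that are known to depend on BTF.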

> Parse /proc/kallsyms and look for function / variable names.

Expensive, but generally effective; I have resorted to this before. Good last-resort solution for older kernels.

> Maybe a combination of "stupid" BPF program probes and either one of these approaches works?

Naturally, but this is what eventually causes death by a 1000 papercuts, as the fallback detection strategy required will vary depending on the helper. This can only be decided when actually implementing these, so let's discuss on actual code.

lmb · 2021-10-25T15:17:48Z

> Naturally, but this is what eventually causes death by a 1000 papercuts

The answer to that is to not provide arbitrary helper feature tests, my preferred solution :P

rgo3 · 2021-10-27T14:29:02Z

Is it an option to move forward with this PR keeping it the way it is currently, basically not trying to create a catch-em-all solution right from the start? We already cover lots of possible progtype-helper combinations the way it is, which would suffice for its first use-case in cilium. If people have future use-cases with combinations we don't probe accurately at the moment, we can defer adding fallback strategies or extra probes until then?

As a side note, I played a bit with /proc/kallsyms and the results differ from bpftool's output. E.g.:

$ grep bpf_get_current_task_btf /proc/kallsyms
0000000000000000 T bpf_get_current_task_btf
0000000000000000 r bpf_get_current_task_btf_proto

bpftool shows this helper to be available to multiple prog types on my kernel while /proc/kallsyms doesn't.

ti-mo · 2021-11-02T11:12:51Z

> Is it an option to move forward with this PR keeping it the way it is currently, basically not trying to create a catch-em-all solution right from the start?

I'm okay with the initial implementation not being perfect, as long as we have a way of documenting what the errata are. Preferably in tests, with a GH issue to track the follow-up work. That way, if anyone has the need for a particular broken probe, they can grab the issue and go ahead with the implementation.

lmb · 2022-02-18T14:59:20Z

What's going to happen to this?

rgo3 · 2022-02-18T16:50:02Z

I'd still like to get some version of this in, considering this would be the most useful when trying to integrate the features API into cilium (which was the original goal of my GSoC project).

I got stuck looking for ways to create tests that nicely show what the errata are, because of all the possible combinations of prog types and available helpers on the respective kernel versions. I could add tests per prog type, starting with the prog types I'd need in order to use this in cilium, and add more over time? That way I wouldn't have to test the full matrix of possibilities for now, only a smaller subset. If so, I'd work through this list of versions and available helpers per prog type to incrementally add tests?

If that's not a clean enough solution to move forward with this PR, I might need to rethink how we could add helper probes to this API.

rgo3 · 2022-05-12T10:22:45Z

Alright, I think it's finally ready to graduate from its draft status.
There's still a test-case that I added that's failing, but that's related to the kernel config in CI:

--- FAIL: TestHaveProgHelper (0.04s)
    ...
    --- FAIL: TestHaveProgHelper/CGroupSockAddr/FnGetCgroupClassid (0.00s)
    ...

Locally this test works for me, as my kernel has this helper type compiled in.

Additionally there are two more test-cases I couldn't add because the lib doesn't support those particular builtins yet (just a matter of updating the BuiltinFunc enum I think). I've mentioned those in comments above the test-cases.

I've also built a small tool to compare the features API to bpftool's feature probes. It currently just lives on a branch of my fork of the repo. Running that tool yields the following output:

❯ go run -exec sudo cmd/bpftoolcmp/main.go
Running bpftool -j feature ...

Comparing available program types ...

Comparing available helper functions ...
    False negative: API got different result than bpftool for: RawTracepoint/FnGetFuncIp
    False negative: API got different result than bpftool for: Syscall/FnSkStorageGet
    False negative: API got different result than bpftool for: Syscall/FnSkStorageDelete
    False negative: API got different result than bpftool for: Syscall/FnDPath
    False negative: API got different result than bpftool for: Syscall/FnGetFuncIp
    False negative: API got different result than bpftool for: TracePoint/FnGetFuncIp
    False negative: API got different result than bpftool for: PerfEvent/FnGetFuncIp
    False negative: API got different result than bpftool for: RawTracepointWritable/FnGetFuncIp

I haven't fully investigated why those false negatives exist, but I'd create an issue to track and investigate if we choose to accept that these differ from bpftool currently.
This output excludes false negatives for program types LSM, Extension, StructOps and Tracing because we can't probe those prog types correctly.
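
The core of such a comparison could be sketched as below. This is not the tool from the branch mentioned above: `diffProbes` is a hypothetical function, the keys are assumed to use the `ProgType/Helper` form seen in the output, and the sample data in `main` is made up.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// diffProbes compares two probe result sets keyed by "ProgType/Helper".
// A false negative is a helper the reference reports as available but the
// API does not; a false positive is the reverse. Keys whose program type is
// listed in skip are ignored, as are keys missing from the reference.
func diffProbes(api, reference, skip map[string]bool) (falseNeg, falsePos []string) {
	for key, want := range reference {
		progType, _, _ := strings.Cut(key, "/")
		if skip[progType] {
			continue
		}
		got := api[key]
		switch {
		case want && !got:
			falseNeg = append(falseNeg, key)
		case !want && got:
			falsePos = append(falsePos, key)
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Strings(falseNeg)
	sort.Strings(falsePos)
	return falseNeg, falsePos
}

func main() {
	// Made-up sample data for illustration only.
	reference := map[string]bool{"Syscall/FnDPath": true, "Kprobe/FnGetFuncIp": true, "LSM/FnDPath": true}
	api := map[string]bool{"Syscall/FnDPath": false, "Kprobe/FnGetFuncIp": true}
	skip := map[string]bool{"LSM": true}

	falseNeg, falsePos := diffProbes(api, reference, skip)
	fmt.Println("false negatives:", falseNeg)
	fmt.Println("false positives:", falsePos)
}
```

Treating one side as the reference is itself a design choice: a disagreement only shows that the two tools differ, not which of them is right.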

cc @florianl

lmb · 2022-05-12T15:26:18Z

Could bpftoolcmp just be part of the testsuite? We could skip tests if the binary doesn't exist.

rgo3 · 2022-05-12T15:44:42Z

So instead of having bpftoolcmp, add a test like TestBPFToolCompare? And skip on a missing binary and on known exceptions?
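
Skipping on a missing binary could look roughly like this. The sketch uses a small `skipper` interface (an assumption of this example) so the helper can be exercised outside `go test`; `*testing.T` satisfies it through its `Skipf` method. `requireTool` and `recordingSkipper` are hypothetical names.

```go
package main

import (
	"fmt"
	"os/exec"
)

// skipper is the subset of *testing.T used here.
type skipper interface {
	Skipf(format string, args ...any)
}

// requireTool returns the path of an external binary, or asks t to skip
// when the binary cannot be found in PATH.
func requireTool(t skipper, name string) (string, bool) {
	path, err := exec.LookPath(name)
	if err != nil {
		t.Skipf("%s not found in PATH: %v", name, err)
		return "", false
	}
	return path, true
}

// recordingSkipper stands in for *testing.T and records skip messages.
type recordingSkipper struct{ skipped []string }

func (r *recordingSkipper) Skipf(format string, args ...any) {
	r.skipped = append(r.skipped, fmt.Sprintf(format, args...))
}

func main() {
	rec := &recordingSkipper{}
	if path, ok := requireTool(rec, "bpftool"); ok {
		fmt.Println("would compare against", path)
	} else {
		fmt.Println("skipped:", rec.skipped[0])
	}
}
```

Inside a real test, `requireTool(t, "bpftool")` would stop the test at the `Skipf` call, so the early return only matters for stand-ins like the one above.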

florianl

Thanks for bringing this forward! 👍
The PR as is works fine here.

For possible future work wrt bpftoolcmp, I would also vote for having this in a test case like TestBPFToolCompare, rather than calling an external tool that uses exec.Cmd to run another external tool (bpftool).

rgo3 · 2022-05-13T10:50:17Z

Sorry, had to repush as I found some typos...

As for the FnGetFuncIp helper probes: I think these are actually false positives on bpftool's/libbpf's end. The verifier has a check that only allows Kprobe and Tracing progs to call it. This API returns an error because ENOTSUPP is returned from the verifier. libbpf misses it because its check of the verifier log output doesn't match in this case. cc @qmonnet

ti-mo

Love it.

> Could bpftoolcmp just be part of the testsuite? We could skip tests if the binary doesn't exist.

This can be done in a follow-up. 👍

features/prog.go

features/prog_test.go

lmb · 2022-05-18T13:04:12Z

Bit of a late bike shed: can you call the function HaveProgramHelper?

rgo3 · 2022-05-18T16:04:16Z

I can, sort of diverges from HaveProgType then though. Do we care about that?

lmb · 2022-05-19T08:10:38Z

We can fix that up by introducing HaveProgramType and deprecating the shortened version.
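
In Go, that kind of rename is usually done by keeping the old name as a thin wrapper with a `Deprecated:` doc comment. A minimal sketch, assuming a stand-in `ProgramType` type and a placeholder probe body (neither is the library's real code):

```go
package main

import "fmt"

// ProgramType is a hypothetical stand-in for ebpf.ProgramType.
type ProgramType uint32

// HaveProgramType probes whether the kernel supports pt.
// Placeholder body for this sketch: it pretends only types below 3 exist.
func HaveProgramType(pt ProgramType) error {
	if pt < 3 {
		return nil
	}
	return fmt.Errorf("program type %d: not supported", pt)
}

// HaveProgType probes whether the kernel supports pt.
//
// Deprecated: use HaveProgramType instead.
func HaveProgType(pt ProgramType) error {
	return HaveProgramType(pt)
}

func main() {
	fmt.Println(HaveProgType(1), HaveProgType(7))
}
```

Existing callers keep compiling, while tools that understand the `Deprecated:` convention can point them at the new name.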

rgo3 force-pushed the HaveProgHelpersAPI branch from a4253d4 to 994a15b Compare May 3, 2022 21:01

rgo3 force-pushed the HaveProgHelpersAPI branch from 994a15b to d858fa2 Compare May 12, 2022 10:18

rgo3 marked this pull request as ready for review May 12, 2022 10:19

lmb requested a review from ti-mo May 12, 2022 15:25

florianl approved these changes May 12, 2022

rgo3 force-pushed the HaveProgHelpersAPI branch from d858fa2 to fa11cb7 Compare May 13, 2022 10:50

ti-mo mentioned this pull request May 16, 2022

CI: Test against kernel 5.18 #668

Merged

1 task

ti-mo approved these changes May 16, 2022

rgo3 force-pushed the HaveProgHelpersAPI branch from fa11cb7 to 916d9b1 Compare May 17, 2022 15:47

rgo3 force-pushed the HaveProgHelpersAPI branch from 916d9b1 to 4dbee6c Compare May 17, 2022 17:26

rgo3 added 2 commits May 19, 2022 11:00

features: add HaveProgramHelper API

dda07c0

`HaveProgramHelper(pt ebpf.ProgramType, helper asm.BuiltinFunc) error` probes whether a BPF helper is available to a given BPF program type. Probe results are cached, so each probe runs at most once. Signed-off-by: Robin Gögge <r.goegge@gmail.com>

features: rename HaveProgType API

83afbae

In order to be in line with other naming schemes throughout the library, this commit deprecates HaveProgType() in favor of HaveProgramType(). Signed-off-by: Robin Gögge <r.goegge@gmail.com>

rgo3 force-pushed the HaveProgHelpersAPI branch from 4dbee6c to 83afbae Compare May 19, 2022 09:25

rgo3 changed the title ~~features: add HaveProgHelper API~~ features: add HaveProgramHelper API May 19, 2022

ti-mo merged commit 951bb28 into cilium:master May 23, 2022

rgo3 mentioned this pull request May 23, 2022

Comparing bpf helper probe results with bpftool #681

Closed
