Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix haveProgTestRun on kernel version 5.15.65 #788

Merged
merged 1 commit into from Sep 14, 2022

Conversation

player-two
Copy link

A recent patch0 to the kernel prevents bpf_prog_test_run_skb() from running the bpf program with with an empty skb. This makes the existing feature check fail, so to fix it an additional byte of input data is needed.

I tried tracing through the kernel code, and to be honest I'm only partially confident that I understand the issue. I can't quite find the place where the skb->len is getting reset to zero, because before the call to convert___skb_to_skb() is a call to __skb_put() with the size of the input data. The patch I linked is essentially a good guess at the root cause.

I am confident that this issue is introduced in 5.15.65 (changelog).

root@localhost:~/ebpf# uname -r
5.15.65
root@localhost:~/ebpf# go test -v -run=TestHaveProgTestRun ./...
=== RUN   TestHaveProgTestRun
    prog_test.go:539: Feature 'BPF_PROG_TEST_RUN' isn't supported even though kernel v5.15.65 is newer than v4.12
--- FAIL: TestHaveProgTestRun (0.00s)

I did take the time to build 5.15.64 and run the test above; it passed without this patch.

That test also passes on 5.15.65 with this patch applied. It looks like the CI for this repo won't catch the issue because it is pinned to 5.15.19 (ref).

A recent patch[0] to the kernel prevents bpf_prog_test_run_skb() from running
the bpf program with with an empty skb.  This makes the existing feature check
fail, so to fix it an additional byte of input data is needed.

[0]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fd1894224407c484f652ad456e1ce423e89bb3eb
@lmb
Copy link
Collaborator

lmb commented Sep 8, 2022

What a bummer. Does 5.15.64 + this patch also cause the same error? If yes we should take this upstream, there are tons of places that use an empty 14 byte buffer.

@player-two
Copy link
Author

Does 5.15.64 + this patch also cause the same error?

It does :/

Here's from the master branch of this repo:

root@localhost:~/ebpf# uname -r
5.15.64-dev
root@localhost:~/ebpf# go test -v -run=TestHaveProgTestRun ./...
=== RUN   TestHaveProgTestRun
    prog_test.go:539: Feature 'BPF_PROG_TEST_RUN' isn't supported even though kernel v5.15.64 is newer than v4.12
--- FAIL: TestHaveProgTestRun (0.00s)

@NHAS
Copy link

NHAS commented Sep 14, 2022

I believe I am also getting stung by this same issue:

ebpf_test.go:50: program failed can't test program: BPF_PROG_TEST_RUN not supported (requires >= v4.12)
FAIL
FAIL    wag/router      0.003s
FAIL

Despite being on 5.19:

Linux  5.19.8-zen1-1-zen #1 ZEN SMP PREEMPT_DYNAMIC Thu, 08 Sep 2022 18:07:40 +0000 x86_64 GNU/Linux

@lmb
Copy link
Collaborator

lmb commented Sep 14, 2022

Sorry for the delay, I've been prepping for Linux Plumbers conference. Thanks for the fix.

I've also sent a mail upstream to hopefully avoid this from breaking more users: https://lore.kernel.org/bpf/20220914111936.19881-1-oss@lmb.io/T/#t

@lmb lmb merged commit a38fb6b into cilium:master Sep 14, 2022
@player-two player-two deleted the fix-5.15.65 branch September 14, 2022 16:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants