Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix arm64 testsuite failures #745

Merged
merged 7 commits into from Jul 27, 2022
Merged

Fix arm64 testsuite failures #745

merged 7 commits into from Jul 27, 2022

Conversation

lmb
Copy link
Collaborator

@lmb lmb commented Jul 21, 2022

Tests now pass on arm64:

$ go test -exec sudo ./...
ok  	github.com/cilium/ebpf	0.473s
ok  	github.com/cilium/ebpf/asm	0.014s
ok  	github.com/cilium/ebpf/btf	0.671s
ok  	github.com/cilium/ebpf/cmd/bpf2go	0.717s
ok  	github.com/cilium/ebpf/cmd/bpf2go/test	0.040s
ok  	github.com/cilium/ebpf/features	0.143s
ok  	github.com/cilium/ebpf/internal	0.024s
?   	github.com/cilium/ebpf/internal/cmd/gentypes	[no test files]
ok  	github.com/cilium/ebpf/internal/epoll	1.020s
ok  	github.com/cilium/ebpf/internal/sys	0.015s
?   	github.com/cilium/ebpf/internal/testutils	[no test files]
?   	github.com/cilium/ebpf/internal/unix	[no test files]
ok  	github.com/cilium/ebpf/link	1.653s
ok  	github.com/cilium/ebpf/perf	0.260s
ok  	github.com/cilium/ebpf/ringbuf	0.108s
ok  	github.com/cilium/ebpf/rlimit	0.007s

Best review on a per-commit basis.

Updates #266

@lmb
Copy link
Collaborator Author

lmb commented Jul 21, 2022

@rgo3 can you please take a look at e2d485b ?

@timo 90411de needs your attention, I think it might make tests fail on older kernels.

@rgo3
Copy link
Contributor

rgo3 commented Jul 21, 2022

@lmb The HaveProgramHelper API in the features package also returns a few more unwrapped errors, specifically:

  • func validateProgramHelper
  • func haveProgramHelper

@lmb
Copy link
Collaborator Author

lmb commented Jul 22, 2022

Thanks! Just saw the following as well:

// EPERM is kept as-is and is not converted or wrapped.

Why is that?

@rgo3
Copy link
Contributor

rgo3 commented Jul 22, 2022

Only real explanation I can give you is in @ti-mo's commit message. To me returning EPERM directly feels reasonable, I'm not sure if there is any benefit to wrapping it.

@lmb
Copy link
Collaborator Author

lmb commented Jul 23, 2022

Ok, I read that as "don't let EPERM fall into the default switch case". Wrapping the error forces use of errors.Is which gives you more flexibility in the future if you need to change the returned error somehow.

@lmb lmb force-pushed the arm64 branch 4 times, most recently from cd71052 to 3c1038c Compare July 26, 2022 10:52
lmb added 2 commits July 26, 2022 10:53
ENOTSUPP is not included in x/sys/unix since it's meant to be kernel
internal. Move it from our wrapper package into internal/sys.
Also provide a descriptive error message.
Linux on arm64 returns ENOTSUPP when trying to create a tracing link
via BPF_LINK_CREATE and BPF_RAW_TRACEPOINT_OPEN. Turn this into ErrNotSupported.
@lmb lmb force-pushed the arm64 branch 2 times, most recently from 029442b to cbd720f Compare July 26, 2022 11:24
lmb added 2 commits July 26, 2022 12:30
Use symbols and tracepoints which are portable across amd64 and arm64.

Based on ubuntu 5.13.0-52-generic aarch64.
Don't return unwrapped sentinel errors from various feature tests.

testutils.SkipIfNotSupported will bail out if it encounters an unwrapped
ErrNotSupported.
@lmb
Copy link
Collaborator Author

lmb commented Jul 26, 2022

Ugh, finally a green build. I'll merge this tomorrow-ish unless someone objects.

@ti-mo
Copy link
Collaborator

ti-mo commented Jul 27, 2022

// EPERM is kept as-is and is not converted or wrapped.

Why is that?

Ha, I just asked myself the same question today while working on #746. We should probably remove it, it complicates the logic unnecessarily.

I want to wrap everything except ErrNotSupported using a defer closure, as it'll make the switch statement much more straightforward and easier to extend. Looking at your patches here, we might as well wrap every returned error and set the error string containing %w depending on the error type.

lmb added 3 commits July 27, 2022 14:30
Trying to create a perf_event uprobe on arm64 doesn't seem to allow
passing an address in the first page. Use the first byte of the second
page instead.
Passing flags to BPF_MAP_LOOKUP_AND_DELETE_ELEM is only possible
from 5.14. Skip the test on earlier kernels.
TestKprobeOffset used to hardcode specific offsets to test against,
which meant we had to make it amd64 specific. It turns out that even
on a single arch the offsets aren't stable due to changes in the
compiler. Hence the test was changed to brute force a valid offset,
but we forgot to run it on all arches.

Trying to execute the test on Linux 5.13 on arm64 triggers EINVAL
instead of the expected EILSEQ however. This is because arm64 returns
EINVAL when trying to use an offset that is not a multiple of four.

Relying on EILSEQ is therefore not portable, and probably shouldn't
be encouraged. Drop the check for EILSEQ from TestKprobeOffset and
execute it on all arches.
@ti-mo
Copy link
Collaborator

ti-mo commented Jul 27, 2022

@lmb I've dropped eb06124 from the PR since it breaks API, and from what I understand this is more cosmetic than functional. Callers might already rely on ENOENT specifically being returned for missing symbols across all kernels, as well as when scanning for valid offsets within a known symbol.

The original meaning of ENOENT in this context is 'symbol not found', which, with some creativity, could be interpreted as 'offset not found' when trying to attach to specific offsets, although that might be a stretch. Could you maybe try to tackle this from another angle? I don't see a simple way out here.

@ti-mo ti-mo merged commit a293887 into cilium:master Jul 27, 2022
@lmb
Copy link
Collaborator Author

lmb commented Jul 27, 2022

Turning EINVAL into ENOENT is just plain wrong unfortunately, and does obscure bugs. There are a ton of reasons why perf event open will return EINVAL, some of them arch specific, that don't come from the symbol being missing. Trying to paper them over doesn't work, so I think the breakage is inevitable.

@ti-mo
Copy link
Collaborator

ti-mo commented Jul 28, 2022

@lmb Fair enough, will add it back in through a separate PR. This'll make it pop up in the changelog, so we won't forget to add a breaking change to the relnotes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants