Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

features: add misc probes #541

Merged
merged 2 commits into from Feb 14, 2022
Merged

features: add misc probes #541

merged 2 commits into from Feb 14, 2022

Conversation

florianl
Copy link
Collaborator

@florianl florianl commented Jan 8, 2022

Signed-off-by: Florian Lehner dev@der-flo.net

Extend the features API with macros. So similar tobpftool feature probe macros one can now check if large instructions, v2 or v3 ISA or bounded loops are supported.

@florianl florianl requested review from lmb and ti-mo January 8, 2022 12:46
@florianl
Copy link
Collaborator Author

florianl commented Jan 8, 2022

From the CI information that are shared publicly I can't tell, why KERNEL_VERSION=5.4 is not passing. It seems to me the test is running into a timeout rather than failing.

features/macros.go Outdated Show resolved Hide resolved
// adjustJumpOffset updates the first jump instruction in insns with the
// given newJumpOffset.
// If no jump instruction is found and error is returned.
func adjustJumpOffset(insns asm.Instructions, newJumpOffset int16) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this necessary? Marshaling the instructions should do the correct fix ups, no?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, insns.Marshal() is not enough.

In the usual workflow via newProgramWithOptions() this adjustment happens via fixupJumpsAndCalls().

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rgo3 @ti-mo why can't we use NewProgram from the features package?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC we decided on not using it because originally one of the goals was to use the features package in the main ebpf package to replace some of the current feature tests we have in there, I just never got to that during GSoC. So to not create any import cycles and as I didn't have to copy the entire function into the features package we ended up with what we have now. Additionally, at the time we decided that for my purposes it was enough to just use the ProgLoad syscall

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the context. I think I'd opt for avoiding copypasta now, and deal with possible import cycles later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we avoid copypasta by exposing fixupJumpsAndCalls() somehow, maybe via internal and using it here? NewProgram() does not return all the errors from the syscall which we use for the features error interface if I see that correctly, so refactoring everything to use it would be much more effort than having a shared function in internal that takes care of fixupJumpsAndCalls().

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixupJumpsAndCalls performs two actions: adjusting mumps and replacing bpf helper functions (for mordern kernels).
Splitting this function up and moving the adjusting of jumps into internal sounds like a good idea 👍 so it can be reused here in package features as well. wdyt?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make jump fixup part of asm instead, maybe in Instructions.Marshal?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can think of scenarios, where people hand craft their own instructions and with Instructions.Marshal() they get something different than they would expect. Therefore doing the jump fixup just in Instructions.Marshal() isn't a good idea, I think.
But having it in asm instead of internal sounds good to me. What about Instructions.RewriteJumps(symbolOffsets map[string]asm.RawInstructionOffset)?
If the given symbolOffsets is nil the offsets are calculated as it is happening right now. Otherwise the given offsets are used.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Synced with @lmb in slack about this. Therefore updated the PR to use Instructions.RewriteJumps().

features/macros.go Outdated Show resolved Hide resolved
features/macros.go Outdated Show resolved Hide resolved
@lmb
Copy link
Collaborator

lmb commented Jan 14, 2022

Yeah the tests are due to semaphore CI bugs that I haven't been able to iron out.

@florianl florianl changed the title features: add macros probes features: add misc probes Jan 17, 2022
@lmb
Copy link
Collaborator

lmb commented Jan 28, 2022

@ti-mo can you take a look please?

@florianl florianl force-pushed the flo-feature-macros branch 3 times, most recently from 3a5ee37 to 5216d82 Compare February 7, 2022 20:55
asm/instruction.go Outdated Show resolved Hide resolved
asm/instruction.go Outdated Show resolved Hide resolved
asm/instruction.go Outdated Show resolved Hide resolved
@lmb
Copy link
Collaborator

lmb commented Feb 8, 2022

There is a real test failure:

--- FAIL: TestHaveMisc (0.00s) 00:30
    --- FAIL: TestHaveMisc/misc-2 (0.00s) 00:30
        misc_test.go:37: Feature misc-2 isn't supported even though kernel is at least 4.13: not supported 00:30
FAIL

Signed-off-by: Florian Lehner <dev@der-flo.net>
Signed-off-by: Florian Lehner <dev@der-flo.net>
@florianl
Copy link
Collaborator Author

florianl commented Feb 8, 2022

There is a real test failure:

--- FAIL: TestHaveMisc (0.00s) 00:30
    --- FAIL: TestHaveMisc/misc-2 (0.00s) 00:30
        misc_test.go:37: Feature misc-2 isn't supported even though kernel is at least 4.13: not supported 00:30
FAIL

my fault 🙈 the jump offset for this probe was not correct. I did test it yesterday only on a 5.16. kernel. fixed this.

But now Run unit tests - KERNEL_VERSION=5.10 fails:

Testing on 5.1000:26
[..]
exit status 4200:39
FAIL	github.com/cilium/ebpf/features	0.339s
[..]

From the CI I can't tell exactly which test in github.com/cilium/ebpf/features failed. Locally I was able to reproduce the failure once - see output https://gist.github.com/florianl/cb22bc3d9b00e65dfe2d43b84cb6c8e7
But the failing tests are TestHaveMapType/InodeStorage and TestHaveProgType/LircMode2. But this was only once from multiple (passed) runs :/ is there a way to get more info from CI or rerun the job?

@lmb
Copy link
Collaborator

lmb commented Feb 9, 2022

exit 42 is an issue that happens on CI somewhat regularly unfortunately. Not necessarily related to your PR.

Copy link
Collaborator

@lmb lmb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ti-mo can you please take a look. I'm not very familiar with the features stuff.

// EINVAL occurs when attempting to create a program with an unknown type.
// E2BIG occurs when ProgLoadAttr contains non-zero bytes past the end
// of the struct known by the running kernel, meaning the kernel is too old
// to support the given map type.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Map type?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a copy paste error inherited from my program feature probes PR, should be prog type in both cases.

case errors.Is(err, unix.EINVAL), errors.Is(err, unix.E2BIG):
err = ebpf.ErrNotSupported

// EPERM is kept as-is and is not converted or wrapped.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error handling is the very same as applied to the prog feature probes:

ebpf/features/prog.go

Lines 132 to 150 in 7816b93

switch {
// EINVAL occurs when attempting to create a program with an unknown type.
// E2BIG occurs when ProgLoadAttr contains non-zero bytes past the end
// of the struct known by the running kernel, meaning the kernel is too old
// to support the given map type.
case errors.Is(err, unix.EINVAL), errors.Is(err, unix.E2BIG):
err = ebpf.ErrNotSupported
// EPERM is kept as-is and is not converted or wrapped.
case errors.Is(err, unix.EPERM):
break
// Wrap unexpected errors.
case err != nil:
err = fmt.Errorf("unexpected error during feature probe: %w", err)
default:
fd.Close()
}

I can think of non-privileged eBPF use cases, where this should be kept. please correct me on this @rgo3

Should I try to combine it with the prog feature probe error handling?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had a discussion around EPERM on my first PR, and I think returning EPERM as-is was the simple implementation to make sure that the caller can check against this specific case and that it is distinguishable from ErrNotSupported. To make it more useful though we might want to add code to clear an (EPERM) error from the cache, so that potentially users can change caps and rerunning the probe will not just return the cached EPERM, but that's another PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think returning EPERM is OK, but it should be wrapped to discourage err == EPERM comparisons. Something for another PR.

Copy link
Contributor

@rgo3 rgo3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I thought I also should give this a full review: Aside from the few unresolved comments and my nit regarding testing it looks good to me. Cross-referenced with the bpftool implementation.

for misc, test := range tests {
test := test
probe := fmt.Sprintf("misc-%d", misc)
t.Run(probe, func(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this use t.Parallel()? I think we also run the other HaveProgType tests in parallel.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That won't make a difference since the probes end up taking a lock on the cache, therefore only a single probe will ever run concurrently.

@lmb
Copy link
Collaborator

lmb commented Feb 14, 2022

API seems fine and this has been sitting around for a while, so I'll merge. Thanks for your patience Florian!

@lmb lmb merged commit 6981110 into master Feb 14, 2022
@lmb lmb deleted the flo-feature-macros branch February 14, 2022 10:30
ti-mo added a commit to ti-mo/ebpf that referenced this pull request Feb 16, 2022
The error string was changed when it was exported in cilium#541 and is potentially
ambiguous.

Signed-off-by: Timo Beckers <timo@isovalent.com>
ti-mo added a commit to ti-mo/ebpf that referenced this pull request Feb 16, 2022
The error string was changed when it was exported in cilium#541 and is potentially
ambiguous.

Signed-off-by: Timo Beckers <timo@isovalent.com>
ti-mo added a commit that referenced this pull request Feb 17, 2022
The error string was changed when it was exported in #541 and is potentially
ambiguous.

Signed-off-by: Timo Beckers <timo@isovalent.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants