ambient: fix nil pointer when pod cache is stale #50878
base: master
I ran into this in an *extremely* bespoke and unsupported environment, but I think it could occur in the real world. We loop outside of GetPodIfAmbient waiting for the pod to show up, but if the lookup fails we panic. We want to get an error instead.
8da0621 to db9c0c9
I think this logic isn't quite right. Will fix it up tomorrow.
🤔 🐛 You appear to be fixing a bug in Go code, yet your PR doesn't include updates to any test files. Did you forget to add a test? Courtesy of your friendly test nag.
I assume you mean
yes
@@ -200,3 +186,32 @@ func (s *CniPluginServer) ReconcileCNIAddEvent(ctx context.Context, addCmd CNIPl

	return nil
}

func (s *CniPluginServer) getPodWithRetry(log *istiolog.Scope, name, namespace string) (*corev1.Pod, error) {
I think this was mentioned in other PRs, but if this is now its own function, it would be good to cover behavior with unit tests in cni-watcher_test.go
(the lack of this is probably why we had this bug in the first place)
We didn't do that before because it would be a slow test due to the timeouts, but if this is a private func, can we just pass the timeouts in as args and test quick iterations in a unit test to codify correct behavior here?
The way we have things abstracted actually still makes this pretty hard, since we don't actually call this except in cni-watcher, where all the mocking makes it tricky to flow through.
Looks like it could be a copy-paste of one of the existing tests (e.g. TestCNIPluginServer), but we simply call getPodWithRetry directly without adding an underlying pod to the fake k8s client, right?
That looks like it would simulate querying for a pod the server client lacks.
@@ -82,12 +82,19 @@ func setupHandlers(ctx context.Context, kubeClient kube.Client, dataplane MeshDa
	return s
}

// GetPodIfAmbient looks up a pod. It returns:
// * An error if the pod cannot be found
// * nil if the pod is found, but does not have ambient enabled
Is this just using nil, nil as a sentinel value? Should we just return a different error value for when the pod is found but isn't configured as we expect?

func (s *CniPluginServer) getPodWithRetry(log *istiolog.Scope, name, namespace string) (*corev1.Pod, error) {
	log.Debugf("Checking pod: %s in ns: %s is enabled for ambient", name, namespace)
	maxStaleRetries := 10
I know this wasn't the case in the previous code, but does it make sense to declare these as constants?
@@ -200,3 +186,32 @@ func (s *CniPluginServer) ReconcileCNIAddEvent(ctx context.Context, addCmd CNIPl

	return nil
}

func (s *CniPluginServer) getPodWithRetry(log *istiolog.Scope, name, namespace string) (*corev1.Pod, error) {
	log.Debugf("Checking pod: %s in ns: %s is enabled for ambient", name, namespace)
- log.Debugf("Checking pod: %s in ns: %s is enabled for ambient", name, namespace)
+ log.Debugf("Checking if pod %s/%s is enabled for ambient", name, namespace)
nit suggestion to better match the non-colon strings we use elsewhere in this PR.
Current:
Checking pod: foo in ns: bar is enabled for ambient
Suggestion:
Checking if pod foo/bar is enabled for ambient
double nit, can we do namespace/name if we're going to make a change?
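For illustration, the namespace/name ordering only requires swapping the format arguments. The helper `podLogMessage` below is hypothetical and uses `fmt.Sprintf` in place of the real logger so the result is easy to inspect:

```go
package main

import "fmt"

// podLogMessage is a hypothetical helper that builds the log line with the
// pod identified as namespace/name, the conventional Kubernetes ordering.
func podLogMessage(name, namespace string) string {
	return fmt.Sprintf("Checking if pod %s/%s is enabled for ambient", namespace, name)
}

func main() {
	// Pod "foo" in namespace "bar".
	fmt.Println(podLogMessage("foo", "bar"))
	// prints "Checking if pod bar/foo is enabled for ambient"
}
```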