New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add sync_proxy_rules_no_endpoints_total metric #108930
Conversation
Skipping CI for Draft Pull Request. |
/test all |
/assign andrewsykim |
pkg/proxy/ipvs/proxier.go
Outdated
localNoEndpoints int64 | ||
// externalNoEndpoints represents the number of rules that couldn't be applied due to | ||
// the absence of any endpoints. | ||
externalNoEndpoints int64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I would name this clusterNoEndpoints instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pkg/proxy/iptables/proxier.go
Outdated
@@ -1327,6 +1329,11 @@ func (proxier *Proxier) syncProxyRules() { | |||
} | |||
|
|||
if !hasEndpoints { | |||
if svc.NodeLocalInternal() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI there may be some changes need after #106497 is merged. I don't expect the conflict to be big but just a heads up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack.
pkg/proxy/ipvs/proxier.go
Outdated
@@ -273,6 +273,12 @@ type Proxier struct { | |||
// Inject for test purpose. | |||
networkInterfacer utilproxy.NetworkInterfacer | |||
gracefuldeleteManager *GracefulTerminationManager | |||
// localNoEndpoints represents the number of rules that couldn't be applied due to | |||
// the absence of local endpoints. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a comment reference the metric this is used for would be helpful, same for clusterNoEndpoints
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done for both.
pkg/proxy/iptables/proxier.go
Outdated
@@ -967,6 +967,8 @@ func (proxier *Proxier) syncProxyRules() { | |||
} | |||
} | |||
|
|||
serviceNoEndpointsTotalInternal := 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment here for the metric this is used for would be helpful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
/retest unrelated |
@MaxRenaud: The
The following commands are available to trigger optional jobs:
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/retest |
2 similar comments
/retest |
/retest |
/assign @dgrisonnet |
Done |
pkg/proxy/iptables/proxier.go
Outdated
args = append(args[:0], | ||
"-A", string(policyLocalChain), | ||
"-m", "comment", "--comment", | ||
fmt.Sprintf(`"%s has no local endpoints"`, svcNameString), | ||
"-j", | ||
string(KubeMarkDropChain), | ||
) | ||
proxier.natRules.Write(args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm... that use of args is unnecessary with the current code...
args = append(args[:0], | |
"-A", string(policyLocalChain), | |
"-m", "comment", "--comment", | |
fmt.Sprintf(`"%s has no local endpoints"`, svcNameString), | |
"-j", | |
string(KubeMarkDropChain), | |
) | |
proxier.natRules.Write(args) | |
proxier.natRules.Write( | |
"-A", string(policyLocalChain), | |
"-m", "comment", "--comment", | |
fmt.Sprintf(`"%s has no local endpoints"`, svcNameString), | |
"-j", | |
string(KubeMarkDropChain), | |
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have just moved the code. Is it OK if we keep it as is for this PR? I'm getting an exception since we are past the freeze and my justification for the low risk is that we are simply adding a metric.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will tickle a rebase on my PR which touches the args usage, too, I think
{"10.0.1.3", "host2"}, | ||
}, | ||
expectedSyncProxyRulesNoLocalEndpointsTotalInternal: 1, | ||
expectedSyncProxyRulesNoLocalEndpointsTotalExternal: 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be good to have some test where the metric is more than 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do this as a quick follow-up? It involves refactoring the test to include dynamically generating services. Since we're after the freeze, I'd like to get this one in
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK
I think this is OK, even if some minor fixups would be nice. /lgtm |
/milestone v1.24 (exception for this PR was approved) |
/retest |
@MaxRenaud: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
This is ready to merge. It's been rebased, modified to fit with the conflicts, tested, and all comments were addressed. The absolute latest this can be merged is "01:00 UTC Saturday 2nd April 2022" in order to make it into 1.24. That translates to 18:00 PDT or 21:00 EDT today. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
Thanks @MaxRenaud!
What type of PR is this?
/kind feature
What this PR does / why we need it: Adds a metric when a service with a local traffic_policy has no available endpoints. The metric has 2 labels:
Which issue(s) this PR fixes:
Fixes #2086
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: