-
Notifications
You must be signed in to change notification settings - Fork 38.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added benchmarks for pod affinity NamespaceSelector #101329
Conversation
@ahg-g: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ahg-g The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/cc @adtac |
@alculquicondor @Huang-Wei this is needed for beta graduation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/sig scheduling
} | ||
} | ||
if err != nil { | ||
klog.Fatalf("Creating namespace: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better not use klog.Fatal in a test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should return here, updated.
@@ -681,6 +802,7 @@ func createPods(namespace string, cpo *createPodsOp, clientset clientset.Interfa | |||
if err != nil { | |||
return err | |||
} | |||
klog.Infof("Creating %d pods in namespace %q", cpo.Count, namespace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
b.Info is easier to deal with on debugging tools
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the logs get truncated, not sure if there is an option to prevent that.
b.Fatalf("op %d: %v", opIndex, err) | ||
} | ||
if err := nsPreparer.prepare(); err != nil { | ||
b.Fatalf("op %d: %v", opIndex, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if some namespaces were successfully created?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure I get what is the concern, this is a fatal error, so the whole the test case will fail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there anything else clearing the namespaces? Isn't the etcd db shared for the entire test suite?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, ok, I added a call to cleanup()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some nits below.
One q: are you going to compose a baseline test to compare the results? For example, create $initNamespaces
namespaces, and run workloads specifying spec.affinity...namespaces
.
measurePods: 1000 | ||
|
||
|
||
- name: SchedulingPreferredAffinityWithNSSelector |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicated with L553?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
duplicates should be L627 and L553 instead of here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, removed the duplicate.
} | ||
klog.Infof("Making %d namespaces with prefix %q and template %v", p.count, p.prefix, *base) | ||
|
||
retries := 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may wrap this by reusing:
import "k8s.io/client-go/util/retry"
retry.RetryOnConflict(retry.DefaultRetry, fn)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
measurePods: 1000 | ||
|
||
|
||
- name: SchedulingPreferredAffinityWithNSSelector |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
duplicates should be L627 and L553 instead of here.
// Number of namespaces to create. Parameterizable through CountParam. | ||
Count int | ||
// Template parameter for Count. | ||
CountParam string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both "Count" and "CountParam" are semantically identical, possible to just use one instead, maybe if the "CountParam" is not set it could be parsed as "1" for the measured namespace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an established pattern across all operations.
namespaceTemplatePath: config/namespace-with-labels.yaml | ||
- opcode: createNamespaces | ||
prefix: measure-ns | ||
count: 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or maybe something like this?
countParam: $measureNamespaces
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the parameter is not reused across workloads, we want to explicitly use a single namespace, hence its hardcoded.
func (cpso createPodSetsOp) patchParams(w *workload) (realOp, error) { | ||
if cpso.CountParam != "" { | ||
var ok bool | ||
if cpso.Count, ok = w.Params[cpso.CountParam[1:]]; !ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider the case that both "cpso.CountParam" and "cpso.Count" are set in the template.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is an established pattern in the file, CountParam takes precedence, I added a comment.
LGTM for me after squash |
The comparison is against the existing affinity benchmarks. The current benchmarks put all existing pods in one namespace, the new ones split them across 100 namespaces and use namespace selector, I am showing that there is no performance drop. |
Sounds good. |
commits squashed and I updated the PR description with the results. |
/lgtm /hold |
/retest |
1 similar comment
/retest |
for i := 0; i < p.count; i++ { | ||
n := base.DeepCopy() | ||
n.Name = fmt.Sprintf("%s-%d", p.prefix, i) | ||
testutils.RetryWithExponentialBackOff(func() (bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems the returned error is discarded. IMO we should abort the loop to return the (timeout) error? In current logic, prepare() always return nil.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the function returns an error, RetryWithExponentialBackOff will directly return and not retry. Ideally there should be a way to check if the error is not retry-able and only in that case return an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all the functions in
kubernetes/test/utils/create_resources.go
Line 56 in 2115852
func CreatePodWithRetries(c clientset.Interface, namespace string, obj *v1.Pod) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the function returns an error, RetryWithExponentialBackOff will directly return and not retry
True, but the inner function doesn't return any error, right? So the only non-nil error we may get from testutils.RetryWithExponentialBackOff
is timeout error, and in that case, should we abort the test?
are actually not doing any retries on errors
yes, the namings (CreatePodWithRetries and others) are confusing and I proposed #100688.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, but the inner function doesn't return any error, right?
you mean line 1038 below? correct. I am simplifying things here and I am assuming that all errors are retry-able because we don't have a method that tells us whether the error is retry-able (in which case we would return nil) or not retry-able (in which case we would return the error)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated to capture the error returned by RetryWithExponentialBackOff
and return it.
/lgtm |
/retest |
/hold cancel |
What type of PR is this?
/kind feature
What this PR does / why we need it:
Adds two operations to scheduler perf benchmarks integration test: 1) create namespace, 2) create multiple sets of pods.
Those were necessary to create pod (anti)affinity benchmarks with NamespaceSelector
The benchmark results are in the following file: BenchmarkPerfScheduling.txt
The comparison is against the existing affinity benchmarks. The current affinity benchmarks put all existing pods in one namespace, the new ones split them across 100 namespaces and use namespace selector, the results show that there is no performance drop.
Which issue(s) this PR fixes:
Part of kubernetes/enhancements#2249 #97203
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: