e2e framework: consolidate timeouts and intervals #114783

Merged
merged 6 commits into kubernetes:master
Jan 12, 2023


pohly
Contributor

@pohly pohly commented Jan 3, 2023

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

Some timeouts were defined in TestContext, others in TimeoutContext. Some were constants in source code. The long-term goal is to move all of those durations into TimeoutContext and provide a uniform configuration mechanism for them. This PR is a first step towards that.

Special notes for your reviewer:

This was motivated by kubernetes/community#7021 (comment)

Does this PR introduce a user-facing change?

NONE

Filling in the default values directly in the struct eliminates the need to
define constants that aren't used anywhere else.
If we were to add new fields in TimeoutContext, the current users of
NewFrameworkWithCustomTimeouts might run into failures unless they get modified
to also set those new fields. This is error-prone.

A better approach is to let users of NewFrameworkWithCustomTimeouts override
fields by setting just those and use the normal defaults for the others.
@k8s-ci-robot k8s-ci-robot added release-note-none, size/L, kind/cleanup, cncf-cla: yes, do-not-merge/needs-sig labels Jan 3, 2023
@k8s-ci-robot
Contributor

@pohly: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-triage, needs-priority, area/e2e-test-framework, area/test, sig/network, sig/storage, sig/testing labels and removed the do-not-merge/needs-sig label Jan 3, 2023
@pohly
Contributor Author

pohly commented Jan 5, 2023

/retest

@pohly
Contributor Author

pohly commented Jan 5, 2023

/retest

@pohly
Contributor Author

pohly commented Jan 5, 2023

@bertinatto : you initially introduced timeouts.go. Perhaps you can have a look at this update for it?

@pohly
Contributor Author

pohly commented Jan 5, 2023

/retest

}
// Make a copy, otherwise the caller would have the ability to
// modify the defaults.
copy := TestContext.timeouts
Member

But this no longer has the defaults, does it?

Aren't the flags overriding the defaults?

Member

The defaults are in copy := defaultTimeouts.

Contributor Author

What I meant is that we should not return &TestContext.timeouts - that would give the caller read-write access. Let me change the comment to:

Make a copy, otherwise the caller would have the ability to modify the values.

With "defaults" I was following the function name ("WithDefaults"), ignoring the fact that "the defaults" are configurable. It's debatable whether that function name is a good name; probably not. But changing it doesn't seem worthwhile.

Contributor Author

When interpreting "default" as "what tests should use unless they override it", then it still makes sense.

It's just not "defaults" as in "hard-coded in the source code".

Contributor Author

I checked, and renaming the function doesn't affect that much code, so let's do it.

Member

I'm confused, let me see if I'm reading this wrong:
Before: NewTimeoutContextWithDefaults always returned the timeout constants defined here; tests had to use NewFrameworkWithCustomTimeouts to get custom timeouts.
Now: the custom timeouts are always used?

Contributor Author

Before: NewTimeoutContextWithDefaults returned a subset of the timeouts provided by the framework, with defaults that were hard-coded in the source code of the framework.

After: NewTimeoutContext returns all timeouts provided by the framework, of which some are configurable via command line parameters. We still need to decide whether we want to make all of them configurable, and how (more command line parameters or config file?).

The "all timeouts" part may be a bit too optimistic, but it is the goal. There are several hard-coded timeouts (for example, in e2e/framework/pod/wait.go) that either should be replaced by the ones in TimeoutContext (most likely) or need to be added there (less likely). This needs to be checked on a case-by-case basis after we agree on the general concept.

That the new timeout context then gets passed back to the framework by the CSI driver setup code without changes (as far as I can tell) is a bit odd, but I guess it was added because some CSI driver might want to run with different timeouts.

Contributor Author

@pohly pohly Jan 10, 2023

I missed that test/e2e/storage/drivers/in_tree.go overwrites some timeouts - so that probably explains this aspect of the CSI driver setup.

Member

Clarified offline: only the timeouts passed in via the flags are modified.

This consolidates timeout handling. In the future, configuration of all
timeouts via a configuration file might get added. For now, the same three
legacy command line flags for the timeouts that get moved continue to be
supported.
Various different tests all have their own poll intervals. As a start towards
consolidating that, the interval from test/e2e/framework/pod (as one of the
most common cases for polling) is moved into the framework.

Changing other helper packages and tests needs to follow.
Primarily this protects against accidentally polling with the default interval
of 10ms. Setting these defaults may also make some tests simpler because they
don't need to override the defaults.
@k8s-ci-robot k8s-ci-robot added the sig/cloud-provider label Jan 9, 2023
Member

@oomichi oomichi left a comment

/cc @oomichi

		if !value.IsZero() {
			out.Field(i).Set(value)
		}
	}
Member

Why is the above change necessary?
Callers of framework.NewFrameworkWithCustomTimeouts expect the timeout values to be replaced with the timeout values specified for each driver, and this pull request doesn't change those callers.

Contributor Author

Suppose some code uses this function like this:

myTimeouts := framework.TimeoutContext{
    // Initialize all fields here.
    PodStart: ...
    ...
}
f := framework.NewFrameworkWithCustomTimeouts("foo", myTimeouts)

The API allows that, so we can't be sure that it isn't done. Now we add some new timeout field. With the previous f.Timeouts = timeouts, the tests would run with 0 as value for that new field. With the new code, that field will have a sane default.

The alternative is to not let users create TimeoutContext structs (API change) or require that they obtain one from NewTimeoutContext[FromDefaults] and then change some field (hard to enforce).

This solution seems simpler.

Contributor Author

Another way to explain this change is this question:

Which is better, a NewFrameworkWithCustomTimeouts where the caller must set all fields or a NewFrameworkWithCustomTimeouts where the caller only needs to set those fields it cares about?

The latter is easier to use and makes this function usable without calling NewTimeoutContext first - perhaps I should make that change. We might even remove that function entirely. I can see some value in having it, but nothing in our code base will use it.

Contributor Author

I was thinking of this patch:

diff --git a/test/e2e/storage/drivers/in_tree.go b/test/e2e/storage/drivers/in_tree.go
index 04c4e08285e..51468fb040c 100644
--- a/test/e2e/storage/drivers/in_tree.go
+++ b/test/e2e/storage/drivers/in_tree.go
@@ -1967,9 +1967,10 @@ func (v *azureFileVolume) DeleteVolume(ctx context.Context) {
 }
 
 func (a *azureDiskDriver) GetTimeouts() *framework.TimeoutContext {
-	timeouts := framework.NewTimeoutContext()
-	timeouts.PodStart = time.Minute * 15
-	timeouts.PodDelete = time.Minute * 15
-	timeouts.PVDelete = time.Minute * 20
+	timeouts := &framework.TimeoutContext{
+		PodStart:  time.Minute * 15,
+		PodDelete: time.Minute * 15,
+		PVDelete:  time.Minute * 20,
+	}
 	return timeouts
 }
diff --git a/test/e2e/storage/external/external.go b/test/e2e/storage/external/external.go
index 7150a3d73b8..20c33615d6f 100644
--- a/test/e2e/storage/external/external.go
+++ b/test/e2e/storage/external/external.go
@@ -311,7 +311,7 @@ func (d *driverDefinition) GetDynamicProvisionStorageClass(ctx context.Context,
 }
 
 func (d *driverDefinition) GetTimeouts() *framework.TimeoutContext {
-	timeouts := framework.NewTimeoutContext()
+	timeouts := &framework.TimeoutContext{}
 	if d.Timeouts == nil {
 		return timeouts
 	}
diff --git a/test/e2e/storage/framework/testdriver.go b/test/e2e/storage/framework/testdriver.go
index 6614e8d0280..7910c780708 100644
--- a/test/e2e/storage/framework/testdriver.go
+++ b/test/e2e/storage/framework/testdriver.go
@@ -141,7 +141,7 @@ func GetDriverTimeouts(driver TestDriver) *framework.TimeoutContext {
 	if d, ok := driver.(CustomTimeoutsTestDriver); ok {
 		return d.GetTimeouts()
 	}
-	return framework.NewTimeoutContext()
+	return &framework.TimeoutContext{}
 }
 
 // Capability represents a feature that a volume plugin supports

It would be sufficient for using the resulting struct with NewFrameworkWithCustomTimeouts. However, there might be other usages of the struct, so let's keep GetDriverTimeouts as-is (= returns fully populated struct).

Member

Thank you for your explanation @pohly

Now we add some new timeout field. With the previous f.Timeouts = timeouts, the tests would run with 0 as value for that new field. With the new code, that field will have a sane default.

I see, that makes sense.
It is nice that callers only need to specify the timeouts they customize, as this pull request allows.

This is simpler, no need to construct an entirely new struct anymore.
@k8s-ci-robot k8s-ci-robot added the approved label Jan 10, 2023
@pohly
Contributor Author

pohly commented Jan 10, 2023

/retest

Member

@oomichi oomichi left a comment

The change itself looks good to me; I just want to see @aojea's reply before merging.

LGTM

// Reconfigure gomega defaults. The poll interval should be suitable
// for most tests. The timeouts are more subjective and tests may want
// to override them, but these defaults are still better for E2E than the
// ones from Gomega (1s timeout, 10ms interval).
Member

Yeah, this is too aggressive for us.

@aojea
Member

aojea commented Jan 12, 2023

/lgtm
/approve

Thanks

@k8s-ci-robot k8s-ci-robot added the lgtm label Jan 12, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 5ea6c9800259e626174927740868d265fc1ebbcc

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aojea, pohly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 0d6dc14 into kubernetes:master Jan 12, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.27 milestone Jan 12, 2023
4 participants