
tests/node_labeller.go: Fix test_id: 6247 #11848

Open
Barakmor1 wants to merge 1 commit into base: main
Conversation

Barakmor1
Member

@Barakmor1 Barakmor1 commented May 5, 2024

What this PR does

We are setting kubevirt.configuration.ObsoleteCPUModels to nil to align with test expectations. The test assumes that kubevirt.configuration.ObsoleteCPUModels is not set.
Before this PR, if kubevirt.configuration.ObsoleteCPUModels was set before the tests started running, then the test would fail.
After this PR, if kubevirt.configuration.ObsoleteCPUModels was set before the tests started running, it will be set to nil to prevent failure.

Additionally, we will wait for 30 seconds before failing to give the node labeller enough time to set the new labels on the nodes.

Fixes #

Why we need it and why it was done in this way

The following tradeoffs were made:

The following alternatives were considered:

Links to places where the discussion took place:

Special notes for your reviewer

Checklist

This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.

Release note

NONE

@kubevirt-bot kubevirt-bot added release-note-none Denotes a PR that doesn't merit a release note. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels May 5, 2024
@kubevirt-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign enp0s3 for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Member

@orelmisan orelmisan left a comment


Thank you for the PR @Barakmor1.

Could you please give a few more words about the current code (what's wrong with it) and how the change fixes it?

tests/infrastructure/node-labeller.go (review thread, outdated, resolved)
@Barakmor1
Member Author

Thank you for the PR @Barakmor1.

Could you please give a few more words about the current code (what's wrong with it) and how the change fixes it?

In my opinion, the description is clear enough. Please let me know if there's anything that isn't clear to you.

@orelmisan
Member

Thank you for the PR @Barakmor1.
Could you please give a few more words about the current code (what's wrong with it) and how the change fixes it?

In my opinion, the description is clear enough. Please let me know if there's anything that isn't clear to you.

What was the reason to explicitly set the KubeVirt CR?
What is the reason node is assigned nodesWithKVM[0] and is then overridden?
What was the reason to use Eventually?

@Barakmor1
Member Author

Thank you for the PR @Barakmor1.
Could you please give a few more words about the current code (what's wrong with it) and how the change fixes it?

In my opinion, the description is clear enough. Please let me know if there's anything that isn't clear to you.

What was the reason to explicitly set the KubeVirt CR? What is the reason node is assigned nodesWithKVM[0] and is then overridden? What was the reason to use Eventually?

Hey, I modified the PR description. I hope it is clearer now.

@orelmisan
Member

Thank you for adding additional context.
Could you please update the commit message as well?

Comment on lines +210 to +212
node, err = virtClient.CoreV1().Nodes().Get(context.Background(), node.Name, metav1.GetOptions{})
Expect(err).ToNot(HaveOccurred())
Member


AFAIU, the test will immediately fail in case there was an error fetching the node.
Please consider returning an error, so there will be a retry.
Also, please consider giving the fetched node another name, so the details of the original node will not be lost in case of an error.

Member Author


The Get API call shouldn't fail.

Member


There could be a temporary network hiccup.

Member Author


In that case, I don't think we should retry.

Member


Why not?
Failing immediately due to an API call failure could make this test flaky.

Member Author

@Barakmor1 Barakmor1 May 5, 2024


Anywhere else in the functional tests, I have never seen us retry when a Get API call fails. I prefer to keep it consistent.

Contributor


I tend to agree with @orelmisan!
This is an Eventually block and we should satisfy the "eventually" behavior.
As is, if an error occurs the block is never retried.
I suggest you switch to:

Eventually(func(g Gomega) {
	...
	g.Expect(err).ToNot(HaveOccurred())
	...
}).Should(Succeed())

Or, as suggested, return the error, so that it will be handled by the Eventually retry.
IMHO if there are other places where things are done like this, we should fix them instead of spreading it. :)
Thank you!

Member Author

@Barakmor1 Barakmor1 May 5, 2024


@fossedihelm I disagree.
Eventually retries doesn't fit all cases. For instance, in the scenario of a connectivity issue causing the get API call to fail, I would prefer to know about it as soon as possible. Such failures might suggest other issues due to instability. The Eventually retries should be used in cases where we expect components to eventually become consistent. In the mentioned scenario, the Eventually retries fits perfectly because we change the configuration and wait for the node labeller to propagate the new configuration and label the nodes accordingly. Additionally, if we would retry every function that returns an error, the code would include significant amount boilerplate.

Contributor


All over the e2e test suite, we make API calls like Get() or even Create() and expect that no error occurred, without giving them a chance to try again.
I think I'm with Barak here and don't think we should necessarily keep looping if an API call failed...
As Orel mentioned, returning the error would make this test slightly more robust against flaky infra, but I don't think tests are responsible for that.
This is an interesting debate though. Once we reach an agreement, we should document the decision as a KubeVirt coding guideline (in that doc maybe? #11456)

Contributor


Sure! I agree that we should reach an agreement and align all the cases with it.
Thank you

We are setting kubevirt.configuration.ObsoleteCPUModels
to nil to align with test expectations.
The test assumes that kubevirt.configuration.ObsoleteCPUModels
is not set.

Before this PR, if kubevirt.configuration.ObsoleteCPUModels
was set before the tests started running,
then the test would fail.

After this PR, if kubevirt.configuration.ObsoleteCPUModels
was set before the tests started running, it will be set
to nil to prevent failure.

Additionally, we will wait for 30 seconds before failing
to give the node labeller enough time to set the new
labels on the nodes.

Signed-off-by: bmordeha <bmordeha@redhat.com>
@kubevirt-bot
Contributor

kubevirt-bot commented May 5, 2024

@Barakmor1: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: pull-kubevirt-e2e-kind-1.27-vgpu
Commit: fa6953d
Required: false
Rerun command: /test pull-kubevirt-e2e-kind-1.27-vgpu

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Member

@orelmisan orelmisan left a comment


Thank you @Barakmor1

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label May 6, 2024