
add non graceful shutdown integration test #119478

Open · wants to merge 9 commits into base: master

Conversation

@sonasingh46 (Contributor) commented Jul 20, 2023

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR adds an integration test for the node out-of-service volume detach feature.
Ref feature PR: #108486

Does this PR introduce a user-facing change?

None

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- KEP:  https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/2268-non-graceful-shutdown

Signed-off-by: Ashutosh Kumar <sonasingh46@gmail.com>
@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 20, 2023
@k8s-ci-robot (Contributor)

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jul 20, 2023
@xing-yang (Contributor)

/retest

Signed-off-by: Ashutosh Kumar <sonasingh46@gmail.com>
@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jul 24, 2023
Signed-off-by: Ashutosh Kumar <sonasingh46@gmail.com>
@sonasingh46 sonasingh46 force-pushed the nongraceful_shutdown_integration_test branch from a4e7b44 to 898a7a1 Compare July 24, 2023 19:04
@sonasingh46 sonasingh46 changed the title [WIP]add non graceful shutdown integration test add non graceful shutdown integration test Jul 24, 2023
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 24, 2023
t.Fatalf("error in deleting pod: %v", err)
}
waitForPodDeletionTimeStampToSet(t, testClient, pod.Name, namespaceName)
waitForMetric(t, metrics.ForceDetachMetricCounter.WithLabelValues(metrics.ForceDetachReasonOutOfService), 1, "detach-metrics")
Member

Metrics are not thread-safe: other tests in the same binary will use the same global registry, so please check that there is no possibility of test pollution. I can't remember whether the metrics registry is already initialized when the apiserver starts.

Contributor Author

Looking into it.

Contributor Author

Thanks for the heads-up regarding the metrics.
I have seen existing tests compare metrics with gotCountOfEvent >= expectedCountOfEvent for the same reason, and the waitForMetric function does the same.
cc @xing-yang

Comment on lines +602 to +608
pod, err := testingClient.CoreV1().Pods(podNamespace).Get(context.TODO(), podName, metav1.GetOptions{})
if err != nil {
t.Fatal(err)
}
if pod.DeletionTimestamp != nil {
return true, nil
}
Member

This will be racy if the pod does not have a finalizer, because the pod can be deleted and the Get can return a 404 Not Found error, no?

Contributor Author

The pod in question is given a termination grace period of 300 seconds in this test, which allows enough time, so this should be fine. Let me know if you still have a different opinion.

Member

sounds good, thanks for clarifying

func waitForMetric(t *testing.T, m basemetric.CounterMetric, expectedCount float64, identifier string) {
if err := wait.Poll(100*time.Millisecond, 60*time.Second, func() (bool, error) {
gotCount, err := metricstestutil.GetCounterMetricValue(m)
fmt.Println(gotCount)
Member

Either this slipped in from your local testing, or use t.Logf with a more meaningful message so it is debuggable.

Signed-off-by: Ashutosh Kumar <sonasingh46@gmail.com>
@sonasingh46 sonasingh46 force-pushed the nongraceful_shutdown_integration_test branch from 6081207 to 9b63711 Compare July 27, 2023 07:47
@sonasingh46 (Contributor Author)

/retest

1 similar comment
@sonasingh46 (Contributor Author)

/retest

@xing-yang (Contributor)

/assign @msau42 @gnufied

@YuikoTakada (Contributor) left a comment

Thank you for your PR. The test scenario itself looks good. I've added some comments, so please check them.

test/integration/volume/attach_detach_test.go — review threads resolved (one outdated)
Signed-off-by: Ashutosh Kumar <sonasingh46@gmail.com>
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sonasingh46
Once this PR has been reviewed and has the lgtm label, please ask for approval from msau42. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@YuikoTakada (Contributor) commented Sep 19, 2023

@sonasingh46 Thank you for updating. Unfortunately, this failed in my local environment. The error message is:

test/integration/volume/attach_detach_test.go:128:15: cannot use v1.ResourceRequirements{…} (value of type 
"k8s.io/kubernetes/vendor/k8s.io/api/core/v1".ResourceRequirements) as 
"k8s.io/kubernetes/vendor/k8s.io/api/core/v1".VolumeResourceRequirements value in struct literal

Signed-off-by: Ashutosh Kumar <sonasingh46@gmail.com>
Signed-off-by: Ashutosh Kumar <sonasingh46@gmail.com>
@YuikoTakada (Contributor)

Thank you for updating. Looks good to me.


ctx, cancel := context.WithCancel(context.Background())
defer cancel()
go informers.Core().V1().Nodes().Informer().Run(ctx.Done())
Member

I think informers.Start() already starts this informer; isn't this a duplicate?

Contributor Author

Will remove this line.

Comment on lines 194 to 206
// wait for volume to be attached
for i := 0; i < 10; i++ {
node, err = testClient.CoreV1().Nodes().Get(context.TODO(), nodeName, metav1.GetOptions{})
if err != nil {
t.Fatalf("Failed to get the node : %v", err)
}
if len(node.Status.VolumesAttached) > 1 {
break
}
time.Sleep(1 * time.Second)
}
if len(node.Status.VolumesAttached) < 1 {
t.Logf("failed to attach volume for pod %s on node %s", pod.Name, node.Name)
Member

We already have libraries for doing these async checks:

if err := wait.Poll(100*time.Millisecond, wait.ForeverTestTimeout, func() (bool, error) {
_, err := clientSet.CoreV1().Services("default").Get(ctx, "kubernetes", metav1.GetOptions{})
return err == nil, nil
}); err != nil {
t.Fatalf("Failed to wait for kubernetes service: %v:", err)
}

Member

Oh, you are already using them below — why is the check loop here implemented differently?

Contributor Author

I will fix it. Missed this one. Thanks

Signed-off-by: Ashutosh Kumar <sonasingh46@gmail.com>
@sonasingh46 (Contributor Author)

/retest

@sonasingh46 (Contributor Author)

Looking into the test failures.

@xing-yang (Contributor)

/retest

@k8s-ci-robot (Contributor)

@sonasingh46: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: pull-kubernetes-integration
Commit: c9bf834
Required: true
Rerun command: /test pull-kubernetes-integration

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 30, 2024
@xing-yang (Contributor)

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 30, 2024
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 2, 2024
@k8s-ci-robot (Contributor)

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@xing-yang (Contributor)

Hi @sonasingh46, can you please rebase and address the CI failures? Thanks.

Labels
area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

8 participants