Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enforce ReadWriteOncePod during scheduling #103082

Conversation

chrishenzie
Copy link
Member

Please see the last two commits for scheduler related changes.

What type of PR is this?

/kind feature

What this PR does / why we need it:

This is a followup to #102028 that introduces the ReadWriteOncePod access mode for PVs and PVCs. This PR introduces changes to the scheduler to enforce this access mode.

The ReadWriteOncePod access mode restricts volume access to a single pod on a single node. Since no other pods can use a PVC using this access mode (enforced during mount by kubelet, also will not attach to another node), we should enforce this during scheduling so we do not start pods that cannot proceed.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

I cannot merge this branch into the other PR's branch so this PR includes additional commits. Once it gets merged I will rebase.

Please see this section of KEP-2485 for more details on scheduling changes and additional context.

Does this PR introduce a user-facing change?

Enforce the ReadWriteOncePod PVC access mode during scheduling

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/2485-read-write-once-pod-pv-access-mode

/assign @alculquicondor
/assign @msau42
/sig storage
/hold

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 22, 2021
@k8s-ci-robot k8s-ci-robot added area/kubectl area/kubelet kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/cli Categorizes an issue or PR as relevant to SIG CLI. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. labels Jun 22, 2021
@chrishenzie chrishenzie force-pushed the read-write-once-pod-access-mode-scheduler branch from eda138c to fddf26a Compare June 22, 2021 06:23
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 22, 2021
Copy link
Member

@alculquicondor alculquicondor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you open an issue for using the information you are adding to enforce the existing access modes?

pkg/scheduler/framework/types_test.go Outdated Show resolved Hide resolved
pkg/scheduler/framework/types_test.go Outdated Show resolved Hide resolved
pkg/scheduler/framework/types_test.go Outdated Show resolved Hide resolved
pkg/scheduler/framework/types_test.go Outdated Show resolved Hide resolved
pkg/scheduler/framework/types_test.go Outdated Show resolved Hide resolved
@chrishenzie
Copy link
Member Author

@alculquicondor @ahg-g Thank you for the review!

After some discussion offline, I think it's best we don't require this for alpha in v1.22. This will give more time to evaluate design + implementation details around handling preemption.

@alculquicondor
Copy link
Member

The problem I see is that there is no way for users to avoid scheduled pods that don't start, is there? Maybe preemption support could be left as a possible beta requirement.

@chrishenzie
Copy link
Member Author

chrishenzie commented Jun 22, 2021

@alculquicondor There isn't, but that behavior doesn't differ from how ReadWriteOnce currently works. Right now it is possible to create two pods using a ReadWriteOnce PVC where one will start and one will fail with:

Warning  FailedAttachVolume  17s   attachdetach-controller  Multi-Attach error for volume "pvc-cb0764cb-f148-40f9-899e-4d4d9700d37c" Volume is already used by pod(s) web-server

This issue should be fixed, but I don't think fixing it is a requirement for ReadWriteOncePod.

@chrishenzie
Copy link
Member Author

Maybe preemption support could be left as a possible beta requirement.

For alpha then would we go with marking as UnschedulableAndUnresolvable?

@ahg-g
Copy link
Member

ahg-g commented Jun 22, 2021

Yes, you would return UnschedulableAndUnresolvable. I think it is ok to defer support for preemption to beta, we need to update the KEP to add that to beta graduation criteria.

@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Jun 29, 2021

@chrishenzie: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-kubernetes-e2e-gce-storage-slow 406e0d9ab0e2741a44619dcae58f1fe55837787c link /test pull-kubernetes-e2e-gce-storage-slow
pull-kubernetes-e2e-gce-storage-snapshot 406e0d9ab0e2741a44619dcae58f1fe55837787c link /test pull-kubernetes-e2e-gce-storage-snapshot

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Copy link
Member

@alculquicondor alculquicondor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't remember if I asked this before: can you open an issue to add support for ReadWriteOnce in PreFilter?

@chrishenzie
Copy link
Member Author

@alculquicondor See here for the scheduler issue: #103132

@alculquicondor
Copy link
Member

Ah, I'm asking for PreFilter support for ReadWriteOnce, not ReadWriteOncePod

@chrishenzie
Copy link
Member Author

chrishenzie commented Jun 29, 2021

Oops, I misread the question, here is a new issue for this: #103305

@chrishenzie chrishenzie force-pushed the read-write-once-pod-access-mode-scheduler branch from f54a0cd to 7af3713 Compare June 29, 2021 18:56
Copy link
Member

@alculquicondor alculquicondor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 29, 2021
@chrishenzie chrishenzie force-pushed the read-write-once-pod-access-mode-scheduler branch from 7af3713 to 6c73475 Compare June 29, 2021 19:10
@chrishenzie
Copy link
Member Author

@alculquicondor @ahg-g for lgtm

Check the PVC ref count on the node info cache to determine if a pod's
PVCs are in use. If they are and it is using ReadWriteOncePod, fail the
request.
@chrishenzie chrishenzie force-pushed the read-write-once-pod-access-mode-scheduler branch from 6c73475 to 7ad44d0 Compare June 30, 2021 17:54
@ahg-g
Copy link
Member

ahg-g commented Jun 30, 2021

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 30, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, alculquicondor, chrishenzie

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@chrishenzie
Copy link
Member Author

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 30, 2021
@fejta-bot
Copy link

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@chrishenzie
Copy link
Member Author

API review of the access mode (including scheduler changes) was completed as part of #102028

@k8s-ci-robot k8s-ci-robot merged commit 385402d into kubernetes:master Jun 30, 2021
SIG Node PR Triage automation moved this from Waiting on Author to Done Jun 30, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.22 milestone Jun 30, 2021
@chrishenzie chrishenzie deleted the read-write-once-pod-access-mode-scheduler branch August 20, 2021 06:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubectl area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/cli Categorizes an issue or PR as relevant to SIG CLI. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Development

Successfully merging this pull request may close these issues.

None yet

8 participants