RFC: KEP-4381: DRA: management access #4611

Open
wants to merge 3 commits into
base: master

pohly
Contributor

@pohly pohly commented May 2, 2024

A claim with management access to the underlying resource(s) is useful, e.g. for health checks or statistics gathering. Because this is privileged access, we need a way to control whether users are granted that privilege.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 2, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: pohly
Once this PR has been reviewed and has the lgtm label, please assign mrunalp for approval. For more information see the Kubernetes Code Review Process.


@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. labels May 2, 2024
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label May 2, 2024
// access mode that is only granted when there is a Quota object
// in the same namespace as the claim where AllowManagementAccess
// is true.
ManagementAccess bool
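
The rule in the field comment above can be sketched as a small check. This is only an illustration under stated assumptions: the `Quota` and `Claim` types and the `managementAccessAllowed` helper are hypothetical names, not the API proposed in this PR, and the real check would presumably run against objects fetched from the API server.

```go
package main

// Quota is a hypothetical stand-in for the per-namespace Quota object
// mentioned in the field comment. The real type is not shown here.
type Quota struct {
	Namespace             string
	AllowManagementAccess bool
}

// Claim is a hypothetical, reduced view of a resource claim.
type Claim struct {
	Namespace        string
	ManagementAccess bool
}

// managementAccessAllowed reports whether the access mode requested by the
// claim may be granted. Claims without management access always pass.
// Claims with management access pass only if some Quota in the same
// namespace has AllowManagementAccess set to true.
func managementAccessAllowed(c Claim, quotas []Quota) bool {
	if !c.ManagementAccess {
		return true
	}
	for _, q := range quotas {
		if q.Namespace == c.Namespace && q.AllowManagementAccess {
			return true
		}
	}
	return false
}
```

With a permissive Quota only in a `monitoring` namespace, a management-access claim in `monitoring` would pass this check, while the same claim in `default` would be refused.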
Contributor Author

This also needs to be in the allocation result, together with an explanation that the DRA driver kubelet plugin should validate allocation results. An error can always occur that leaves two claims with the same instance in their results. The plugin should catch that and refuse to prepare the second claim, except when management access is involved.

Contributor

This is also necessary because devices might be injected differently if they are meant for management access vs. normal allocation (this is true for NVIDIA GPUs with MIG devices, for example).

Contributor Author

I added a comment saying that ResourceClaimSpec.ManagementAccess is immutable. This information will then be available to drivers, assuming that we do #4615.
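
The validation discussed in this thread could look roughly like the following on the plugin side. It is a minimal sketch, not the actual kubelet plugin interface: `AllocationResult`, `Plugin`, `NewPlugin`, and `Prepare` are hypothetical names, and it assumes that the management-access flag is visible to the driver as part of the data it receives.

```go
package main

import "fmt"

// AllocationResult is a hypothetical, reduced view of what a driver plugin
// receives for one claim.
type AllocationResult struct {
	ClaimUID         string
	Instances        []string
	ManagementAccess bool
}

// Plugin tracks which claim holds normal (exclusive) access to an instance.
type Plugin struct {
	owners map[string]string // instance name -> claim UID
}

// NewPlugin returns a plugin with no prepared claims.
func NewPlugin() *Plugin {
	return &Plugin{owners: map[string]string{}}
}

// Prepare validates the allocation result before preparing the claim.
// A second claim naming an instance that is already prepared for another
// claim is rejected, unless the new claim asks for management access.
func (p *Plugin) Prepare(r AllocationResult) error {
	if r.ManagementAccess {
		// Management access may overlap with other claims by design.
		return nil
	}
	// Check every instance first so that a rejected claim records nothing.
	for _, inst := range r.Instances {
		if owner, ok := p.owners[inst]; ok && owner != r.ClaimUID {
			return fmt.Errorf("instance %q is already prepared for claim %q", inst, owner)
		}
	}
	for _, inst := range r.Instances {
		p.owners[inst] = r.ClaimUID
	}
	return nil
}
```

Checking all instances before recording any of them keeps a rejected claim from leaving partial state behind, and preparing the same claim twice stays idempotent.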

This new boolean alone is not sufficient to deploy a daemon set which requests
and gets access to all resource instances of a certain type on a node. Some way
to ask for ">= 1 instance" with no upper bound will be needed. This can be
added together with support for optional allocation.
Contributor Author

What I have in mind is adding an optional "range" to a request. No range means "exactly one" (current KEP). Adding a range with "min: 1" could be used to ask for all GPUs. A range with "min: 0, max: 1" would be for a resource that is not absolutely required.

I'll do a separate PR with this.
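
The range semantics described in this comment can be sketched as follows. Everything here is an assumption made for illustration: the `Range` type, its field names, and the `instancesToAllocate` helper are hypothetical, and the sketch assumes an allocator that grants as many instances as the range permits.

```go
package main

// Range is a hypothetical optional field on a request.
// A nil *Range means "exactly one" instance.
// A nil Max means there is no upper bound.
type Range struct {
	Min int
	Max *int
}

// instancesToAllocate returns how many of the available instances would be
// allocated for the request. The boolean is false if the minimum cannot be
// met. This sketch assumes the allocator hands out as many instances as the
// range allows.
func instancesToAllocate(r *Range, available int) (int, bool) {
	lo, hi, unbounded := 1, 1, false // no range: exactly one
	if r != nil {
		lo = r.Min
		if r.Max == nil {
			unbounded = true
		} else {
			hi = *r.Max
		}
	}
	if available < lo {
		return 0, false
	}
	if unbounded || available < hi {
		return available, true
	}
	return hi, true
}
```

Under these assumptions, `&Range{Min: 1}` on a node with four GPUs yields all four, while a range of "min: 0, max: 1" succeeds with zero instances when nothing is available.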

@bart0sh bart0sh added this to Needs Reviewer in SIG Node PR Triage May 4, 2024
pohly added a commit to pohly/wg-device-management that referenced this pull request May 13, 2024
This proposal takes the existing KEP as base and includes:
- vendor-independent classes and attributes (kubernetes/enhancements#4614)
- optional allocation (kubernetes/enhancements#4619)
- inline parameters (kubernetes/enhancements#4613)
- management access (kubernetes/enhancements#4611)
- renaming "named resources" to "devices" wherever it makes sense and is
  user-facing (Slack discussion)
- MatchAttributes (from k8srm-prototype)
- OneOf (from k8srm-prototype)

`pkg/api` currently builds, but the rest doesn't. None of the YAML examples
have been updated yet.
pohly added a commit to pohly/wg-device-management that referenced this pull request May 16, 2024