Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update SeccompDefault KEP upgrade strategy #2773

Merged
merged 1 commit into from Jun 22, 2021

Conversation

saschagrunert
Copy link
Member

This patch reflects the latest changes from the implementation PR:

  • Adding the kubelet configuration and CLI flag.
  • Making the rollout strategy more verbose and add mitigations.

Refers to: #2413
Triggered by: kubernetes/kubernetes#101943

cc @pjbgf @liggitt @mmdriley PTAL

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 2, 2021
@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Jun 2, 2021
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jun 2, 2021
This patch reflects the latest changes from the implementation PR:

- Adding the kubelet configuration and CLI flag.
- Making the rollout strategy more verbose and add mitigations.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
@saschagrunert
Copy link
Member Author

@liggitt @mmdriley @mrunalp PTAL

@liggitt
Copy link
Member

liggitt commented Jun 14, 2021

lgtm, default behavior is unchanged from previous releases; kubelet gains a configuration option so the node owner can define what seccomp profile should be used for pods/containers which do not specify a seccomp profile

@saschagrunert
Copy link
Member Author

@derekwaynecarr please take a look too and approve if this change is fine

@mrunalp
Copy link
Contributor

mrunalp commented Jun 15, 2021

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 15, 2021
@mrunalp
Copy link
Contributor

mrunalp commented Jun 15, 2021

/assign @derekwaynecarr

@mmdriley
Copy link

@saschagrunert thank you for the changes! I think the new approach is a good balance that lets cluster operators implement more aggressive security features without posing an acute threat to compatibility.

I'm still a bit worried we're deferring a lot of the complexity of real-world rollouts based on the idea this is "off by default". Once this feature is in, there will be pressure to turn it on by default in a future release, and we'll face the same problems around deployment and compatibility. We can avoid future headaches by thinking through this stuff now.

Since this is implemented as a node flag, responsibility for rolling it out is likely to fall to platform teams (cluster admins) rather than application owners. It will need to be enabled for all applications in the cluster at once unless admins play scheduling tricks to put only "blessed" applications on new nodes with the flag enabled.

The existence of even one non-compliant application within a cluster will pose a significant rollout hurdle.

The current rollout guidance assumes that cluster admins have access to full and comprehensive test suites for all deployed applications, including high coverage for all third-party dependencies, and can run them on demand. That doesn't match the reality I've seen. So most cluster admins will have to go down the path we call "optional" where they run applications under a permissive/complain seccomp filter for a while -- but, for that, they have to write their own report-mode seccomp filter. Maybe we can write that filter for them?

There's also a ton of cluster-specific complexity hiding behind "[i]f it's possible to monitor audit logs within the cluster". Do we think that is possible in many/most customer clusters? If so, we should gesture at how it can be done. And if not, we shouldn't be relying on it as a rollout risk mitigation.

I'm sorry I'm joining this conversation late -- for the original KEP, was there any discussion of applying these policies to workloads rather than nodes? Something like upgrading the seccompProfile on admitted pods to RuntimeDefault if unspecified. That way the rollout could be done (or avoided) on a per-application basis, which reduces rollout risk and increases the overall likelihood that more cluster workloads are made subject to a basic seccomp profile.

@saschagrunert
Copy link
Member Author

… but, for that, they have to write their own report-mode seccomp filter. Maybe we can write that filter for them?

We followed the idea of having container runtimes define the profiles and not kubernetes. Sure, we could add documentation about how to modify the default profile to "log instead of block". We could also add more docs about how to investigate the audit logs on a node.

There's also a ton of cluster-specific complexity hiding behind "[i]f it's possible to monitor audit logs within the cluster". Do we think that is possible in many/most customer clusters? If so, we should gesture at how it can be done. And if not, we shouldn't be relying on it as a rollout risk mitigation.

I definitely see your point. I think we also can enhance the docs (or write a blog post and link it to the docs) about how this can be done in technical detail. Sure, we will not cover every platform/setup, but for example most systems ship auditd which is a good starting point.

I'm sorry I'm joining this conversation late -- for the original KEP, was there any discussion of applying these policies to workloads rather than nodes? Something like upgrading the seccompProfile on admitted pods to RuntimeDefault if unspecified. That way the rollout could be done (or avoided) on a per-application basis, which reduces rollout risk and increases the overall likelihood that more cluster workloads are made subject to a basic seccomp profile.

I could imagine doing this by using a selector, but how would that be specified? We could add it to the kubelet configuration, but I'm not in favor of mixing application and cluster logic into a single config. We could also add a new annotation/field, but this would require API changes. I think we still can elaborate on adding such a feature on top of the graduation. For example:

  • alpha: everything is off by default, we have a kubelet flag as well as feature gate
  • beta: everything is on by default, but we now have an additional API
  • stable: everything is on, maybe the flag get's deprecated

The point of not changing the API is already part of the KEP, but I see no blocker in revisiting on graduation.

@derekwaynecarr
Copy link
Member

/lgtm
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: derekwaynecarr, saschagrunert

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 22, 2021
@k8s-ci-robot k8s-ci-robot merged commit b44af54 into kubernetes:master Jun 22, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.22 milestone Jun 22, 2021
@derekwaynecarr
Copy link
Member

@mmdriley i appreciate your summary on the distinction between platform owner and application owner with the outlined approach. i agree with @saschagrunert that a workload level choice can be explored later as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/node Categorizes an issue or PR as relevant to SIG Node. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants