Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

seldon-container-engine container spec does not allow additional securityContext parameters #5481

Open
st-bf opened this issue Mar 26, 2024 · 0 comments
Labels

Comments

@st-bf
Copy link

st-bf commented Mar 26, 2024

Describe the bug

When the seldon-core-operator services a SeldonDeployment kind, a seldon-container-engine spec is attached to the Pod spec for the model using the seldon-core-executor image. The securityContext fields for this container are hard-coded and can never include other standard securityContext settings such as runAsNonRoot or readOnlyRootFilesystem. For clarity, please note that my usage of "securityContext" refers to Kubernetes objects while "SecurityContext" refers to the object contained in the createEngineContainer() function.

This is because the SecurityContext object is composed directly with only two supported fields. No attempts are made to merge other securityContext parameters and the supported field names are hard-coded. Specifically, the createEngineContainer() function in seldon-core/operator/controllers/seldondeployment_engine.go:

340   if engineUser != nil {
341     escalationDefault := false
342     c.SecurityContext = &corev1.SecurityContext{RunAsUser: engineUser, AllowPrivilegeEscalation: &escalationDefault}
343   }

To reproduce

Define a SeldonDeployment with a fully-formed predictor/component spec in an environment with a gatekeeper.sh policy which requires all containers to include runAsNonRoot: true and/or readOnlyRootFilesystem: true in their securityContext map. The gatekeeper.sh validation webhook will deny admission of the model Pod(s) because the programmatically-generated seldon-container-engine container spec is incapable of providing additional securityContext values such as these.

Expected behaviour

The primary model container (for example, 'qna-coldstart-large') properly inherits the securityContext settings defined in the SeldonDeployment predictor component spec. The programmatically-generated seldon-container-engine container should provide a similar mechanism for specifying securityContext fields other than allowPrivilegeEscalation and runAsUser.

Possible Implementations

Idea 1:

Allow executor securityContext fields to be specified in a SeldonDeployment and, if provided, merge those values into the SecurityContext object constructed by the createEngineContainer() function. Values from the SeldonDeployment should prevail over the hard-coded fields inserted by the function. While I accept that this a significant schema change to the SeldonDeployment kind, this approach does allow extensibility if/when future securityContext fields are supported (or possibly mandated) in a future version of the control plane.

Idea 2:

Attempt to gather securityContext values from the PredictorSpec passed to the function. Values from the PredictorSpec should prevail over the hard-coded fields inserted by the function. This also scales and may be less intrusive than changing the construct of the SeldonDeployment kind at the root level.

Idea 3:

Add an environment variable to govern the addition of the { ReadOnlyRootFilesystem: true } element into the SecurityContext object creation in the createEngineContainer() function. This value would only be included in the SecurityContext map if the environment variable literally matches the word "true", and omitted if it is any other value or nil/undefined/blank.

Separately, in the existing check for (engineUser != nil), the { RunAsNonRoot: true } flag should hypothetically always be present when the container is running as (engineUser).

This is not particularly scalable if future securityContext fields are supported by the control plane, but it does isolate the change to a small block of code without adjusting the schema of any other object kinds.

Environment

EKS cluster with a gatekeeper.sh policy requiring additional securityContext parameters for all running containers. In this specific instance, securityContext.readOnlyRootFileSystem and securityContext.runAsNonRoot are explicitly required.

  • Cloud Provider: AWS
  • Kubernetes Cluster Version: 1.27
  • Deployed Seldon System Images: seldonio/seldion-core-operator

Of note, I found this bug while working with seldon-core-operator version 1.16, but this problem still exists in the master branch as of the time of writing.

@st-bf st-bf added the bug label Mar 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant