
[Bug] Pod.status subresource not working after upgrade from 1.9.4 #10182

Open
PhilippMT opened this issue May 6, 2024 · 4 comments
Labels
bug Something isn't working triage Default label assigned to all new issues indicating label curation is needed to fully organize.

PhilippMT commented May 6, 2024

Kyverno Version

1.12.0

Kubernetes Version

1.29.x

Kubernetes Platform

EKS

Kyverno Rule Type

Mutate

Description

Hello,

I just upgraded Kyverno from 1.9.4 to the latest 1.12.1 with a completely new installation. I have a ClusterPolicy that was working fine with 1.9.4 but is not working with 1.12.1; I also tried version 1.11.4, with the same result. The ClusterPolicy is triggered as soon as a pod of the datadog DaemonSet is in the ready state, and it mutates the node the pod runs on to remove a startup taint.

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: datadog-agent-startup-taint
spec:
  failurePolicy: Fail
  rules:
    - name: remove-startup-taint
      match:
        any:
          - resources:
              kinds:
                - v1/Pod.status
              namespaces:
                - datadog
              names:
                - datadog-*
      preconditions:
        all:
          - key: '{{request.object.metadata.ownerReferences[0].kind}}'
            operator: Equals
            value: DaemonSet
          - key: '{{request.object.metadata.ownerReferences[0].name}}'
            operator: Equals
            value: datadog
          - key: "{{ to_string((request.object.status.containerStatuses[?name == 'agent'].ready)[0] || false ) }}"
            operator: Equals
            value: 'true'
          - key: '{{request.operation}}'
            operator: In
            value:
              - UPDATE
      # Mutates the target Node to remove the startup taint.
      mutate:
        targets:
          - apiVersion: v1
            kind: Node
            name: '{{request.object.spec.nodeName}}'
        patchStrategicMerge:
          spec:
            taints: "{{ target.spec.taints[?key != 'node.datadog.eu/agent-not-ready'] }}"

These are the values I deployed the Helm chart with:

    config:
      resourceFiltersIncludeNamespaces:
        - flux-system
        - karpenter
      resourceFiltersExclude:
        - '[Node,*,*]'
        - '[Node/*,*,*]'
    features:
      logging:
        format: json
    admissionController:
      rbac:
        clusterRole:
          extraResources:
            - apiGroups:
                - ''
              resources:
                - nodes
              verbs:
                - update
                - list
                - get
      replicas: 3
      priorityClassName: system-cluster-critical
      tolerations:
        - key: node.datadog.eu/agent-not-ready
          operator: Exists
          effect: NoSchedule
      nodeAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            preference:
              matchExpressions:
                - key: eks.amazonaws.com/nodegroup
                  operator: Exists
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
      podDisruptionBudget:
        enabled: true
      initContainer:
        resources:
          requests: {cpu: 10m, memory: 64Mi}
          limits: {cpu: 100m, memory: 256Mi}
      container:
        resources:
          requests: {cpu: 100m, memory: 128Mi}
          limits: {memory: 384Mi}
    backgroundController:
      rbac:
        clusterRole:
          extraResources:
            - apiGroups:
                - ''
              resources:
                - nodes
              verbs:
                - update
                - list
                - get
      replicas: 2
      priorityClassName: system-cluster-critical
      tolerations:
        - key: node.datadog.eu/agent-not-ready
          operator: Exists
          effect: NoSchedule
      nodeAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            preference:
              matchExpressions:
                - key: eks.amazonaws.com/nodegroup
                  operator: Exists
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
      podDisruptionBudget:
        enabled: true
      resources:
        requests: {cpu: 100m, memory: 64Mi}
        limits: {memory: 128Mi}
    cleanupController:
      rbac:
        clusterRole:
          extraResources: []
      replicas: 2
      priorityClassName: system-cluster-critical
      tolerations:
        - key: node.datadog.eu/agent-not-ready
          operator: Exists
          effect: NoSchedule
      nodeAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            preference:
              matchExpressions:
                - key: eks.amazonaws.com/nodegroup
                  operator: Exists
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
      podDisruptionBudget:
        enabled: true
      resources:
        requests: {cpu: 100m, memory: 64Mi}
        limits: {memory: 128Mi}
    reportsController:
      rbac:
        clusterRole:
          extraResources: []
      replicas: 2
      priorityClassName: system-cluster-critical
      tolerations:
        - key: node.datadog.eu/agent-not-ready
          operator: Exists
          effect: NoSchedule
      nodeAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            preference:
              matchExpressions:
                - key: eks.amazonaws.com/nodegroup
                  operator: Exists
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
      podDisruptionBudget:
        enabled: true
      resources:
        requests: {cpu: 100m, memory: 64Mi}
        limits: {memory: 128Mi}

The generated webhook configuration looks fine to me:

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  annotations:
    admissions.enforcer/disabled: "true"
  creationTimestamp: "2024-05-06T11:08:32Z"
  generation: 11
  labels:
    webhook.kyverno.io/managed-by: kyverno
  name: kyverno-resource-validating-webhook-cfg
  resourceVersion: "1422173941"
  uid: b7d4d77e-0caf-441a-ba22-8ba4bcb4fa95
webhooks:
- admissionReviewVersions:
  - v1
  clientConfig:
    caBundle: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUM3VENDQWRXZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFZTVJZd0ZBWURWUVFERE
    service:
      name: kyverno-svc
      namespace: kyverno
      path: /validate/fail
      port: 443
  failurePolicy: Fail
  matchPolicy: Equivalent
  name: validate.kyverno.svc-fail
  namespaceSelector:
    matchExpressions:
    - key: kubernetes.io/metadata.name
      operator: NotIn
      values:
      - kube-system
    - key: kubernetes.io/metadata.name
      operator: NotIn
      values:
      - kyverno
  objectSelector: {}
  rules:
  - apiGroups:
    - ""
    apiVersions:
    - v1
    operations:
    - CREATE
    - UPDATE
    - DELETE
    - CONNECT
    resources:
    - configmaps
    - pods
    - pods/ephemeralcontainers
    - pods/status
    scope: Namespaced
  - apiGroups:
    - apps
    apiVersions:
    - v1
    operations:
    - CREATE
    - UPDATE
    - DELETE
    - CONNECT
    resources:
    - daemonsets/status
    - deployments/status
    scope: Namespaced
  sideEffects: NoneOnDryRun
  timeoutSeconds: 10

The "Pod.status" subresource seems not to work at all anymore. I also created an example with generate rules that create ConfigMaps for UPDATE operations on "Deployment.status" and "DaemonSet.status", which works as expected, but nothing happens on "Pod.status" updates.

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: check
spec:
  generateExisting: true
  rules:
  - name: generate-config-map
    match:
      any:
      - resources:
          kinds:
          - Pod.status
          operations:
          - UPDATE
    generate:
      kind: ConfigMap
      synchronize: true
      apiVersion: v1
      name: test-pod-abc
      namespace: kyverno
      data:
        data:
          test: |
            Hello World
  - name: generate-daemonset-config-map
    match:
      any:
      - resources:
          kinds:
          - DaemonSet.status
          operations:
          - UPDATE
    generate:
      kind: ConfigMap
      synchronize: true
      apiVersion: v1
      name: test-daemonset-abc
      namespace: kyverno
      data:
        data:
          test: |
            Hello World
  - name: generate-deployment-config-map
    match:
      any:
      - resources:
          kinds:
          - Deployment.status
          operations:
          - UPDATE
    generate:
      kind: ConfigMap
      synchronize: true
      apiVersion: v1
      name: test-deployment-abc
      namespace: kyverno
      data:
        data:
          test: |
            Hello World

Does anybody have any advice for me in case this is a misconfiguration?

Best regards
Philipp

Steps to reproduce

  1. Deploy the last ClusterPolicy "check"
  2. Manually delete the pod of any DaemonSet (not in kyverno or kube-system namespaces)
  3. Manually scale any Deployment (not in kyverno or kube-system namespaces)
  4. Wait for the pods to become ready

Result: The two ConfigMaps "test-daemonset-abc" and "test-deployment-abc" are created, but not the ConfigMap "test-pod-abc".

Expected behavior

I expect the ClusterPolicy to react to Pod.status UPDATE events.

Screenshots

No response

Kyverno logs

No error logs, and also no logs that give any indication that the ClusterPolicy was executed.

Slack discussion

No response

Troubleshooting

  • I have read and followed the documentation AND the troubleshooting guide.
  • I have searched other issues in this repository and mine is not recorded.
@PhilippMT PhilippMT added bug Something isn't working triage Default label assigned to all new issues indicating label curation is needed to fully organize. labels May 6, 2024

welcome bot commented May 6, 2024

Thanks for opening your first issue here! Be sure to follow the issue template!

@chipzoller
Member

  - key: '{{request.operation}}'
    operator: In
    value:
      - UPDATE

The In operator no longer functions as stated in the release notes for 1.12.0 here. Either move this to operations[] in the match block or specify it like:

  - key: '{{request.operation}}'
    operator: Equals
    value: UPDATE
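
For reference, the match-block form of the same restriction would look roughly like this (a sketch; the namespaces and names selectors from the original rule are left out):

  match:
    any:
      - resources:
          kinds:
            - v1/Pod.status
          operations:
            - UPDATE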

@PhilippMT
Author

PhilippMT commented May 9, 2024

The In operator no longer functions as stated in the release notes for 1.12.0 here. Either move this to operations[] in the match block or specify it like:

Hi,

Sorry, that was my fault. This is the ClusterPolicy I currently use:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: datadog-agent-startup-taint
spec:
  failurePolicy: Fail
  background: false
  webhookTimeoutSeconds: 30
  mutateExistingOnPolicyUpdate: false
  rules:
    - name: remove-startup-taint
      match:
        any:
          - resources:
              kinds:
                - v1/Pod.status
              namespaces:
                - datadog
              names:
                - datadog-*
      preconditions:
        all:
          - key: '{{request.object.metadata.ownerReferences[0].kind}}'
            operator: Equals
            value: DaemonSet
          - key: '{{request.object.metadata.ownerReferences[0].name}}'
            operator: Equals
            value: datadog
          - key: "{{ to_string((request.object.status.containerStatuses[?name == 'agent'].ready)[0] || false ) }}"
            operator: Equals
            value: 'true'
      # Mutates the target Node to remove the startup taint.
      mutate:
        targets:
          - apiVersion: v1
            kind: Node
            name: '{{request.object.spec.nodeName}}'
        patchStrategicMerge:
          spec:
            taints: "{{ target.spec.taints[?key != 'node.datadog.eu/agent-not-ready'] }}"

I also tried adding the UPDATE operation to the match block:

- resources:
      kinds:
        - v1/Pod.status
      namespaces:
        - datadog
      names:
        - datadog-*
      operations:
        - UPDATE

It is not working, even if I completely remove the whole preconditions block. It looks like Kyverno does not handle Pod.status updates at all. I also tried all kind variants: v1/Pod.status, v1/Pod/status, Pod.status, and Pod/status.

I also tried to remove all other selectors except for kinds and operations:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: datadog-agent-startup-taint
spec:
  admission: true
  background: false
  failurePolicy: Fail
  mutateExistingOnPolicyUpdate: false
  rules:
  - match:
      any:
      - resources:
          kinds:
          - v1/Pod.status
          operations:
          - UPDATE
    mutate:
      patchStrategicMerge:
        spec:
          taints: '{{ target.spec.taints[?key != ''node.datadog.eu/agent-not-ready'']
            }}'
      targets:
      - apiVersion: v1
        kind: Node
        name: '{{request.object.spec.nodeName}}'
    name: remove-startup-taint
    skipBackgroundRequests: true
  validationFailureAction: Audit
  webhookTimeoutSeconds: 30

I now get some error logs from the background-controller, even though background: false is set.

{
	"content": {
		"service": "background-controller",
		"message": "",
		"attributes": {
			"caller": "mutate/mutate.go:165",
			"level": "error",
			"resource": "v1/Pod/[namespace removed]/[podname removed]",
			"logger": {
				"name": "background"
			},
			"name": "ur-kdxxd",
			"error": "failed to mutate existing resource, rule remove-startup-taint, response error: : failed to substitute variables in target[0].Name {{request.object.spec.nodeName}}, value: <nil>, err: failed to resolve request.object.spec.nodeName at path : JMESPath query failed: Unknown key \"nodeName\" in path",
			"ts": "2024-05-09T18:22:53Z",
			"policy": "datadog-agent-startup-taint"
		}
	}
}

and

cannot generate events for empty target resource

I get these errors as long as the pod is pending and its status is updated, e.g. by the cluster-autoscaler. But as soon as the pod is scheduled and started, Kyverno does not handle any further Pod.status update events, e.g. when a container's state changes.
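
One way to avoid the nil nodeName errors on pending Pods would be an extra precondition that skips the rule until the Pod is scheduled. A minimal sketch, not part of the original policy, using a JMESPath || default so that a missing spec.nodeName resolves to an empty string instead of failing the substitution:

          - key: "{{ request.object.spec.nodeName || '' }}"
            operator: NotEquals
            value: ''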

@chipzoller
Member

@PhilippMT, remove the excludeGroups: system:nodes line in the Kyverno ConfigMap, or change the value to something other than system:nodes. This will probably fix the issue. Kyverno is filtering out requests from the system:nodes group, which is the group the kubelet belongs to, and the kubelet is responsible for updating Pod status.
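
If the Helm chart manages that ConfigMap, the corresponding value should be config.excludeGroups, added to the existing config block in the values above (a sketch, assuming the chart version in use exposes this value as recent chart versions do; its default includes system:nodes):

    config:
      excludeGroups: []   # default filters the system:nodes group; clearing it lets kubelet Pod.status updates reach Kyverno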
