What happened:
Symptoms & Observation
When using the local setup with extensions in an Azure environment, if the shoot hosting the local seed is hibernated and then woken up, the local seed fails to recover.
This happens because hibernation removes all pods, including those in the `garden`, `registry`, and `kyverno` namespaces.
When the shoot hosting the local seed is woken up, the gardenlet pod fails to start with `ImagePullErr`: it cannot pull its image from the registry cache, because the registry service has no endpoints while no pods are running in the `registry` namespace.
The StatefulSet in the `registry` namespace cannot create its pod because it cannot reach the mutating webhook `topology.azure.extensions.gardener.cloud`, whose service is not reachable.
In the extension namespace no pods are running either: the ReplicaSet fails because it cannot reach gardener-resource-manager (grm) in the `garden` namespace, which is down for the same reason as gardenlet (`ImagePullErr`, since the registry service has no endpoints).
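To confirm the "no endpoints" symptom described above, one can inspect the Endpoints object behind a service. The helper below is a minimal sketch: `has_ready_endpoints` is a hypothetical name, it assumes `kubectl` targets the shoot hosting the local seed, and only `kyverno-svc` is a service name taken from this report; the registry service name in the usage comment is a placeholder.

```shell
#!/usr/bin/env bash
# has_ready_endpoints NAMESPACE SERVICE
# Succeeds if the Endpoints object of SERVICE lists at least one ready address;
# fails if it lists none or cannot be read. Not-ready addresses are ignored
# because they live under .subsets[*].notReadyAddresses.
has_ready_endpoints() {
  local ns="$1" svc="$2" ips
  ips=$(kubectl get endpoints "$svc" -n "$ns" \
    -o jsonpath='{.subsets[*].addresses[*].ip}' 2>/dev/null)
  [ -n "$ips" ]
}

# Usage (kubectl must target the shoot hosting the local seed):
#   has_ready_endpoints kyverno kyverno-svc || echo "kyverno-svc has no endpoints"
#   has_ready_endpoints registry <registry-service> || echo "registry has no endpoints"
```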
(local seed) --> (gardenlet/grm) --> (registry) --> (topology.azure.extensions.gardener.cloud webhook) --> (grm)
Because of this circular dependency (deadlock), the seed cannot be reconciled.
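The chain above is a dependency cycle, which can be made explicit with the POSIX `tsort` utility: a topological order of the start-up dependencies exists only if there is no cycle. The node names are shorthand for this sketch, not Kubernetes object names, and the edges are one reading of the report: gardenlet and grm need the registry for image pulls, the registry StatefulSet needs the topology webhook, and the extension serving that webhook needs grm.

```shell
#!/usr/bin/env bash
# One "prerequisite dependent" pair per line, as expected by tsort.
deps='registry gardenlet
registry gardener-resource-manager
topology-webhook registry
gardener-resource-manager topology-webhook'

# tsort prints an ordering on stdout; when the input contains a loop it also
# lists the loop members on stderr and exits non-zero (GNU coreutils behaviour).
if printf '%s\n' "$deps" | tsort >/dev/null; then
  result="no cycle"
else
  result="cycle"
fi
echo "$result"
```

Dropping the `topology-webhook registry` edge, which is what excluding the `registry` namespace from the webhook amounts to, leaves an order in which the registry can start first.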
Mitigation
This was mitigated by excluding the `registry` namespace from the `topology.azure.extensions.gardener.cloud` mutating webhook, i.e. by labelling the namespace so that it matches the webhook's ignore selector:
k label ns registry gardener.cloud/role=extension
Additionally, we labelled the `kyverno` namespace the same way, because `kyverno-svc` has no endpoints for the same reason, which also breaks `make gardener-extensions-up`:
k label ns kyverno gardener.cloud/role=extension
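After applying the labels, or after a later wake-up, a quick check that both namespaces still carry them can help. This is a minimal sketch: `namespaces_missing_role_label` is a hypothetical helper, and it assumes the webhook's ignore selector keys on exactly `gardener.cloud/role=extension`, as used in the commands above.

```shell
#!/usr/bin/env bash
# namespaces_missing_role_label NS...
# Prints each namespace whose gardener.cloud/role label is not "extension"
# and returns non-zero if at least one such namespace was found.
namespaces_missing_role_label() {
  local ns role missing=0
  for ns in "$@"; do
    # Dots inside a label key must be escaped in kubectl JSONPath.
    role=$(kubectl get namespace "$ns" \
      -o jsonpath='{.metadata.labels.gardener\.cloud/role}' 2>/dev/null)
    if [ "$role" != "extension" ]; then
      echo "$ns"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   namespaces_missing_role_label registry kyverno && echo "both namespaces labelled"
```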
Side note
This does not happen for dev seeds because:
- They are never hibernated.
- The registry pod runs in the `kube-system` namespace, which is ignored by the Azure extension's topology mutating webhook.
What you expected to happen:
It should be possible to recover the local seed when waking up the hosting shoot in Azure.
An easy fix is to add the labels used above when creating the `kyverno` and `registry` namespaces in the extensions-based local setup.
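A sketch of the proposed fix, assuming the setup can create these namespaces from a manifest: each namespace is declared with the label from the start, so the webhook should ignore it from the first pod creation onwards. `ensure_extension_namespace` is a hypothetical helper; where the extensions-based local setup actually creates the `registry` and `kyverno` namespaces is not shown in this report.

```shell
#!/usr/bin/env bash
# ensure_extension_namespace NAME
# Creates (or updates) namespace NAME with the gardener.cloud/role=extension
# label, which the topology webhook's ignore selector is assumed to match.
ensure_extension_namespace() {
  local ns="$1"
  kubectl apply -f - <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: ${ns}
  labels:
    gardener.cloud/role: extension
EOF
}

# Usage (kubectl must target the shoot hosting the local seed):
#   ensure_extension_namespace registry
#   ensure_extension_namespace kyverno
```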
How to reproduce it (as minimally and precisely as possible):
1. Create a local setup with extensions on Azure, with a local seed and a local shoot running.
2. Hibernate the dev shoot that hosts the local seed and the control plane of the local shoot.
3. Wake up the dev shoot.
4. The local seed does not become ready because gardenlet and gardener-resource-manager are down for the reasons described above.
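Steps 2 and 3 can be scripted against the garden cluster by toggling the Shoot's `spec.hibernation.enabled` field. This is a sketch under assumptions: the kubeconfig targets the garden cluster that manages the dev shoot, `set_shoot_hibernation` is a hypothetical helper, and the project namespace and shoot name in the usage comment are placeholders.

```shell
#!/usr/bin/env bash
# set_shoot_hibernation NAMESPACE SHOOT true|false
# Patches spec.hibernation.enabled on a Shoot resource in the garden cluster.
set_shoot_hibernation() {
  local ns="$1" shoot="$2" enabled="$3"
  case "$enabled" in
    true|false) ;;
    *) echo "enabled must be true or false" >&2; return 2 ;;
  esac
  kubectl -n "$ns" patch shoot "$shoot" --type=merge \
    -p "{\"spec\":{\"hibernation\":{\"enabled\":${enabled}}}}"
}

# Usage (placeholders):
#   set_shoot_hibernation garden-<project> <dev-shoot> true    # step 2
#   set_shoot_hibernation garden-<project> <dev-shoot> false   # step 3
```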
Anything else we need to know?:
Environment:
Gardener version:
Kubernetes version (use kubectl version):
Cloud provider or hardware configuration:
Others:
How to categorize this issue?
/area dev-productivity
/kind bug