[BUG] fleet-agent fails to deploy on an RKE2 Windows Custom cluster #39372
This is most directly related to kubernetes/kubernetes#102849 and seems to be an issue with the manifest used to deploy the fleet-agent, given that the problem has been observed between Fleet versions. TL;DR: the issue is a discrepancy in the deployed manifest itself: the pod spec sets a Linux-only `securityContext` (`runAsNonRoot`, `runAsUser`) but contains no `nodeSelector` restricting the pod to Linux nodes, so on a Windows cluster the pod can be scheduled onto a Windows node, where container creation fails. I can confirm this is the case by simply looking at the deployed `fleet-agent` Deployment:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "2"
    meta.helm.sh/release-name: <REDACTED>
    meta.helm.sh/release-namespace: cattle-fleet-system
    objectset.rio.cattle.io/applied: <REDACTED>
    objectset.rio.cattle.io/id: fleet-agent-bootstrap
  creationTimestamp: <REDACTED>
  generation: 2
  labels:
    app.kubernetes.io/managed-by: Helm
    objectset.rio.cattle.io/hash: <REDACTED>
  managedFields: <REDACTED>
  name: fleet-agent
  namespace: cattle-fleet-system
  resourceVersion: <REDACTED>
  uid: <REDACTED>
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: fleet-agent
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: fleet-agent
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - preference:
              matchExpressions:
              - key: fleet.cattle.io/agent
                operator: In
                values:
                - "true"
            weight: 1
      containers:
      - env:
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: AGENT_SCOPE
        - name: CHECKIN_INTERVAL
          value: 0s
        - name: GENERATION
          value: bundle
        image: rancher/fleet-agent:v0.5.0
        imagePullPolicy: IfNotPresent
        name: fleet-agent
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        runAsGroup: 1000
        runAsNonRoot: true
        runAsUser: 1000
      serviceAccount: fleet-agent
      serviceAccountName: fleet-agent
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: node.cloudprovider.kubernetes.io/uninitialized
        operator: Equal
        value: "true"
      - effect: NoSchedule
        key: cattle.io/os
        operator: Equal
        value: linux
status:
  ...
```

Note that the only OS-related scheduling hint in the pod spec is a toleration for the `cattle.io/os=linux` taint; nothing actually prevents the scheduler from placing the pod on a Windows node. Since this is a pure Fleet issue and the fix has been diagnosed, I'm moving this issue back over to the Fleet team to address.
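For illustration, a fix along these lines would pin the agent pod to Linux nodes so the Linux-only `securityContext` is never evaluated by a Windows kubelet. This is only a sketch of the kind of change needed, not Fleet's actual patch; the `kubernetes.io/os` label used here is a standard Kubernetes well-known node label:

```yaml
# Hypothetical excerpt of the fleet-agent Deployment with the missing
# scheduling constraint added -- a sketch, not the upstream fix.
spec:
  template:
    spec:
      # Restrict scheduling to Linux nodes; Windows kubelets reject pods
      # that set runAsNonRoot/runAsUser in a Linux securityContext.
      nodeSelector:
        kubernetes.io/os: linux
```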
Closing in favor of the above linked ticket.
Rancher Server Setup
Information about the Cluster
User Information
Describe the bug
When performing an HA upgrade on an RKE2 Custom Windows cluster, the new pod created by the fleet-agent deployment fails to come up, causing the fleet-agent deployment itself to eventually time out and fail.
To Reproduce
Result
Note that the fleet-agent deployment remains in an Updating state for a time, until it eventually goes into a Failed state. A new fleet-agent pod is created with the image rancher/fleet-agent:v0.5.0-rc2, and the pod goes into a ContainerCreating/Waiting state and never completes. The Events log shows several warnings from the kubelet for the pod (see screenshots).
Expected Result
The new pod that is created and the fleet-agent deployment both become Active without error.
Screenshots