Operator throwing error in an endless loop "too old resource version" #2354

Closed
fhalde opened this issue Apr 19, 2024 · 11 comments

Comments

@fhalde

fhalde commented Apr 19, 2024

Bug Report

What did you do?

We are not sure which events led to this; it started occurring suddenly. A restart fixed it, but by then the operator was non-functional, i.e. it was not reconciling anything.

What did you expect to see?

No errors

What did you see instead? Under which circumstances?

Our operator is throwing the following in an endless loop:

2024-04-19 08:59:46,858 i.f.k.c.d.i.AbstractWatchManager [ERROR] Received an error which is not a status but {"type":"ERROR","object":{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"too old resource version: 31159423 (31160199)","reason":"Expired","code":410}} - will retry

Environment

Kubernetes cluster type:
EKS

$ Mention java-operator-sdk version from pom.xml file

4.8.2

$ java -version

openjdk version "21.0.2" 2024-01-16 LTS
OpenJDK Runtime Environment Corretto-21.0.2.13.1 (build 21.0.2+13-LTS)
OpenJDK 64-Bit Server VM Corretto-21.0.2.13.1 (build 21.0.2+13-LTS, mixed mode, sharing)

$ kubectl version

Client Version: v1.29.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.1-eks-b9c9ed7

Possible Solution

Additional context

Unfortunately no. There are no logs prior to this error; it just started occurring out of the blue. We are using the fabric8 client 6.10.0.

@fhalde
Author

fhalde commented Apr 19, 2024

actually, fabric8 6.11.0 is in the dependency tree

@csviri
Collaborator

csviri commented Apr 19, 2024

This seems to be an issue with the watches in fabric8 client.
cc @manusa @shawkins

@shawkins
Collaborator

Classloading issues make this logic susceptible to this problem: fabric8io/kubernetes-client#5692

We could consider making the deserialization here generic instead, but more than likely the user will want to fix having more than one definition of Status in the classpath.
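One way to check for more than one definition of a class is to ask a classloader for every location from which it can see the class file. The sketch below uses only the JDK; `ClassLocations`, `resourcePath`, and `find` are names made up for this illustration, and the default class name is taken from the fat jar listing later in this thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

final class ClassLocations {

    private ClassLocations() {}

    /** Converts a fully qualified top-level class name to its classpath resource path. */
    public static String resourcePath(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Lists every location from which the given loader (and its parents) can see the class file. */
    public static List<URL> find(String className, ClassLoader loader) {
        try {
            return Collections.list(loader.getResources(resourcePath(className)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: the class of interest is the fabric8 Status model class.
        String name = args.length > 0 ? args[0] : "io.fabric8.kubernetes.api.model.Status";
        List<URL> locations = find(name, ClassLocations.class.getClassLoader());
        System.out.println(name + " is visible from " + locations.size() + " location(s):");
        for (URL url : locations) {
            System.out.println("  " + url);
        }
    }
}
```

More than one location printed for the same class name would point to duplicate definitions on a flat classpath. This check only covers what a single loader can see; when several classloaders are involved, comparing `someObject.getClass().getClassLoader()` against the loader of the expected class at the point of failure is likely more informative.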

@fhalde
Author

fhalde commented Apr 19, 2024

Hi @shawkins

but more than likely the user will want to fix having more than one definition of Status in the classpath

I'm not sure which Status you are referring to. Are you suggesting I look at my mvn dependency:tree?

@fhalde
Author

fhalde commented Apr 19, 2024

This is what the deps look like:

[INFO] +- io.javaoperatorsdk:operator-framework:jar:4.8.2:compile
[INFO] |  +- io.javaoperatorsdk:operator-framework-core:jar:4.8.2:compile
[INFO] |  |  \- io.fabric8:kubernetes-client:jar:6.11.0:compile
.
.
.
.
[INFO] +- io.strimzi:api:jar:0.40.0:compile
[INFO] |  +- io.fabric8:kubernetes-client-api:jar:6.10.0:compile
[INFO] |  |  +- io.fabric8:kubernetes-model-gatewayapi:jar:6.10.0:compile
[INFO] |  |  +- io.fabric8:kubernetes-model-resource:jar:6.10.0:compile
[INFO] |  |  +- io.fabric8:kubernetes-model-rbac:jar:6.10.0:compile
[INFO] |  |  +- io.fabric8:kubernetes-model-admissionregistration:jar:6.10.0:compile
[INFO] |  |  +- io.fabric8:kubernetes-model-apps:jar:6.10.0:compile
[INFO] |  |  +- io.fabric8:kubernetes-model-autoscaling:jar:6.10.0:compile
[INFO] |  |  +- io.fabric8:kubernetes-model-batch:jar:6.10.0:compile
[INFO] |  |  +- io.fabric8:kubernetes-model-certificates:jar:6.10.0:compile
[INFO] |  |  +- io.fabric8:kubernetes-model-coordination:jar:6.10.0:compile
[INFO] |  |  +- io.fabric8:kubernetes-model-discovery:jar:6.10.0:compile
[INFO] |  |  +- io.fabric8:kubernetes-model-events:jar:6.10.0:compile
[INFO] |  |  +- io.fabric8:kubernetes-model-extensions:jar:6.10.0:compile
[INFO] |  |  +- io.fabric8:kubernetes-model-flowcontrol:jar:6.10.0:compile
[INFO] |  |  +- io.fabric8:kubernetes-model-metrics:jar:6.10.0:compile
[INFO] |  |  +- io.fabric8:kubernetes-model-policy:jar:6.10.0:compile
[INFO] |  |  +- io.fabric8:kubernetes-model-scheduling:jar:6.10.0:compile
[INFO] |  |  +- io.fabric8:kubernetes-model-storageclass:jar:6.10.0:compile
[INFO] |  |  +- io.fabric8:kubernetes-model-node:jar:6.10.0:compile
[INFO] |  |  +- org.snakeyaml:snakeyaml-engine:jar:2.7:compile
[INFO] |  |  \- com.fasterxml.jackson.datatype:jackson-datatype-jsr310:jar:2.16.0:compile
[INFO] |  +- io.fabric8:kubernetes-model-core:jar:6.10.0:compile
[INFO] |  +- io.fabric8:kubernetes-model-networking:jar:6.10.0:compile
[INFO] |  +- io.fabric8:kubernetes-model-common:jar:6.10.0:compile
[INFO] |  +- io.fabric8:kubernetes-model-apiextensions:jar:6.10.0:compile

@fhalde
Author

fhalde commented Apr 19, 2024

is it better to keep the fabric8 version consistent?

@shawkins
Collaborator

@fhalde Yes, it is, especially if you don't have a flat classloader and end up with two different Status class definitions accessible from different classloaders.

@fhalde
Author

fhalde commented Apr 20, 2024

Hmm, we definitely don't set up any classloaders ourselves. Is this something internal to fabric8? Anyway, here is what my fat jar contents look like:

jar -tvf operator.jar | grep '/Status.class'
 io/javaoperatorsdk/operator/health/Status.class
 io/strimzi/api/kafka/model/kafka/Status.class
 io/fabric8/kubernetes/api/model/Status.class
 org/apache/logging/log4j/core/util/internal/Status.class
 ch/qos/logback/core/status/Status.class

@shawkins

@metacosm
Collaborator

Can you try to make sure that the fabric8 client version that gets put into your fat jar is the same version as the one used by JOSDK?

@fhalde
Author

fhalde commented Apr 29, 2024

Hi @metacosm, we ran our operator with a single version of fabric8 for a few days, and today this error came up again.

Here is what I could gather by attaching a debugger: the status message was unmarshalled into a GenericKubernetesResource rather than a Status. Weirdly, the error stopped a while after I attached the remote debugger.

If this comes up again, I'll let you know.
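The debugger observation would be consistent with the endless loop in the log. The following is a deliberately simplified, self-contained model of that failure mode, not the fabric8 source: `TypedStatus`, `GenericResource`, `Outcome`, and `onErrorEvent` are hypothetical names invented for this sketch. It assumes the handler recognizes an expired watch only when the error object is an instance of the typed status class, and otherwise retries.

```java
import java.util.Map;

final class WatchErrorModel {

    /** Hypothetical stand-in for a typed Status object; not the fabric8 class. */
    static final class TypedStatus {
        final int code;
        final String reason;
        final String message;

        TypedStatus(int code, String reason, String message) {
            this.code = code;
            this.reason = reason;
            this.message = message;
        }
    }

    /** Hypothetical stand-in for a generic, untyped resource holding the same JSON fields. */
    static final class GenericResource {
        final Map<String, Object> fields;

        GenericResource(Map<String, Object> fields) {
            this.fields = fields;
        }
    }

    enum Outcome { RELIST_AND_REWATCH, RETRY_SAME_VERSION }

    /** Decides what to do with the object carried by an ERROR watch event (simplified model). */
    static Outcome onErrorEvent(Object object) {
        if (object instanceof TypedStatus && ((TypedStatus) object).code == 410) {
            // Expired resource version: restart the watch from a fresh list.
            return Outcome.RELIST_AND_REWATCH;
        }
        // Anything else is treated as an unknown error and retried unchanged,
        // even if it carries the same 410 payload in untyped form.
        return Outcome.RETRY_SAME_VERSION;
    }

    public static void main(String[] args) {
        Object typed = new TypedStatus(410, "Expired", "too old resource version: 31159423 (31160199)");
        Object generic = new GenericResource(Map.of("kind", "Status", "reason", "Expired", "code", 410));
        System.out.println(onErrorEvent(typed));   // prints RELIST_AND_REWATCH
        System.out.println(onErrorEvent(generic)); // prints RETRY_SAME_VERSION
    }
}
```

In this model the same 410 payload takes the retry branch whenever it arrives as a generic object, and a retry against an expired resource version would presumably fail the same way again. Whether the real client behaves exactly like this is an assumption.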

@csviri
Collaborator

csviri commented May 27, 2024

Will close this issue; please let us know if it happens again.

csviri closed this as completed May 27, 2024