Major breakage detected in master (4.11-SNAPSHOT) due to #2292 #2499

Closed
FWiesner opened this issue Sep 18, 2020 · 4 comments

@FWiesner
Contributor

The fix for #2292 (rev 2652139) breaks our solution: Kubernetes does not reliably return HTTP 409; at times it responds with HTTP 500 and a message asking the client to retry. We have confirmed this with Kubernetes v1.16.7 through v1.19.1.

2020-09-17 15:57:36.963+0200 |  | ::: | .175.80.197:6443/... |  WARN | .i.WatchConnectionManager |  | Exec Failure
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://10.175.80.197:6443/apis/networking.istio.io/v1beta1/namespaces/dx-system/gateways. Message: The POST operation against Gateway.networking.istio.io could not be completed at this time, please try again.. Received status: Status(apiVersion=v1, code=500, details=StatusDetails(causes=[], group=networking.istio.io, kind=Gateway, name=POST, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=The POST operation against Gateway.networking.istio.io could not be completed at this time, please try again., metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=ServerTimeout, status=Failure, additionalProperties={}).
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:589)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:528)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:492)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:451)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:252)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:841)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:332)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.createOrReplace(BaseOperation.java:402)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.createOrReplace(BaseOperation.java:82)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.createOrReplace(BaseOperation.java:396)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.createOrReplace(BaseOperation.java:82)
    at com.oracle.cx.verticals.dx4c.config.k8s.istio.EgressGatewayReconciler.mergePortsFromServiceEntry(EgressGatewayReconciler.java:156)
    at com.oracle.cx.verticals.dx4c.config.k8s.istio.EgressGatewayReconciler.lambda$onServiceEntryChanged$3(EgressGatewayReconciler.java:62)
    at java.base/java.util.Optional.ifPresentOrElse(Optional.java:201)
    at com.oracle.cx.verticals.dx4c.config.k8s.istio.EgressGatewayReconciler.findGatewayAndRunOrElse(EgressGatewayReconciler.java:102)
    at com.oracle.cx.verticals.dx4c.config.k8s.istio.EgressGatewayReconciler.onServiceEntryChanged(EgressGatewayReconciler.java:61)
    at com.oracle.cx.verticals.dx4c.config.k8s.istio.ServiceEntryReconciler.performSubordinateIstioUpdates(ServiceEntryReconciler.java:273)
    at com.oracle.cx.verticals.dx4c.config.k8s.istio.ServiceEntryReconciler.mergeToExistingServiceEntry(ServiceEntryReconciler.java:265)
    at com.oracle.cx.verticals.dx4c.config.k8s.istio.ServiceEntryReconciler.lambda$createNewOrMergeWithServiceEntry$2(ServiceEntryReconciler.java:159)
    at java.base/java.util.Optional.ifPresentOrElse(Optional.java:201)
    at com.oracle.cx.verticals.dx4c.config.k8s.istio.ServiceEntryReconciler.createNewOrMergeWithServiceEntry(ServiceEntryReconciler.java:155)
    at com.oracle.cx.verticals.dx4c.config.k8s.istio.ServiceEntryReconciler.lambda$realizeServiceEntryChanges$1(ServiceEntryReconciler.java:148)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1540)
    at com.oracle.cx.verticals.dx4c.config.k8s.istio.ServiceEntryReconciler.realizeServiceEntryChanges(ServiceEntryReconciler.java:144)
    at com.oracle.cx.verticals.dx4c.config.k8s.istio.ServiceEntryReconciler.onTICChange(ServiceEntryReconciler.java:100)
    at com.oracle.cx.verticals.dx4c.config.k8s.targetinstanceconfig.TICReconciler.onTICUpdated(TICReconciler.java:101)
    at com.oracle.cx.verticals.dx4c.config.k8s.OldBaseReconciler.lambda$onResourceAddedOrModified$5(OldBaseReconciler.java:87)
    at com.oracle.cx.verticals.dx4c.config.k8s.OldBaseReconciler.onEvent(OldBaseReconciler.java:41)
    at com.oracle.cx.verticals.dx4c.config.k8s.OldBaseReconciler.onResourceAddedOrModified(OldBaseReconciler.java:74)
    at com.oracle.cx.verticals.dx4c.config.k8s.targetinstanceconfig.TICReconciler.lambda$startWatch$1(TICReconciler.java:74)
    at com.oracle.cx.verticals.dx4c.config.k8s.resources.BaseResourceHandler.handle(BaseResourceHandler.java:209)
    at com.oracle.cx.verticals.dx4c.config.k8s.resources.BaseResourceHandler.handleActionWithCacheUpdate(BaseResourceHandler.java:169)
    at com.oracle.cx.verticals.dx4c.config.k8s.targetinstanceconfig.TICReconciler.lambda$startWatch$2(TICReconciler.java:68)
    at com.oracle.cx.verticals.dx4c.config.k8s.resources.BaseResourceHandler$1.eventReceived(BaseResourceHandler.java:75)
    at com.oracle.cx.verticals.dx4c.config.k8s.resources.BaseResourceHandler$1.eventReceived(BaseResourceHandler.java:69)
    at com.oracle.cx.verticals.dx4c.config.k8s.resources.BaseResourceHandler$WrappedWatcher.eventReceived(BaseResourceHandler.java:250)
    at com.oracle.cx.verticals.dx4c.config.k8s.resources.BaseResourceHandler$WrappedWatcher.eventReceived(BaseResourceHandler.java:235)
    at io.fabric8.kubernetes.client.utils.WatcherToggle.eventReceived(WatcherToggle.java:49)
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:235)
    at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323)
    at okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
    at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105)
    at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
    at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
    at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
    at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)

We will make a local change to unblock ourselves, but IMHO this needs a fix upstream.
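
For illustration, the decision implied by the log above could be sketched roughly as below. This is only a sketch under assumptions: `CreateFailureClassifier`, `Action` and `classify` are hypothetical names and not part of the client, and narrowing the retry to `reason=ServerTimeout` is an assumption taken from the single status object shown in the log.

```java
import java.net.HttpURLConnection;

// Hypothetical helper, not part of the fabric8 client.
class CreateFailureClassifier {

    /** What a createOrReplace-style caller might do after a failed POST. */
    enum Action { REPLACE, RETRY, FAIL }

    /**
     * Maps the HTTP code and the Status "reason" of a failed create to an action.
     * Assumption: only a 500 carrying reason=ServerTimeout (as in the log) is retried.
     */
    static Action classify(int httpCode, String reason) {
        if (httpCode == HttpURLConnection.HTTP_CONFLICT) {
            // The resource already exists: fall back to replace.
            return Action.REPLACE;
        }
        if (httpCode == HttpURLConnection.HTTP_INTERNAL_ERROR && "ServerTimeout".equals(reason)) {
            // The server asked the client to try again.
            return Action.RETRY;
        }
        // Anything else is a genuine failure and should be rethrown.
        return Action.FAIL;
    }

    public static void main(String[] args) {
        System.out.println(classify(409, "AlreadyExists"));
        System.out.println(classify(500, "ServerTimeout"));
        System.out.println(classify(403, "Forbidden"));
    }
}
```

Checking the reason as well as the code keeps unrelated 500 responses from being retried forever, at the cost of missing retryable errors that carry a different reason.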

@FWiesner
Contributor Author

FWiesner commented Sep 18, 2020

Quick hack for lines 397 onwards (as of 1832b75), with Thread.sleep and no timeout:

final CompletableFuture<T> future = new CompletableFuture<>();
while (!future.isDone()) {
  try {
    // Create
    KubernetesResourceUtil.setResourceVersion(itemToCreateOrReplace, null);
    future.complete(create(itemToCreateOrReplace));
  } catch (KubernetesClientException exception) {
    final T itemFromServer;
    if (exception.getCode() == HttpURLConnection.HTTP_INTERNAL_ERROR) {
      // The API server sometimes answers 500 ("please try again") instead of 409
      itemFromServer = fromServer().get();
      if (itemFromServer == null) {
        // Nothing on the server yet: pause briefly, then retry the create
        try {
          Thread.sleep(200);
        } catch (InterruptedException e) {
          // Restore the flag and give up; looping on would make every later sleep throw at once
          Thread.currentThread().interrupt();
          throw exception;
        }
        continue;
      }
    } else if (exception.getCode() != HttpURLConnection.HTTP_CONFLICT) {
      throw exception;
    } else {
      itemFromServer = fromServer().get();
    }

    // Conflict; Do Replace
    KubernetesResourceUtil.setResourceVersion(itemToCreateOrReplace, KubernetesResourceUtil.getResourceVersion(itemFromServer));
    future.complete(replace(itemToCreateOrReplace));
  }
}
return future.join();
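
Since the hack above has no timeout, a bounded variant might look something like the sketch below. It is deliberately independent of the client: `BoundedRetry` and `retry` are hypothetical names, and the attempt limit and pause are illustrative values, not anything the client defines.

```java
import java.time.Duration;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper, not part of the fabric8 client.
class BoundedRetry {

    /**
     * Runs the attempt up to maxAttempts times, pausing between tries.
     * Non-retryable exceptions are rethrown immediately; once the attempts
     * are used up, the last retryable exception is rethrown.
     */
    static <T> T retry(Supplier<T> attempt,
                       Predicate<RuntimeException> retryable,
                       int maxAttempts,
                       Duration pause) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                return attempt.get();
            } catch (RuntimeException e) {
                if (!retryable.test(e)) {
                    throw e;
                }
                last = e;
            }
            if (i < maxAttempts) {
                try {
                    Thread.sleep(pause.toMillis());
                } catch (InterruptedException ie) {
                    // Restore the interrupt flag and stop retrying.
                    Thread.currentThread().interrupt();
                    throw last;
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated create that fails twice before succeeding.
        String result = retry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("please try again");
            }
            return "created";
        }, e -> e instanceof IllegalStateException, 5, Duration.ofMillis(10));
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In the createOrReplace case the supplier would wrap the create call and the predicate would test for the retryable 500; both are left abstract here so the sketch makes no claims about the client's internals.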

@manusa manusa added the bug label Sep 18, 2020
@coopstah13
Contributor

A lesser problem: this also breaks tests that use the KubernetesCrudDispatcher, because the dispatcher does not implement the behavior that createOrReplace now requires. It blindly adds the object to its map, so which copy a subsequent request returns is random.
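
To make the expectation concrete, here is a minimal sketch of the create semantics a mock would need, assuming a plain in-memory map keyed by request path. `InMemoryCrudStore` and its methods are hypothetical and are not the KubernetesCrudDispatcher API.

```java
import java.net.HttpURLConnection;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory store, not the KubernetesCrudDispatcher implementation.
class InMemoryCrudStore {

    private final Map<String, String> items = new ConcurrentHashMap<>();

    /** POST: stores the body only if the path is free, otherwise reports a conflict. */
    int create(String path, String body) {
        return items.putIfAbsent(path, body) == null
            ? HttpURLConnection.HTTP_CREATED
            : HttpURLConnection.HTTP_CONFLICT;
    }

    /** PUT: overwrites an existing entry, otherwise reports not found. */
    int replace(String path, String body) {
        return items.replace(path, body) != null
            ? HttpURLConnection.HTTP_OK
            : HttpURLConnection.HTTP_NOT_FOUND;
    }

    /** GET: returns the stored body, or null when nothing is stored at the path. */
    String get(String path) {
        return items.get(path);
    }

    public static void main(String[] args) {
        InMemoryCrudStore store = new InMemoryCrudStore();
        System.out.println(store.create("/gateways/egress", "v1"));  // first create
        System.out.println(store.create("/gateways/egress", "v2"));  // duplicate create
        System.out.println(store.replace("/gateways/egress", "v2")); // explicit replace
        System.out.println(store.get("/gateways/egress"));
    }
}
```

With a store like this, a second create for the same path is rejected with 409 instead of producing a second copy, which is the signal createOrReplace relies on to switch to replace.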

@manusa manusa assigned manusa and rohanKanojia and unassigned manusa Oct 19, 2020
@rohanKanojia
Member

Since the PR sent by Florian (#2501) is merged and available in v4.13.0, can we consider this issue closed? Or is something still missing?

@FWiesner
Contributor Author

FWiesner commented Nov 17, 2020 via email

@manusa manusa closed this as completed Nov 17, 2020