Events on CRDs cause full cluster discovery #523

Open
torfjor opened this issue May 25, 2023 · 0 comments

torfjor commented May 25, 2023

Hi!

We have a setup where a central admin cluster running Argo CD manages Applications on a fleet of workload clusters. The admin cluster connects to the workload clusters through Anthos Fleet and the Connect Gateway. The workload clusters are a mix of Anthos Bare Metal and GKE.

We ran into an issue where we hit the default Connect Gateway API quota with only two registered workload clusters and a handful of deployed Applications. Investigation showed that Argo CD was performing full API discovery requests on registered workload clusters multiple times per minute. Further investigation led us to this event loop in gitops-engine, where c.startMissingWatches() performs a non-cached discovery of the target cluster each time a CRD changes.
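To make the amplification concrete, here is a minimal sketch of the pattern described above. This is not the actual gitops-engine code; the structure and names are my own illustration. The point is only that every watch event on a CRD turns into a full, non-cached discovery call against the target cluster, so a CRD that gets patched several times per minute produces that many discovery round trips through the Connect Gateway:

```go
package main

import (
	"context"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/discovery"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

// Illustrative sketch (not gitops-engine itself): a watch on CRDs where
// every event triggers full, non-cached API discovery of the cluster.
func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}

	dyn, err := dynamic.NewForConfig(config)
	if err != nil {
		log.Fatal(err)
	}
	disco, err := discovery.NewDiscoveryClientForConfig(config)
	if err != nil {
		log.Fatal(err)
	}

	crdGVR := schema.GroupVersionResource{
		Group:    "apiextensions.k8s.io",
		Version:  "v1",
		Resource: "customresourcedefinitions",
	}

	// Watch all CRDs in the cluster.
	w, err := dyn.Resource(crdGVR).Watch(context.Background(), metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}

	for ev := range w.ResultChan() {
		// On every CRD change (including the frequent patches from the
		// Backup for GKE addon-manager), re-run full API discovery.
		// ServerGroupsAndResources is not cached, so each call goes all
		// the way to the API server / Connect Gateway.
		_, _, discoErr := disco.ServerGroupsAndResources()
		log.Printf("CRD event %q -> full discovery (err=%v)", ev.Type, discoErr)
	}
}
```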

This turns out to be problematic for GKE clusters with Backup for GKE enabled, because the system-provided addon-manager will patch its CRDs very often:

[Screenshot (2023-05-25): the addon-manager repeatedly patching the Backup for GKE CRDs]

Looking at the Connect Gateway API traffic, you can see a sharp drop when we added a resource exclusion for gkebackup.gke.io/*:

[Chart: Connect Gateway API traffic by response code, showing the drop after the exclusion was added]
(PS: The last sudden spike was caused by us temporarily removing the resource exclusion.)

For our use case, excluding gkebackup.gke.io/* entirely is fine. We contacted Google Support about the issue, and they confirmed that the rapid patching of these CRDs is intended behaviour. Their immediate answer to the chatty nature of Argo CD was simply to raise the quota for affected customers.
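For anyone hitting the same problem, the exclusion is the standard resource.exclusions setting in the argocd-cm ConfigMap. The snippet below is a sketch of roughly what such an exclusion looks like; adjust the clusters matcher to your own setup:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
  labels:
    app.kubernetes.io/part-of: argocd
data:
  # Stop Argo CD from watching (and reacting to) anything in the
  # gkebackup.gke.io API group, on all managed clusters.
  resource.exclusions: |
    - apiGroups:
        - "gkebackup.gke.io"
      kinds:
        - "*"
      clusters:
        - "*"
```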

Writing up this issue because this behaviour might not be evident to people running Argo CD or Flux against GKE clusters unless they have good visibility into their API server traffic.

Possibly related:

Update 05-31-2023:

Just heard back from the Backup for GKE product team: a fix for the rapidly patched CRDs will be rolled out next week.
