
High CPU usage in repo server for plugin detection with >8.000 apps #15763

Open
woehrl01 opened this issue Oct 2, 2023 · 22 comments
Labels
bug Something isn't working component:cmp Config Management Plugin related issues

Comments

@woehrl01
Contributor

woehrl01 commented Oct 2, 2023

Checklist:

  • I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • I've included steps to reproduce the bug.
  • I've pasted the output of argocd version.

Describe the bug

Using Argo CD 2.8.3, we see high CPU usage in the repo server during plugin detection.

We use a huge monorepo for our applications, without any templating (just plain YAML). However, plugin detection takes a significant amount of time.

Flame graphs captured with Pixie:
[Screenshot 2023-09-08 at 13:47:16]

[Screenshot 2023-09-08 at 13:46:59]

Another one during cleanup:
[Screenshot 2023-09-08 at 12:57:08]

Slack discussion: https://cloud-native.slack.com/archives/C01TSERG0KZ/p1694514516286809?thread_ts=1694175483.721089&cid=C01TSERG0KZ

CC: @csantanapr

To Reproduce

Apply thousands of apps at the same time.

Expected behavior

They should be applied quickly.

Screenshots

Version

v2.8.3+77556d9
woehrl01 added the bug label Oct 2, 2023
woehrl01 changed the title from High CPU usage in repo server for plugin detection with >7.000 apps to High CPU usage in repo server for plugin detection with >8.000 apps Oct 2, 2023
@crenshaw-dev
Collaborator

Do you actually use any CMPs, or is all that truly completely wasted CPU?

@woehrl01
Contributor Author

woehrl01 commented Oct 2, 2023

@crenshaw-dev I use a CMP, but not in that repository.

@crenshaw-dev
Collaborator

Gotcha. We could cache the discovery result on a per-commit basis, but my guess is that you're hitting the high CPU use with new commits.

An alternative would be to explicitly set helm, kustomize, or directory in the spec.source field. That should force Argo CD to bypass the CMP detection phase.
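A minimal sketch of that workaround for a plain-YAML app; the app name, repo URL, path, and destination below are hypothetical. It sets a non-empty directory block (include: '*') rather than directory: {}, on the unverified assumption that an empty block might be dropped when the spec is normalized.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-app              # hypothetical name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/monorepo.git   # hypothetical repo
    targetRevision: HEAD
    path: apps/example-app       # hypothetical path
    # Declaring the source type explicitly as "directory" should let
    # Argo CD skip CMP discovery for this app.
    directory:
      include: '*'               # non-empty so the block is kept (assumption)
  destination:
    server: https://kubernetes.default.svc
    namespace: example           # hypothetical namespace
```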

@woehrl01
Contributor Author

woehrl01 commented Oct 2, 2023

No, actually it's the same commit, but I have a monorepo, so the resolution runs 8,000 times, once for each app's root folder.

Great, I'll check the directory option.

@JuozasVainauskas

JuozasVainauskas commented Oct 3, 2023

We encountered the same issue on v2.7.11. After migrating plugins from argocd-cm to sidecars, CPU and memory usage skyrocketed. Consequently, the argocd-repo-server pods started to get throttled, and Argo CD slowed down and eventually got stuck. Increasing the argocd-repo-server resource requests and limits did not help, so we had to revert the changes.

As a result, we cannot use Argo CD sidecar plugins and are blocked from upgrading Argo CD to v2.8.

Resource usage increase after migrating plugins to sidecars:
[Screenshot 2023-10-03 at 13:28:01]

@crenshaw-dev
Collaborator

@woehrl01 this might also help mitigate the issue if your monorepo is large due to non-yaml resources: https://argo-cd.readthedocs.io/en/latest/operator-manual/config-management-plugins/#plugin-tar-stream-exclusions

@JuozasVainauskas

Quoting @crenshaw-dev: "An alternative would be to explicitly set helm, kustomize, or directory in the spec.source field. That should force Argo to bypass the CMP detection phase."

Could you please elaborate on this solution?

@woehrl01
Contributor Author

woehrl01 commented Oct 3, 2023

Thanks @crenshaw-dev. The repo consists only of YAML files, but I still use the exclusion to skip the .git folder.

I've also found that I have to lower the number of parallel repo operations from 50 to 5; otherwise I end up in a strange deadlock. This could also be caused by plugin detection.

@crenshaw-dev
Collaborator

@JuozasVainauskas Argo CD only does plugin "discovery" if you haven't explicitly specified in your App manifest that you want something besides a plugin. For example:

```yaml
kind: Application
spec:
  source:
    kustomize:
      images: [a=b]
```

For this app, Argo CD would skip plugin discovery because it automatically knows it'll be using Kustomize instead.

@JuozasVainauskas

Understood, thank you. Unfortunately, this will not help us since we use plugins by name instead of discovery.
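For comparison, a rough sketch of an app that names its plugin explicitly instead of relying on discovery; all names are hypothetical. As far as we know, for sidecar plugins that set spec.version the expected name is metadata.name-spec.version (e.g. example-plugin-v1.0), but check the CMP docs for your Argo CD version.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-app              # hypothetical name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/monorepo.git   # hypothetical repo
    targetRevision: HEAD
    path: apps/example-app       # hypothetical path
    plugin:
      # Naming the plugin selects it directly instead of running discovery.
      name: example-plugin-v1.0  # hypothetical <name>-<version>
  destination:
    server: https://kubernetes.default.svc
    namespace: example           # hypothetical namespace
```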

@woehrl01
Contributor Author

woehrl01 commented Oct 4, 2023

@crenshaw-dev I just deployed the directory fix across all our clusters. Repo-server CPU usage has not changed (though it isn't an issue yet); I'll monitor it and keep you updated.

@todaywasawesome
Contributor

Comments from @crenshaw-dev: When you add a CMP, every app has to query that CMP to see whether the CMP can handle it. This is by design, to keep potential issues out of the repo server. However, it creates a performance penalty: even if you add a single CMP for a single app, all apps have to check against that CMP. We will review whether this should be architected differently.
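For context, a sidecar CMP declares its discovery rule in its plugin.yaml. A rough sketch follows; the plugin name, version, generate command, and marker file are hypothetical. Apps without an explicit source type are checked against rules like this one, so a narrowly scoped rule (here, a single marker file) is presumably the cheapest to evaluate, though whether it avoids the overhead reported in this issue is not established here.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ConfigManagementPlugin
metadata:
  name: example-plugin           # hypothetical name
spec:
  version: v1.0                  # hypothetical version
  generate:
    # Hypothetical generator; it must print Kubernetes manifests to stdout.
    command: [sh, -c, "cat manifests.yaml"]
  discover:
    # The plugin claims only apps whose directory contains this marker file.
    fileName: "./.example-plugin"
```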

@todaywasawesome
Contributor

Related proposal: #15006

@todaywasawesome
Contributor

Another suggested stop-gap: Support a feature flag to disable discovery.

@alexmt suggests keeping it disabled by default.

@JuozasVainauskas

We managed to keep CPU usage under control by setting the --parallelismlimit flag. However, after migrating our argocd-cm plugins to sidecars, CPU usage still increased significantly and Argo CD got slower. As a result, we cannot migrate our argocd-cm plugins to sidecars and upgrade our Argo CD instances to 2.8.x.

[Screenshot 2023-10-06 at 13:36:56]
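The same limit can, as far as we know, also be set through the argocd-cmd-params-cm ConfigMap instead of the command-line flag. A minimal sketch; the value 5 is purely illustrative, the key should be merged into your existing ConfigMap rather than replacing it, and the repo-server needs a restart to pick up the change.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cmd-params-cm
  namespace: argocd
  labels:
    app.kubernetes.io/name: argocd-cmd-params-cm
    app.kubernetes.io/part-of: argocd
data:
  # Equivalent to --parallelismlimit on argocd-repo-server:
  # caps concurrent manifest generation requests (values below 1 mean no limit).
  reposerver.parallelism.limit: "5"   # illustrative value
```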

@JuozasVainauskas

Update: we solved the performance issue by setting --plugin-tar-exclude to .git/*, and have migrated our argocd-cm plugins to sidecars.
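A minimal sketch of the same setting expressed through argocd-cmd-params-cm (key name as described in the CMP docs section on plugin tar stream exclusions, linked earlier in this thread). As with any key in this ConfigMap, merge it into the existing one and restart the repo-server.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cmd-params-cm
  namespace: argocd
  labels:
    app.kubernetes.io/name: argocd-cmd-params-cm
    app.kubernetes.io/part-of: argocd
data:
  # Equivalent to --plugin-tar-exclude on argocd-repo-server.
  # Globs use Go filepath.Match syntax; separate multiple globs with ";".
  reposerver.plugin.tar.exclusions: ".git/*"
```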

@sidewinder12s

Potentially unrelated: I've wondered whether it would be easier/better to configure the plugin tarball as an include list per plugin, rather than as a global exclusion list. At least in large monorepos, it's much easier to decide what I want to send to the CMP than to work out what to exclude.

@woehrl01
Contributor Author

woehrl01 commented Oct 18, 2023

@crenshaw-dev

We redeployed about 6,000 apps today. With the fix of setting directory to bypass plugin detection, we got a really awesome deployment time of about 20 minutes. Repo-server CPU usage is also great!

[Screenshot 2023-10-18 at 12:38:25]

CPU usage of the repo server:
[Screenshot 2023-10-18 at 12:35:11]

A possible further optimization would be to eliminate the repeated git operations, given that it's a monorepo and a single commit triggered the redeploy:

[Screenshot 2023-10-18 at 12:43:35]

@sidewinder12s

I can also confirm huge performance improvements from adding:

```yaml
directory:
  includes: '*'
```

to our directory-type Argo CD apps (we only have maybe 30-50 of them). The git_ms timing in our repo-server logs went from 40s to 20s.

@ctrought

Close to 70 cores at peak for a single repo-server pod in one of our clusters.

ArgoCD 2.8.4

[image]

@silveiralexf

silveiralexf commented Feb 8, 2024

Our problem seems quite similar to the ones reported in this thread: we have a big monorepo and a large number of applications (8k+), and cloning the same repo for each app seems to be the cause of the performance problems.

When we used the plugin as an initContainer in a previous version of ArgoCD (2.4.18), it synchronized all apps and resources in just a few seconds (~20s). As a sidecar, on the other hand, it takes around 20 minutes and lots of CPU, and often never fully completes...

Question: Is it possible/recommended to keep using plugins as an initContainer instead of switching to the new sidecar approach? I had the impression the option was removed in v2.8+, but couldn't confirm that from the docs so far...

Also, is there any way to avoid the multiple cloning of the same repo that we might have missed from the docs?

Thanks a ton in advance for any insights!!!

leoluz added the component:cmp label Feb 28, 2024
@jsolana

jsolana commented Apr 24, 2024

Hi, also related: #17948 and #17951


9 participants