Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Argo CD: PersistentVolumeClaims for volumes #438

Open
docent-net opened this issue Sep 3, 2020 · 31 comments · May be fixed by #1648
Open

Argo CD: PersistentVolumeClaims for volumes #438

docent-net opened this issue Sep 3, 2020 · 31 comments · May be fixed by #1648
Labels
argo-cd awaiting-upstream Is waiting for a change upstream to be completed before it can be merged. enhancement on-hold

Comments

@docent-net
Copy link
Contributor

docent-net commented Sep 3, 2020

Is your feature request related to a problem? Please describe.

I'm trying to mount remote storage to the argocd repo-server. I can provide volumeMounts list as well as volumes list, but I can't create PVC for those volumes using helm chart.

Describe the solution you'd like

volumes.yaml could contain a definition of StorageClass name for PVC. When provided, the volume mount would be created on top of this PVC.

Describe alternatives you've considered

Alternatively, documentation could be extended with information about using PVC and manual creation of PVC.

Additional context

I can see, that similar feature is already implemented in redis-ha:

{{- if .Values.persistentVolume.enabled }}

Also, this request is only about repo-server, but similar could be requested for Redis and other components.

If this looks like something useful, I can provide PR.

@tlvenn
Copy link

tlvenn commented Oct 17, 2020

I think this is more than useful and should definitely comes outside of the box.
@docent-net any chance you can get a stab at this fairly soon ?

@docent-net
Copy link
Contributor Author

Sure, on it :)

@docent-net
Copy link
Contributor Author

@tlvenn I need to ask a question, about this. Read through ArgoCD docs (e.g. here https://argoproj.github.io/argo-cd/operator-manual/high_availability/#argocd-repo-server) and it looked like a straightforward thing - just create volumeClaimTemplates with proper configuration for the PVC and that's all.

However, repo-server can be replicated, so here it hit me, that maybe we'd need a StatefulSet here instead of Deployment, because there is some state.. or not? Unfortunately, I'm not that familiar w/ArgoCD HA, but after going through the documentation and thinking about it I have 2 assumptions to discuss:

  1. ArgoCD is rather a stateless app (but for the Redis, which is a cache, so its data can be lost). We don't really care about contents of the repo-server, and the only "important" directory is /tmp, where repos are cloned. "important" because the only reason we care about mounting it from the remote repo is that its size can grow and Pod disk can run out of space. In this case, when Pod is lost for some reason (node down, Pod was evicted by scheduler) underlying storage can be safely removed by k8s, as it will not be used anymore and its contents are not important. In this case, we can stay with Deployment resource as it's currently defined and basically create new PVs via volumeClaimTemplates.
  2. We care about the state and contents of /tmp and we need to replace Deployment w/Statefulset to be sure, that replicas use the same storage as before.

After writing this whole post I think that the correct answer is nr.1, but just need to validate it w/you :) Thanks

@tlvenn
Copy link

tlvenn commented Nov 23, 2020

Hi @docent-net , I am no expert when it comes to argo cd but my feeling is that the repo server is a cache layer and I would opt for option 1 as well.

@tlvenn
Copy link

tlvenn commented Nov 24, 2020

For illustration purpose, not saying this a good idea, Grafana chart for example supports both approaches:

https://github.com/grafana/helm-charts/blob/main/charts/grafana/templates/deployment.yaml#L1
https://github.com/grafana/helm-charts/blob/main/charts/grafana/templates/statefulset.yaml#L1

@docent-net
Copy link
Contributor Author

I'm going w/opt1 - lets face it - the data is stored in /tmp, so let's treat it as temporary :) Thanks, will get into it

@github-actions
Copy link

github-actions bot commented Apr 1, 2021

Stale issue message

@tlvenn
Copy link

tlvenn commented Apr 9, 2021

@docent-net did you make any progress on this ?

@jbartlet
Copy link

jbartlet commented Apr 22, 2021

I'm a little confused...I am trying to deploy ArgoCD in HA mode and use a PVC for the repo-server to avoid the disk consumption issue, described above. Of course, there is an issue with one PVC. Option 1 above mentions leave as a Deployment and use volumeClaimTemplates....but those aren't supported for Deployments, only SS. Am I missing something? Do I just convert to SS?

@ramanNarasimhan77
Copy link

Can this issue be re-opened? Right now the only possible option without making changes to the helm chart is to use emptyDir

@kaidobit
Copy link

Please reopen this issue:

I'm using Longhorn replicated blockstorage, indeed it is my default storageclass and I would love to make use of it for argocd.
This should be the default behavior at least for non-HA, @tlvenn gave a good example for a working non-HA/HA Chart:

For illustration purpose, not saying this a good idea, Grafana chart for example supports both approaches:

https://github.com/grafana/helm-charts/blob/main/charts/grafana/templates/deployment.yaml#L1
https://github.com/grafana/helm-charts/blob/main/charts/grafana/templates/statefulset.yaml#L1

Furthermore as @docent-net already pointed out, this Chart is partially making use of the ability to set a custom storageclass:

Additional context

I can see, that similar feature is already implemented in redis-ha:

{{- if .Values.persistentVolume.enabled }}

@snuggie12
Copy link

https://kubernetes.io/docs/concepts/storage/_print/#generic-ephemeral-volumes is alpha in 1.19, beta in 1.21 and stable in 1.23.

@pierluigilenoci
Copy link

Please reopen this issue.

@pierluigilenoci
Copy link

@mkilchhofer could you please take a look?

@mkilchhofer
Copy link
Member

@pierluigilenoci I don't understand why someone wants to configure PV on Argo CD components.

We only implement use cases which are supported by the upstream project itself (https://github.com/argoproj/argo-cd).

One thing I am okay with is to add the ability to deploy any kind of resources (extra manifests) like the bitnami charts do.

@snuggie12
Copy link

@mkilchhofer quoted from the official docs:

argocd-repo-server clones repository into /tmp ( of path specified in TMPDIR env variable ). Pod might run out of disk space if have too many repository or repositories has a lot of files. To avoid this problem mount persistent volume.

The problem is they officially tell you to do something without telling you how to.

FWIW, I'm using my suggested approach of ephemeral volumes while keeping it a Deployment but I know that's not available to everyone since it's in 1.21+ as beta:

volume for tmp for repo-server
spec:
  template:
    spec:
      volumes:
      - ephemeral:
          volumeClaimTemplate:
            metadata:
              creationTimestamp: null
              labels:
                type: argocd-repo-server-tmp
            spec:
              accessModes:
              - ReadWriteOnce
              resources:
                requests:
                  storage: 10Gi
              storageClassName: standard-ssd
              volumeMode: Filesystem

@mkilchhofer mkilchhofer changed the title PersistentVolumeClaims for volumes Argo CD: PersistentVolumeClaims for volumes May 25, 2022
@mkilchhofer
Copy link
Member

ACK. That makes sense. There's already an issue upstream for that: argoproj/argo-cd#7927

@mkilchhofer mkilchhofer added the awaiting-upstream Is waiting for a change upstream to be completed before it can be merged. label May 26, 2022
@pierluigilenoci
Copy link

@mkilchhofer I don't have a particular fetish for PVs.

I just wish that in the presence of large caches, a situation aggravated also by some bug (argoproj/argo-cd#4002 and helm/helm#10583), the Repo Server pods are not evicted ('Pod The node had condition: [DiskPressure]. ') or, even worse, other pods are evicted in the same node (exactly what happened in my case).

In any case the idea of allowing the deployment of extra manifests is always a good idea (and if you look at my history on GitHub it is something I particularly appreciate).

Thanks for your help and your time.

@pierluigilenoci
Copy link

It might make sense to link the CNCF's Slack (stranded) discussion as well:
https://cloud-native.slack.com/archives/C020XM04CUW/p1653511733823829

@pierluigilenoci
Copy link

@docent-net any update on this?

@docent-net
Copy link
Contributor Author

@pierluigilenoci sure. I've been following this discussion (as well as that mentioned on Slack and regarding StatefulSets) and I'm not sure right now.

This issue is about Deployment. My original problem was that having replica=1 for Deployment type, I wanted to have persistent storage for the git cache. Thanks to that, in case of POD replacement, my infra wouldn't be impacted with traffic/performance spike due to cache rebuilding / git repos pulling on a higher scale.

And I have implemented that, and it works on my cluster correctly. But I have only one replica.

Now - with replicas>1 there's problem of AccessModes, that might be very specific. Having it set to ReadWriteMany (and that is a requirement for this workload), could mean a huge performance drop due to underlying file system configurations.

Why do I think so? Thinking about possible scenarios, where this PVC would be used I can only think of considerable caches with many, many (possible little) files (that is how git repositories looks like, thinking .git directories, indexes etc). And FS synchronization across nodes with such huge distributed disk might be really problematic. Now think of possible issues, that could occur as a side effect of it. The user has inconsistent git directories because of some consistency problem of underlying DFS.

Having this said, I'm not sure anymore, that this PR is a good idea. StatefulSet might be a better approach here.

I can still raise a PR, but I'm just afraid, that someone someday will have a big headache due to this.

@pierluigilenoci
Copy link

@docent-net sorry for the delay in answering.

The answer you are looking for is Statefulset.
If you have multiple repo-server replicas you must also have multiple disks, one for each replica.
So the solution for assigning 1:1 disks is Statefulset.

Using NFS is a problem of overcomplication and poor performance.
From personal experience I can say that sometimes a cache on NFS is slower than re-loading everything from scratch over the network.

The alternative solution is to use emptyDir but it does not solve the node disk load problem.
Unless you put the disk in memory but in this case the cost increases.

The perfect solution does not exist but something needs to be done.

@aroundthecode
Copy link
Contributor

I've one solution to propose

  1. default deploy behaviour is the current one with emptyDir

  2. if a different volume definition is provided in values file then it's applied as-is, without any additional implementation, leaving final user the desired approach for volume management .

repoServer:
 existingVolumes:
    tmp_dir:
      persistentVolumeClaim:
        claimName: pvc-argocd-repo-server-tmp

I've performed such changes on my forked repo for details: main...aroundthecode:argo-helm:reposerver-pvc

With such apporach user can create PV/PVC outside helm and then attach to the deploy, or leave everything as is.
It could be a quick fix allowing user to avoid diskpressure issue and going on discussing Statefulset migration

@arielly-parussulo
Copy link

Hello there!
Do you intend to open a PR for this solution so we can have it in the oficial Helm Chart? I am trying to migrate an internal chart that we use to the official one and this is one of the features that we would need in order to achieve that.
For now I will create a fork but I think that this config should be in the oficial Helm Chart as it is recommended by the High Availability doc.

@docent-net
Copy link
Contributor Author

@aroundthecode

I've performed such changes on my forked repo for details: main...aroundthecode:argo-helm:reposerver-pvc

So, are you able to raise a PR for this, since you already have it worked out?

@arielly-parussulo
Copy link

I ended up creating a PR to add support to deploy argocd-repo-server as a StatefulSet with PVCs based on what we used in my company:
#1648
Feel free to add suggestions or editing it.

@pierluigilenoci
Copy link

@arielly-parussulo, your PR is a bit stranded...

@aroundthecode
Copy link
Contributor

@aroundthecode

I've performed such changes on my forked repo for details: main...aroundthecode:argo-helm:reposerver-pvc

So, are you able to raise a PR for this, since you already have it worked out?

HI all, I've tried to perform an alternative PR #2410 with my approach on this, not sure if everthing is fine, please check and advise if needed

@snuggie12
Copy link

@aroundthecode oddly enough I was probably about a week away from making a PR since we're about to start using the helm chart. Yours looks more comprehensive though. Thanks!

@aroundthecode
Copy link
Contributor

@snuggie12 ( or anyone) can you please help me with the PR? it seems something is missing in the docs generation but I can't realize what.

I've added the helm-docs comments to my values.yaml section but is seems something is still missing since it's telling me the documentation is outdated.

@snuggie12
Copy link

@snuggie12 ( or anyone) can you please help me with the PR? it seems something is missing in the docs generation but I can't realize what.

I've added the helm-docs comments to my values.yaml section but is seems something is still missing since it's telling me the documentation is outdated.

@aroundthecode It's running helm-docs in the check and then doing a git diff and seeing there's a change to be made. What it wants you to do is run helm-docs yourself and commit the updated README.md. According to this there should be a script which cleanly executes (it mentions a container so I think you'll need some sort of container runtime like docker or containerd/cri.) If that's an issue in your env you can always install the helm plugin yourself.

Or if you just want to cheat you can look at the diff in the GH action output and paste it in the README.md (mostly kidding as you should probably follow their instructions, but wanted to leave it as a last restort.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
argo-cd awaiting-upstream Is waiting for a change upstream to be completed before it can be merged. enhancement on-hold
Projects
None yet
Development

Successfully merging a pull request may close this issue.