
Adds Statefulset Support #137

Open · wants to merge 9 commits into base: main

Conversation

@rvc-mf commented May 19, 2022

Description

Adds support for statefulset deployments.

This pull request introduces the following changes:

  • Adds an optional field called workloadType to values.yaml, where the default value is deployment but statefulset is also an option.
  • Adds a _statefulset_spec.tpl template which is used when workloadType is set to statefulset. In turn, _deployment_spec.tpl is only used when workloadType is set to deployment.
  • Adds an optional updateStrategy field to values.yaml which gets injected into updateStrategy on statefulset workloads (workloadType: statefulset). deploymentStrategy continues to be injected into strategy on deployment workloads.
  • Adds an optional name field under service, because statefulset workloads require an associated service. service can still set enabled: false if the service resource is created outside of the chart. (See the example values snippet below.)
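
For illustration, a minimal values.yaml sketch combining the new fields described above (the strategy type and service name are illustrative assumptions, not chart defaults):

  # Run the chart's workload as a StatefulSet instead of a Deployment.
  workloadType: statefulset

  # Optional: injected into the StatefulSet's spec.updateStrategy
  # (deploymentStrategy is still used for Deployment workloads).
  updateStrategy:
    type: RollingUpdate

  service:
    enabled: true
    # Optional explicit name for the Service associated with the workload;
    # a StatefulSet references it via its spec.serviceName field.
    name: my-app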

None of these changes are breaking, as statefulset support is optional and off by default. Existing users of the chart should be able to upgrade to this version without encountering issues, and I don't expect the changes to cause any downtime.

Documentation

I have updated the README in /charts/k8s-service to indicate that statefulsets are also deployable by the chart. I have also added proper documentation on each field that was added/edited in the chart's values.yaml.

I have also added three tests that verify the correctness of statefulset canary deployments, mirroring how deployment workloads are tested. You can find the output here: https://gist.github.com/rvc-mf/c2c4c82713cd8da4a95a3bf079d43cb4

Furthermore, I have successfully deployed several statefulset applications using these changes (previously made in a "private" fork) in other projects.

TODOs

Please ensure all of these TODOs are completed before asking for a review.

  • Ensure the branch is named correctly with the issue number, e.g. feature/new-vpc-endpoints-955 or bug/missing-count-param-434.
  • Update the docs.
  • Keep the changes backward compatible where possible.
  • Run the pre-commit checks successfully.
  • Run the relevant tests successfully.

Related Issues

Addresses #136

 - Adds appropriate documentation to Chart README
 - Introduces statefulset spec template
 - Adds appropriate tests
 - Adds appropriate test fixtures
 - Adds appropriate templates
@rvc-mf changed the title from "[WIP] Adds Statefulset Support - Issue 136" to "Adds Statefulset Support - Issue 136" on May 19, 2022
@rvc-mf changed the title from "Adds Statefulset Support - Issue 136" to "Adds Statefulset Support" on May 19, 2022
@rvc-mf (Author) commented Sep 29, 2022

Bumping this up as it's been a couple of months with no response. @autero1 Is there anything missing from this PR that I should work on before it's ready for review?

@mloskot commented Apr 11, 2023

It is a useful addition indeed 👍

@rimoore commented May 18, 2023

@yorinasub17 this is a good feature.

@yorinasub17 (Contributor) commented:

Hiya! Responding because I was tagged - unfortunately, I am no longer with Gruntwork so I can't do anything to move this along 😞
Sorry I can't be of help here!

@rvc-mf (Author) commented Aug 14, 2023

Bumping this again. @rhoboat @zackproser @autero1

@zackproser (Contributor) commented:

Hi @rvc-mf - none of the folks you've pinged here work at Gruntwork any longer.

@mloskot commented Aug 20, 2023

AFAICT, this project is orphaned and should be archived to indicate the unmaintained state.

@brandon-langley-mf commented Aug 21, 2023

> AFAICT, this project is orphaned and should be archived to indicate the unmaintained state.

How is it an orphaned project if it's actively being used by the k8s-service terraform module? Not to mention a module you released less than two weeks ago.

@rvc-mf (Author) commented Aug 21, 2023

> Hi @rvc-mf - none of the folks you've pinged here work at Gruntwork any longer.

@zackproser I pinged the people who were automatically assigned to review the PR. Where can I find a list of people who can look at these changes?

@arsci commented Aug 21, 2023

Hi @rvc-mf, I'll make sure this PR gets routed to the right SME for review.

@mloskot commented Aug 21, 2023

> AFAICT, this project is orphaned and should be archived to indicate the unmaintained state.

> How is it an orphaned project if it's actively being used by the k8s-service terraform module? Not to mention a module you released less than two weeks ago.

I use charts/k8s-service in my own IaC and I've been watching this project for quite a while. Based on at least two observations:

  • this fairly trivial, non-intrusive PR has been dangling for almost 1.5 years without any conclusive decision
  • my PodDisruptionBudget created even if minPodsAvailable=0 #156, about quite a serious issue (at least from the point of view of AKS users), has been dangling without a single word of response for half a year now

the gut feeling from my 20+ years of experience as an open source developer and contributor tells me that this project has been orphaned.

@josh-padnick (Contributor) commented:

Hi folks, Gruntwork co-founder and Chief Product Officer here. I'm sorry for the delays and lack of response from our team. As a few folks have mentioned, all those Grunts subscribed to this thread happen to be no longer with the company, so this escaped our radar screen.

But as of today, it escapes us no more. As @arsci said, we'll route this to the right SMEs and get an official response on priority here.

@ryehowell (Contributor) commented:

Hey all, as @josh-padnick mentioned, apologies for the lack of traction on this one. I just got this added to our project board to get prioritized for a review. Feel free to tag me for anything related to this repo as I am now one of the maintainers of this project. Review soon to come!

@josh-padnick requested review from arsci and ryehowell and removed review requests for yorinasub17, zackproser, autero1 and rhoboat on August 21, 2023
@mloskot commented Aug 21, 2023

Awesome news, thank you!

@ryehowell (Contributor) left a comment

Firstly, thank you for the contribution! StatefulSet support would be a great addition to the k8s-service chart.

Here are my thoughts on the initial PR:

  • I don't think having a canary configured for the StatefulSet is the best path for adding StatefulSet support. StatefulSets are typically deployed with persistence, and a canary deployment will likely be problematic for any workload whose app data or state is stored on volumes (such as databases). Usually the updateStrategy built into the controller is used for updates (RollingUpdate or OnDelete). There are other strategies as well (more complex, but possible), but reusing the canary strategy built for the Deployment spec likely isn't the way to go.
  • An additional Service spec (a headless service) should be added to support the StatefulSet, as StatefulSets require a headless service for the network identity of their Pods. This would likely be an additional manifest file specifically for the headless service (see the sketch after this list).
  • One of the key elements of StatefulSets is their persistence (typically why StatefulSets are chosen over the other controllers). Full configuration support for persistence should be available from day one.
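
For reference, a headless Service is just a Service with clusterIP set to None; a minimal sketch of the kind of extra manifest this would mean for the chart (the name and labels below are illustrative, not the chart's actual helpers):

  apiVersion: v1
  kind: Service
  metadata:
    name: my-app-headless            # e.g. derived from the chart's fullname helper
  spec:
    clusterIP: None                  # clusterIP: None is what makes the Service headless
    selector:
      app.kubernetes.io/name: my-app
    ports:
      - name: http
        port: 80
        targetPort: http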

Feel free to share your thoughts on this and open up the discussion for brainstorming any ideas.

 - updates tests
 - updates helper files/functions
@rvc-mf (Author) commented Sep 12, 2023

@ryehowell , I believe I've accounted for all the changes you've mentioned in your comment, but I'm running into issues with the testing suite. I can't seem to get it to work, even for tests that cover files that I didn't change. See the error log below (this is on a file I have not changed):

$ go test -v -tags all -timeout 60m -run TestK8SServiceCanaryDeployment
# (...)
TestK8SServiceCanaryDeployment 2023-09-12T07:36:48-06:00 logger.go:66: Error: INSTALLATION FAILED: services "k8s-service-canary-ykmv5h-canary-test" already exists
=== NAME  TestK8SServiceCanaryDeployment
    install.go:16: 
                Error Trace:    $HOME/projects/helm-kubernetes-services/test/install.go:16
                                                        $HOME/projects/helm-kubernetes-services/test/k8s_service_canary_deployment_test.go:53
                Error:          Received unexpected error:
                                error while running command: exit status 1; Error: INSTALLATION FAILED: services "k8s-service-canary-ykmv5h-canary-test" already exists
                Test:           TestK8SServiceCanaryDeployment
TestK8SServiceCanaryDeployment 2023-09-12T07:36:48-06:00 logger.go:66: Running command helm with args [delete --namespace k8s-service-canary-ykmv5h k8s-service-canary-ykmv5h]
TestK8SServiceCanaryDeployment 2023-09-12T07:36:49-06:00 logger.go:66: release "k8s-service-canary-ykmv5h" uninstalled
TestK8SServiceCanaryDeployment 2023-09-12T07:36:49-06:00 client.go:42: Configuring Kubernetes client using config file $HOME/.kube/config with context 
--- FAIL: TestK8SServiceCanaryDeployment (0.47s)
FAIL
exit status 1
FAIL    github.com/gruntwork-io/helm-kubernetes-services/test   0.488s

I tried debugging the error and it seems like the error happens at the helm module ("github.com/gruntwork-io/terratest/modules/helm") in one of the RunCommand steps. Here's an example command the helm.Install code generates:

Running command helm with args [install --namespace k8s-service-ykmv5h -f /home/rogvc/projects/helm-kubernetes-services/charts/k8s-service/linter_values.yaml -f /home/rogvc/projects/helm-kubernetes-services/test/fixtures/main_statefulset_values.yaml k8s-service-ykmv5h /home/rogvc/projects/helm-kubernetes-services/charts/k8s-service]

Any tips on how to proceed?

@ryehowell (Contributor) commented:

Hey @rvc-mf , apologies for the delay here. We've been working on getting some new features out for our EKS module. I'll do another review of this and take a look at the failing test as well.

@mateimicu, would you be available to help review this? I've got this in my queue, but wanted to see if you would like to take a look as well 😄 to cast a wider net on this.

@mateimicu (Contributor) commented:

> @mateimicu, would you be available to help review this? I've got this in my queue, but wanted to see if you would like to take a look as well 😄 to cast a wider net on this.

@ryehowell sure :D, I will take a look today.

@mloskot commented Oct 6, 2023

FYI, although I'm in no position to provide a technical review of this PR, I would like to endorse it as a user of this project.
I have been using the change in this PR in a setup of a few AKS clusters for 6 months now, without any issues.
Thank you @rvc-mf and others involved.

@mateimicu (Contributor) commented:

> @ryehowell , I believe I've accounted for all the changes you've mentioned in your comment, but I'm running into issues with the testing suite. [...] Any tips on how to proceed?

Hi @rvc-mf,
Thanks for the PR. I will need some time to give it a good read.

Let's first try and unblock you.

I managed to run the command you provided, go test -v -tags all -timeout 60m -run TestK8SServiceCanaryDeployment, locally from the main branch.

I think this fails because there is a name collision.
The original Service's name is defined here as name: {{ include "k8s-service.fullname" . }}.
The default configuration for the headless service you introduced is defined here and, unless overridden by .Values.headlessService.name, collides with the original one:

  {{- if .Values.headlessService.name }}
  name: {{ .Values.headlessService.name }}
  {{- else }}
  name: {{ include "k8s-service.fullname" . }}
  {{- end }}

@rvc-mf (Author) commented Oct 24, 2023

@mateimicu Thanks for the reply. I was able to find the issue and solve it in my latest changes. I'm now running into another issue that I think has more to do with my Docker + minikube environment than with the code changes.

I checked out the main branch of the repository and am trying to run the tests to get my environment up to speed before retesting my new changes, but I am still getting errors when doing so.

Has anyone in your team seen a similar error to this when running the testing suite?

Error we get when we add 'minikube' to /etc/hosts:

=== NAME TestK8SServiceCanaryDeployment
k8s_service_example_test_helpers.go:234:
Error Trace: /home/rogvc/temp/helm-kubernetes-services/test/k8s_service_example_test_helpers.go:234
/home/rogvc/temp/helm-kubernetes-services/test/retry.go:68
/home/rogvc/temp/helm-kubernetes-services/test/retry.go:93
/home/rogvc/temp/helm-kubernetes-services/test/retry.go:68
/home/rogvc/temp/helm-kubernetes-services/test/retry.go:57
/home/rogvc/temp/helm-kubernetes-services/test/k8s_service_example_test_helpers.go:232
/home/rogvc/temp/helm-kubernetes-services/test/k8s_service_canary_deployment_test.go:59
Error: Received unexpected error:
Get "http://minikube:32085": dial tcp: lookup minikube on 15.122.222.53:53: no such host
Test: TestK8SServiceCanaryDeployment
TestK8SServiceCanaryDeployment 2023-10-24T11:45:27-06:00 logger.go:66: Running command helm with args [delete --namespace k8s-service-canary-hkznz5 k8s-service-canary-hkznz5]
TestK8SServiceCanaryDeployment 2023-10-24T11:45:27-06:00 logger.go:66: release "k8s-service-canary-hkznz5" uninstalled
TestK8SServiceCanaryDeployment 2023-10-24T11:45:27-06:00 client.go:42: Configuring Kubernetes client using config file /home/rogvc/.kube/config with context
--- FAIL: TestK8SServiceCanaryDeployment (26.76s)
FAIL

Error we get when we don't add it:

=== NAME TestK8SServiceCanaryDeployment
k8s_service_example_test_helpers.go:234:
Error Trace: /home/csmcclain/temp/helm-kubernetes-services/test/k8s_service_example_test_helpers.go:234
/home/csmcclain/temp/helm-kubernetes-services/test/retry.go:68
/home/csmcclain/temp/helm-kubernetes-services/test/retry.go:93
/home/csmcclain/temp/helm-kubernetes-services/test/retry.go:68
/home/csmcclain/temp/helm-kubernetes-services/test/retry.go:57
/home/csmcclain/temp/helm-kubernetes-services/test/k8s_service_example_test_helpers.go:232
/home/csmcclain/temp/helm-kubernetes-services/test/k8s_service_canary_deployment_test.go:59
Error: Received unexpected error:
Get "http://minikube:30961": dial tcp: lookup minikube on 127.0.0.53:53: server misbehaving
Test: TestK8SServiceCanaryDeployment
TestK8SServiceCanaryDeployment 2023-10-24T13:28:40-06:00 logger.go:66: Running command helm with args [delete --namespace k8s-service-canary-tomrer k8s-service-canary-tomrer]
TestK8SServiceCanaryDeployment 2023-10-24T13:28:40-06:00 logger.go:66: release "k8s-service-canary-tomrer" uninstalled
TestK8SServiceCanaryDeployment 2023-10-24T13:28:40-06:00 client.go:42: Configuring Kubernetes client using config file /home/csmcclain/.kube/config with context
--- FAIL: TestK8SServiceCanaryDeployment (15.92s)
FAIL

If y'all don't use Docker + Minikube to run tests, how do you set up a kubernetes cluster to run the test suite?

@rvc-mf (Author) commented Nov 1, 2023

Tried running the main testing suite using EKS+Fargate, a local rancher cluster, and minikube + qemu2 but still got similar issues.

Would appreciate some direction on how to set up a proper testing environment to get unblocked on this PR.

@rvc-mf (Author) commented Nov 16, 2023

Update: got the test suite running by giving minikube a static IP on start (thanks @brandon-langley-mf ).

See gist for updated test results: https://gist.github.com/rvc-mf/c2c4c82713cd8da4a95a3bf079d43cb4

PR is ready for code review once again 👍🏼

@mateimicu (Contributor) commented:

> Tried running the main testing suite using EKS+Fargate, a local rancher cluster, and minikube + qemu2 but still got similar issues.
>
> Would appreciate some direction on how to set up a proper testing environment to get unblocked on this PR.

Hi @rvc-mf,

minikube definitely works on macOS / Linux / Windows.
The test makes an HTTP request to the service exposed as a NodePort, so the exact steps depend on your Kubernetes setup.

This is what I use on macOS:

Start minikube like this: minikube start --driver=docker --extra-config=apiserver.service-node-port-range=32760-32767 --ports=127.0.0.1:32760-32767:32760-32767. This will:

  • limit the port-ranges a nodePort object can have using the API Server configuration
  • expose the exact same range locally
  • the last step is to add 127.0.0.1 minikube to /etc/hosts (see the consolidated commands below)
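
Put together, the local setup described above amounts to roughly the following commands (the port range is just the example used here):

  # Constrain NodePorts to a small range and publish that range on localhost.
  minikube start --driver=docker \
    --extra-config=apiserver.service-node-port-range=32760-32767 \
    --ports=127.0.0.1:32760-32767:32760-32767

  # Make the hostname the tests use resolve to those published ports.
  echo "127.0.0.1 minikube" | sudo tee -a /etc/hosts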

How to debug this in general

To check how to do this in your setup, you can:

  1. Keep the helm release after the test finishes. You can do this by removing the line here.
  2. Now check how you can access the Service. You can find it by running kubectl get svc -A; it is the one whose name starts with k8s-service-canary-. (A sketch of these commands is below.)
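
For example, once the release is kept around after the test, something along these lines can be used to find the NodePort and hit it the way the test does (the grep pattern and placeholders are illustrative):

  # Find the canary test service across all namespaces.
  kubectl get svc -A | grep k8s-service-canary-

  # Read the NodePort assigned to it.
  kubectl get svc <service-name> -n <namespace> -o jsonpath='{.spec.ports[0].nodePort}'

  # Make the same HTTP request the test makes.
  curl http://minikube:<node-port>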

@mateimicu (Contributor) commented:

> Update: got the test suite running by giving minikube a static IP on start (thanks @brandon-langley-mf ).

Awesome 🥳

Let me pull the latest and test it

@mateimicu self-requested a review on November 17, 2023
@brandon-langley-mf commented:

> Let me pull the latest and test it

@mateimicu fyi, we found two failing tests that also fail on the release tag. They look like examples that are unfortunately tagged in such a way that they are pulled into the 'all' and 'integration' test runs. We did not adjust them because we weren't sure whether this was intentional - is this something we should open an issue for?

@mateimicu (Contributor) commented:

Hi @brandon-langley-mf

If you have some details, a new issue would be awesome so we can track it.

@brandon-langley-mf commented:

@mateimicu, @ryehowell, any updates on this one?

@mateimicu (Contributor) commented:

Sorry @brandon-langley-mf this slipped on my side. Let me jump on it 👀

@mateimicu (Contributor) left a comment

I took another look at the PR. I tried to set up ZooKeeper with this chart, a fairly common use case for a StatefulSet.

Unfortunately, it is not straightforward, and the default values need to be re-evaluated for the stateful case.

There is also a growing set of features that only apply to some workload types. One option to avoid this complexity is to create another Helm chart under helm-kubernetes-services/charts/k8s-stateful-service with a separate feature set specifically tailored to stateful services.

@@ -367,6 +397,8 @@ minPodsAvailable: 0
# Kubernetes defaults to None.
# - sessionAffinityConfig (object) : Configuration for session affinity, as defined in Kubernetes
# (https://kubernetes.io/docs/concepts/services-networking/service/#virtual-ips-and-service-proxies)
# - name (required for statefulset : The name of the service associated with this deployment.
Review comment (Contributor):

issue: required for statefulset makes configuring this more complicated, and in Helm we can't validate that it is configured properly.

suggestion: If this is about the headless service, I would propose {{ include "k8s-service.fullname" . }}-headless as the default name to avoid a conflict with the default configuration.
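
As a rough sketch of that suggestion, the headless service's name block could fall back to a suffixed fullname instead of requiring an explicit value (a hypothetical variant of the template quoted earlier, not the code currently in this PR):

  {{- if .Values.headlessService.name }}
  name: {{ .Values.headlessService.name }}
  {{- else }}
  {{- /* Default to the release fullname plus a suffix to avoid colliding with the main Service. */}}
  name: {{ include "k8s-service.fullname" . }}-headless
  {{- end }}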

charts/k8s-service/values.yaml (outdated comment, resolved)
- name: {{ $name }}
emptyDir: {}
{{- end }}
{{- /* END VOLUME LOGIC */ -}}
Review comment (Contributor):

issue (blocking): One key feature of StatefulSets is volumeClaimTemplates. How can this be configured? Is it expected to happen via another section?
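
For context, this is the kind of block a StatefulSet spec carries for per-replica storage (a generic Kubernetes example, not code from this PR); the open question is how the chart's values would render it:

  # StatefulSet spec excerpt: creates one PersistentVolumeClaim per replica,
  # named <claim>-<pod> (e.g. data-my-app-0), which survives Pod rescheduling.
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
        # storageClassName: standard   # optional; cluster-specific assumption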

Reply:

@mateimicu Is this something that could be added in a future iteration of this feature? Maybe this PR represents the initial bare-bones support that @mloskot and @rvc-mf are already using in their deployments 🤔

Co-authored-by: Matei <code@mateimicu.com>