Make termination grace seconds configurable #4681

Open
wants to merge 4 commits into base: main

Conversation

@yeya24 (Contributor) commented Mar 25, 2022

Signed-off-by: Ben Ye <ben.ye@bytedance.com>

Description

We have a use case for configuring the termination grace seconds value of the Prometheus StatefulSet. Currently the value is hardcoded to 10m (600 seconds).

Type of change

What type of changes does your code introduce to the Prometheus operator? Put an x in the boxes that apply.

  • CHANGE (fix or feature that would cause existing functionality to not work as expected)
  • FEATURE (non-breaking change which adds functionality)
  • BUGFIX (non-breaking change which fixes an issue)
  • ENHANCEMENT (non-breaking change which improves existing functionality)
  • NONE (if none of the other choices apply. Example, tooling, build system, CI, docs, etc.)

Changelog entry

Please put a one-line changelog entry below. This will be copied to the changelog file during the release process.

Make terminationGracePeriodSeconds configurable for the Prometheus StatefulSet

Signed-off-by: Ben Ye <ben.ye@bytedance.com>
@yeya24 requested a review from a team as a code owner, March 25, 2022 05:32
// Set this value longer than the expected cleanup time for your process.
// Defaults to 600 seconds.
// +optional
TerminationGracePeriodSeconds *int64 `json:"terminationGracePeriodSeconds,omitempty"`
Contributor:

Would it be better to use uint to avoid negative values?

Contributor:

Also add the default value using a validation marker so we can remove the conditional from the statefulset code:

// +kubebuilder:default:="600"

Contributor Author:

Done. Thanks for the review.

Signed-off-by: Ben Ye <ben.ye@bytedance.com>
@slashpai (Contributor) left a comment:

Can you also add tests?

// +optional
TerminationGracePeriodSeconds *int64 `json:"terminationGracePeriodSeconds,omitempty"`
// +kubebuilder:default:="600"
Contributor:

Since it's an int type we don't need quotes:

// +kubebuilder:default:=600

Contributor Author:

Thanks. Updated.
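For reference, a minimal sketch of what the field could look like once the quotes are dropped from the default marker, assembled from the snippets in this thread (the marker ordering and final type are assumptions, not a quote of the merged diff; the discussion below suggests the type may have become an unsigned integer):

// Set this value longer than the expected cleanup time for your process.
// Defaults to 600 seconds.
// +kubebuilder:default:=600
// +optional
TerminationGracePeriodSeconds *int64 `json:"terminationGracePeriodSeconds,omitempty"`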

var (
	minReadySeconds        int32
	terminationGracePeriod int64
)
Contributor:

This should be uint too?

@yeya24 (Contributor Author) commented Mar 26, 2022:

No, it is cast from uint64. The pod spec needs int64, not uint64.
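For illustration, a minimal sketch of the defaulting and cast being discussed, assuming the CRD field ends up as a *uint64; the helper name is made up for this example and is not the actual statefulset code:

// terminationGracePeriodSeconds converts the optional CRD value (assumed to be
// a *uint64 per the review above) into the *int64 that the Kubernetes PodSpec
// expects, falling back to the previously hardcoded default of 600 seconds.
func terminationGracePeriodSeconds(specValue *uint64) *int64 {
	period := int64(600)
	if specValue != nil {
		period = int64(*specValue)
	}
	return &period
}

The resulting pointer can then be assigned directly to PodSpec.TerminationGracePeriodSeconds, which k8s.io/api/core/v1 declares as *int64.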

@yeya24 (Contributor Author) commented Mar 28, 2022

I have added tests. Can you please help take a look? Thanks!

@philipgough (Contributor) left a comment:

lgtm

@simonpasquier (Contributor):

👋 @yeya24 can you share more details about your use case?
In the past (#3433) we've been reluctant to add such a field since the requirements were either faster/immediate migrations (which increase the risk of data corruption) or alleviating sub-optimal performance on Prometheus shutdown (which should rather be addressed in Prometheus directly).

@yeya24 (Contributor Author) commented Mar 28, 2022

As I mentioned in thanos-io/thanos#5255, the issue is on our CNI side. It enables graceful termination on the IPVS side, so the connection remains open for 10 minutes in the prometheus-operator case (the termination grace period is hardcoded to 600s). If Prometheus itself goes down somehow, the Thanos sidecar is still available, causing a lot of partial query errors from our Thanos Query for 10 minutes.

@simonpasquier (Contributor):

@yeya24 sorry for the late follow-up but I'm not sure I understand the exact scenario. You want to configure a shorter termination grace period in case Prometheus is stuck?

@yeya24 (Contributor Author) commented Apr 20, 2022

@yeya24 sorry for the late follow-up but I'm not sure I understand the exact scenario. You want to configure a shorter termination grace period in case Prometheus is stuck?

Yes. If it is stuck because the backend storage (like Ceph) is not responsive, it doesn't make sense to wait 10m to ensure data is written successfully, because the storage is down.
In this case, I want to stop it ASAP.

@paulfantom (Member) left a comment:

If Prometheus itself goes down somehow, the Thanos sidecar is still available

That sounds like a Thanos sidecar issue, not a prometheus-operator or Prometheus one.


I would be against a configurable terminationGracePeriodSeconds. In the majority of cases tweaking it can lead to unexpected data loss. I also understand the case that @yeya24 is making about the CNI and CSI being unresponsive, and that in those cases fast termination is beneficial. However, those look like edge cases that happen in particular critical scenarios which are most likely handled directly by users via kubectl. And if that is the case, then kubectl delete pod <prometheus> --grace-period=0 is likely the better way.

@yeya24 (Contributor Author) commented May 10, 2022

That sounds like a Thanos sidecar issue, not a prometheus-operator or Prometheus one.
I agree, and this was already solved on that side.

In the majority of cases tweaking it can lead to unexpected data loss

If data loss is acceptable and users want a termination time shorter than 10m, I think this use case is still valid since 10m might not fit all users. In our case we want a smaller duration like 5m. And what about other users who think 10m is too short and want a longer duration like 1h?

My point is that the operator should provide some way for users to configure Kubernetes-native fields like terminationGracePeriodSeconds. Users can take this risk if they really want to, as we do.

@rnaveiras:

Hi team, it's been a while since the last messages in this PR.

I'm interested in this feature, but my use case is different: we want to increase the value beyond the default 10 minutes because, in some cases, that is not enough for a graceful shutdown when the feature flag for snapshotting in-memory chunks is enabled. See prometheus/prometheus#7229 for details.

Our setup required more than 10 minutes to complete the chunk snapshots successfully.

/cc @yeya24 I'm happy to collaborate to get this PR in good shape again.

@simonpasquier (Contributor):

@rnaveiras more than 10 minutes for the chunks snapshot seems a bit extreme. Do you have an explanation for why it takes so long? Have you tried reporting the issue to prometheus/prometheus?

Having said that, we have had many requests in the past to customize the termination grace period, and though the justification could sometimes be challenged, we also agreed in #4691 that we shouldn't block such customization if there was high demand from the community and no alternative existed.
In summary, feel free to resurrect this pull request (sharing credit with @yeya24, of course).

github-actions bot removed the stale label Nov 23, 2023
github-actions bot added the stale label Jan 22, 2024
rnaveiras added a commit to rnaveiras/prometheus-operator that referenced this pull request Mar 21, 2024
It makes the pod.spec TerminationGracePeriodSeconds configurable via the
CRDs for prometheus and prometheusagent

Fixes prometheus-operator#3433
Closes prometheus-operator#4681

Co-authored-by: Ben Ye <ben.ye@bytedance.com>
Signed-off-by: Raul Navieras <me@raulnaveiras.com>
rnaveiras added a commit to rnaveiras/prometheus-operator that referenced this pull request Mar 28, 2024
It makes the pod.spec TerminationGracePeriodSeconds configurable via the
CRDs for prometheus and prometheusagent

Fixes prometheus-operator#3433
Closes prometheus-operator#4681

Co-authored-by: Ben Ye <ben.ye@bytedance.com>
Signed-off-by: Raul Navieras <me@raulnaveiras.com>