Make termination grace seconds configurable #4681

Open
wants to merge 4 commits into
base: main
Changes from 2 commits
1 change: 1 addition & 0 deletions Documentation/api.md
@@ -810,6 +810,7 @@ PrometheusSpec is a specification of the desired behavior of the Prometheus clus
| enforcedLabelValueLengthLimit | Per-scrape limit on length of labels value that will be accepted for a sample. If a label value is longer than this number post metric-relabeling, the entire scrape will be treated as failed. 0 means no limit. Only valid in Prometheus versions 2.27.0 and newer. | *uint64 | false |
| enforcedBodySizeLimit | EnforcedBodySizeLimit defines the maximum size of uncompressed response body that will be accepted by Prometheus. Targets responding with a body larger than this many bytes will cause the scrape to fail. Example: 100MB. If defined, the limit will apply to all service/pod monitors and probes. This is an experimental feature, this behaviour could change or be removed in the future. Only valid in Prometheus versions 2.28.0 and newer. | ByteSize | false |
| minReadySeconds | Minimum number of seconds for which a newly created pod should be ready without any of its container crashing for it to be considered available. Defaults to 0 (pod will be considered available as soon as it is ready) This is an alpha field and requires enabling StatefulSetMinReadySeconds feature gate. | *uint32 | false |
+| terminationGracePeriodSeconds | Optional duration in seconds the pod needs to terminate gracefully. May be decreased in delete request. Value must be non-negative integer. The value zero indicates stop immediately via the kill signal (no opportunity to shut down). If this value is nil, the default grace period will be used instead. Default value is set to 10 min because Prometheus may take quite long to shutdown to checkpoint existing data. The grace period is the duration in seconds after the processes running in the pod are sent a termination signal and the time when the processes are forcibly halted with a kill signal. Set this value longer than the expected cleanup time for your process. | *uint64 | false |
| retention | Time duration Prometheus shall retain data for. Default is '24h' if retentionSize is not set, and must match the regular expression `[0-9]+(ms\|s\|m\|h\|d\|w\|y)` (milliseconds seconds minutes hours days weeks years). | string | false |
| retentionSize | Maximum amount of disk space used by blocks. | ByteSize | false |
| disableCompaction | Disable prometheus compaction. | bool | false |
15 changes: 15 additions & 0 deletions bundle.yaml
@@ -17514,6 +17514,21 @@ spec:
     use ''image'' instead. The image tag can be specified as part of
     the image URL.'
   type: string
+terminationGracePeriodSeconds:
+  default: "600"
+  description: Optional duration in seconds the pod needs to terminate
+    gracefully. May be decreased in delete request. Value must be non-negative
+    integer. The value zero indicates stop immediately via the kill
+    signal (no opportunity to shut down). If this value is nil, the
+    default grace period will be used instead. Default value is set
+    to 10 min because Prometheus may take quite long to shutdown to
+    checkpoint existing data. The grace period is the duration in seconds
+    after the processes running in the pod are sent a termination signal
+    and the time when the processes are forcibly halted with a kill
+    signal. Set this value longer than the expected cleanup time for
+    your process.
+  format: int64
+  type: integer
 thanos:
   description: "Thanos configuration allows configuring various aspects
     of a Prometheus server in a Thanos environment. \n This section
@@ -6207,6 +6207,21 @@ spec:
     use ''image'' instead. The image tag can be specified as part of
     the image URL.'
   type: string
+terminationGracePeriodSeconds:
+  default: "600"
+  description: Optional duration in seconds the pod needs to terminate
+    gracefully. May be decreased in delete request. Value must be non-negative
+    integer. The value zero indicates stop immediately via the kill
+    signal (no opportunity to shut down). If this value is nil, the
+    default grace period will be used instead. Default value is set
+    to 10 min because Prometheus may take quite long to shutdown to
+    checkpoint existing data. The grace period is the duration in seconds
+    after the processes running in the pod are sent a termination signal
+    and the time when the processes are forcibly halted with a kill
+    signal. Set this value longer than the expected cleanup time for
+    your process.
+  format: int64
+  type: integer
 thanos:
   description: "Thanos configuration allows configuring various aspects
     of a Prometheus server in a Thanos environment. \n This section
6 changes: 6 additions & 0 deletions jsonnet/prometheus-operator/prometheuses-crd.json
@@ -5766,6 +5766,12 @@
     "description": "Tag of Prometheus container image to be deployed. Defaults to the value of `version`. Version is ignored if Tag is set. Deprecated: use 'image' instead. The image tag can be specified as part of the image URL.",
     "type": "string"
   },
+  "terminationGracePeriodSeconds": {
+    "default": "600",
+    "description": "Optional duration in seconds the pod needs to terminate gracefully. May be decreased in delete request. Value must be non-negative integer. The value zero indicates stop immediately via the kill signal (no opportunity to shut down). If this value is nil, the default grace period will be used instead. Default value is set to 10 min because Prometheus may take quite long to shutdown to checkpoint existing data. The grace period is the duration in seconds after the processes running in the pod are sent a termination signal and the time when the processes are forcibly halted with a kill signal. Set this value longer than the expected cleanup time for your process.",
+    "format": "int64",
+    "type": "integer"
+  },
   "thanos": {
     "description": "Thanos configuration allows configuring various aspects of a Prometheus server in a Thanos environment. \n This section is experimental, it may change significantly without deprecation notice in any release. \n This is experimental and may change significantly without backward compatibility in any release.",
     "properties": {
11 changes: 11 additions & 0 deletions pkg/apis/monitoring/v1/types.go
@@ -335,6 +335,17 @@ type CommonPrometheusFields struct {
 	// This is an alpha field and requires enabling StatefulSetMinReadySeconds feature gate.
 	// +optional
 	MinReadySeconds *uint32 `json:"minReadySeconds,omitempty"`
+	// Optional duration in seconds the pod needs to terminate gracefully. May be decreased in delete request.
+	// Value must be non-negative integer. The value zero indicates stop immediately via
+	// the kill signal (no opportunity to shut down).
+	// If this value is nil, the default grace period will be used instead. Default value is set to
+	// 10 min because Prometheus may take quite long to shutdown to checkpoint existing data.
+	// The grace period is the duration in seconds after the processes running in the pod are sent
+	// a termination signal and the time when the processes are forcibly halted with a kill signal.
+	// Set this value longer than the expected cleanup time for your process.
+	// +optional
+	// +kubebuilder:default:="600"
Contributor

Since it's an int type, we don't need quotes:

// +kubebuilder:default:=600

Contributor Author

Thanks. Updated

+	TerminationGracePeriodSeconds *uint64 `json:"terminationGracePeriodSeconds,omitempty"`
 }

 // Prometheus defines a Prometheus deployment.
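The pointer type on `TerminationGracePeriodSeconds` is what lets a nil (unset) value be told apart from an explicit zero, which the field description treats very differently. A minimal Go sketch of that behavior follows; `exampleSpec` and `marshalSpec` are hypothetical names for illustration, and only the field name and JSON tag are taken from the diff above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// exampleSpec is a hypothetical stand-in for CommonPrometheusFields.
// Only the field name, type, and JSON tag mirror the diff.
type exampleSpec struct {
	TerminationGracePeriodSeconds *uint64 `json:"terminationGracePeriodSeconds,omitempty"`
}

// marshalSpec returns the JSON encoding of s as a string.
func marshalSpec(s exampleSpec) string {
	b, err := json.Marshal(s)
	if err != nil {
		panic(err) // not expected for this simple struct
	}
	return string(b)
}

func main() {
	zero := uint64(0)

	// Unset: the nil pointer is dropped by omitempty.
	fmt.Println(marshalSpec(exampleSpec{})) // {}

	// Explicit zero: the pointer is non-nil, so the field is kept.
	fmt.Println(marshalSpec(exampleSpec{TerminationGracePeriodSeconds: &zero})) // {"terminationGracePeriodSeconds":0}
}
```

With a plain `uint64` field and `omitempty`, an explicit `0` would be dropped on serialization and become indistinguishable from "not set".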
12 changes: 7 additions & 5 deletions pkg/prometheus/statefulset.go
@@ -325,10 +325,6 @@ func makeStatefulSetService(p *monitoringv1.Prometheus, config operator.Config)

 func makeStatefulSetSpec(p monitoringv1.Prometheus, c *operator.Config, shard int32, ruleConfigMapNames []string,
 	tlsAssetSecrets []string, version semver.Version) (*appsv1.StatefulSetSpec, error) {
-	// Prometheus may take quite long to shut down to checkpoint existing data.
-	// Allow up to 10 minutes for clean termination.
-	terminationGracePeriod := int64(600)
-
 	prometheusImagePath, err := operator.BuildImagePath(
 		operator.StringPtrValOrDefault(p.Spec.Image, ""),
 		operator.StringValOrDefault(p.Spec.BaseImage, c.PrometheusDefaultBaseImage),
@@ -904,10 +900,16 @@ func makeStatefulSetSpec(p monitoringv1.Prometheus, c *operator.Config, shard in
 		}
 	}

-	var minReadySeconds int32
+	var (
+		minReadySeconds        int32
+		terminationGracePeriod int64
Contributor

This should be uint too?

Contributor Author

@yeya24 yeya24 Mar 26, 2022

No, it is cast from uint64. The pod spec needs int64, not uint64.

+	)
 	if p.Spec.MinReadySeconds != nil {
 		minReadySeconds = int32(*p.Spec.MinReadySeconds)
 	}
+	if p.Spec.TerminationGracePeriodSeconds != nil {
+		terminationGracePeriod = int64(*p.Spec.TerminationGracePeriodSeconds)
+	}

 	operatorInitContainers = append(operatorInitContainers,
 		operator.CreateConfigReloader(
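Because this change drops the hard-coded `int64(600)`, `terminationGracePeriod` stays at its zero value whenever the spec field is nil, so the 10-minute default depends entirely on the CRD `default` being applied. The following is a hedged sketch of a more defensive resolution, not what the PR does: `resolveGracePeriod` and `defaultTerminationGracePeriod` are hypothetical names, and clamping values above `math.MaxInt64` is an added assumption, since a plain `int64(...)` conversion of a larger `uint64` would wrap to a negative number.

```go
package main

import (
	"fmt"
	"math"
)

// defaultTerminationGracePeriod mirrors the 600-second value removed from
// makeStatefulSetSpec. Keeping it as an in-code fallback is an assumption
// of this sketch.
const defaultTerminationGracePeriod int64 = 600

// resolveGracePeriod is a hypothetical helper that converts the optional
// uint64 spec value into the int64 the pod spec expects.
func resolveGracePeriod(spec *uint64) int64 {
	if spec == nil {
		// Unset: fall back to the default rather than 0 (immediate kill).
		return defaultTerminationGracePeriod
	}
	if *spec > math.MaxInt64 {
		// Clamp instead of letting the conversion wrap negative.
		return math.MaxInt64
	}
	return int64(*spec)
}

func main() {
	thirty := uint64(30)
	fmt.Println(resolveGracePeriod(nil))     // 600
	fmt.Println(resolveGracePeriod(&thirty)) // 30
}
```

An explicit `0` still passes through unchanged, matching the documented "stop immediately" meaning of zero.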