What did you do?
I am using Prometheus v2.47.0 in a production environment, and samples are sent from the Prometheus Agent to the Prometheus Server via remote write. At first everything was normal, but one day both the Prometheus Agent and the Prometheus Server started logging errors simultaneously. From that point on, remote write failed and samples could no longer be sent from the Prometheus Agent to the Prometheus Server, so the Prometheus Server no longer received any data. The related logs are as follows.
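For context, the agent-to-server pipeline described above is driven by a `remote_write` section on the agent. The actual configuration file was not shared in this report, so the following is only a minimal illustrative sketch using the endpoint URL visible in the agent logs; any tuning options in the real setup are unknown:

```yaml
# Hypothetical agent-side config (actual file not provided in this issue).
remote_write:
  - url: https://prometheus-k8s-0.ccos-monitoring:9091/api/v1/write
    # queue_config and TLS settings omitted; defaults assumed.
```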
What did you expect to see?
No error or warning logs, with Prometheus remote write working properly.
What did you see instead? Under which circumstances?
**logs of the Prometheus Server**
ts=2024-04-19T19:00:23.278Z caller=head.go:1298 level=info component=tsdb msg="Head GC completed" caller=truncateOOO duration=177.89714ms
ts=2024-04-19T19:00:23.291Z caller=compact.go:708 level=info component=tsdb msg="Found overlapping blocks during compaction" ulid=01HVVVPBKJ6ZPPFT1ZAKNJQ5D0
ts=2024-04-19T19:00:34.900Z caller=compact.go:464 level=info component=tsdb msg="compact blocks" count=2 mint=1712858400000 maxt=1713052800000 ulid=01HVVVPBKJ6ZPPFT1ZAKNJQ5D0 sources="[01HVVMTJJF4M4N0AD5DB4GWJHK 01HVVVP46HXTE8Y0V059C1PVW1]" duration=11.618858661s
ts=2024-04-19T19:00:34.993Z caller=db.go:1463 level=warn component=tsdb msg="Overlapping blocks found during reloadBlocks" detail="[mint: 1713376800000, maxt: 1713384000000, range: 2h0m0s, blocks: 2]: <ulid: 01HVVMTWRFJ98FFAR80Q1V16T7, mint: 1713247200000, maxt: 1713441600000, range: 54h0m0s>, <ulid: 01HVVVPAX03W5C02BERVRNPVYM, mint: 1713376800000, maxt: 1713384000000, range: 2h0m0s>\n[mint: 1713384000000, maxt: 1713391200000, range: 2h0m0s, blocks: 2]: <ulid: 01HVVMTWRFJ98FFAR80Q1V16T7, mint: 1713247200000, maxt: 1713441600000, range: 54h0m0s>, <ulid: 01HVVVPB1KFK9BPRXH4YTSSS13, mint: 1713384000000, maxt: 1713391200000, range: 2h0m0s>\n[mint: 1713391200000, maxt: 1713398400000, range: 2h0m0s, blocks: 2]: <ulid: 01HVVMTWRFJ98FFAR80Q1V16T7, mint: 1713247200000, maxt: 1713441600000, range: 54h0m0s>, <ulid: 01HVVVPB7M5R29B6QPS0Q6H4PH, mint: 1713391200000, maxt: 1713398400000, range: 2h0m0s>"
ts=2024-04-19T19:00:35.159Z caller=db.go:1619 level=info component=tsdb msg="Deleting obsolete block" block=01HVVMTJJF4M4N0AD5DB4GWJHK
ts=2024-04-19T19:00:35.162Z caller=db.go:1619 level=info component=tsdb msg="Deleting obsolete block" block=01HVVVP46HXTE8Y0V059C1PVW1
ts=2024-04-19T19:00:35.179Z caller=compact.go:708 level=info component=tsdb msg="Found overlapping blocks during compaction" ulid=01HVVVPQ6WR9TWGNWHKADKC9CD
ts=2024-04-19T19:00:59.310Z caller=compact.go:464 level=info component=tsdb msg="compact blocks" count=4 mint=1713247200000 maxt=1713441600000 ulid=01HVVVPQ6WR9TWGNWHKADKC9CD sources="[01HVVMTWRFJ98FFAR80Q1V16T7 01HVVVPAX03W5C02BERVRNPVYM 01HVVVPB1KFK9BPRXH4YTSSS13 01HVVVPB7M5R29B6QPS0Q6H4PH]" duration=24.146765124s
ts=2024-04-19T19:00:59.335Z caller=db.go:1619 level=info component=tsdb msg="Deleting obsolete block" block=01HVVVPAX03W5C02BERVRNPVYM
ts=2024-04-19T19:00:59.552Z caller=db.go:1619 level=info component=tsdb msg="Deleting obsolete block" block=01HVVMTWRFJ98FFAR80Q1V16T7
ts=2024-04-19T19:00:59.555Z caller=db.go:1619 level=info component=tsdb msg="Deleting obsolete block" block=01HVVVPB1KFK9BPRXH4YTSSS13
ts=2024-04-19T19:00:59.557Z caller=db.go:1619 level=info component=tsdb msg="Deleting obsolete block" block=01HVVVPB7M5R29B6QPS0Q6H4PH
ts=2024-04-19T20:33:26.485Z caller=write_handler.go:76 level=error component=web msg="Error appending remote write" err="too old sample"
ts=2024-04-19T20:33:26.539Z caller=write_handler.go:76 level=error component=web msg="Error appending remote write" err="too old sample"
ts=2024-04-19T20:33:26.626Z caller=write_handler.go:76 level=error component=web msg="Error appending remote write" err="too old sample"
ts=2024-04-19T20:33:26.775Z caller=write_handler.go:76 level=error component=web msg="Error appending remote write" err="too old sample"
ts=2024-04-19T20:33:27.042Z caller=write_handler.go:76 level=error component=web msg="Error appending remote write" err="too old sample"
ts=2024-04-19T20:33:27.552Z caller=write_handler.go:76 level=error component=web msg="Error appending remote write" err="too old sample"
ts=2024-04-19T20:33:28.546Z caller=write_handler.go:76 level=error component=web msg="Error appending remote write" err="too old sample"
ts=2024-04-19T20:33:30.490Z caller=write_handler.go:76 level=error component=web msg="Error appending remote write" err="too old sample"
ts=2024-04-19T20:33:34.358Z caller=write_handler.go:76 level=error component=web msg="Error appending remote write" err="too old sample"
ts=2024-04-19T20:33:39.393Z caller=write_handler.go:76 level=error component=web msg="Error appending remote write" err="too old sample"
......
ts=2024-04-23T07:30:00.472Z caller=write_handler.go:76 level=error component=web msg="Error appending remote write" err="too old sample"
ts=2024-04-23T07:30:05.544Z caller=write_handler.go:76 level=error component=web msg="Error appending remote write" err="too old sample"
ts=2024-04-23T07:30:10.706Z caller=write_handler.go:76 level=error component=web msg="Error appending remote write" err="too old sample"
ts=2024-04-23T07:30:15.741Z caller=write_handler.go:76 level=error component=web msg="Error appending remote write" err="too old sample"
ts=2024-04-23T07:30:20.798Z caller=write_handler.go:76 level=error component=web msg="Error appending remote write" err="too old sample"
ts=2024-04-23T07:30:25.829Z caller=write_handler.go:76 level=error component=web msg="Error appending remote write" err="too old sample"
ts=2024-04-23T07:30:30.895Z caller=write_handler.go:76 level=error component=web msg="Error appending remote write" err="too old sample"
ts=2024-04-23T07:30:36.030Z caller=write_handler.go:76 level=error component=web msg="Error appending remote write" err="too old sample"
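As an aside for anyone correlating the block ranges in the `reloadBlocks` warning with the error window: the `mint`/`maxt` values are millisecond Unix timestamps. A small helper (not part of Prometheus, purely illustrative) converts them to UTC:

```python
from datetime import datetime, timezone

def ms_to_utc(ms: int) -> str:
    """Convert a Prometheus millisecond timestamp to an ISO-8601 UTC string."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

# Boundaries of the 54h overlapping block from the warning above:
print(ms_to_utc(1713247200000))  # 2024-04-16T06:00:00Z (mint)
print(ms_to_utc(1713441600000))  # 2024-04-18T12:00:00Z (maxt)
```

Note that the overlapping blocks cover 2024-04-16 through 2024-04-18, i.e. days before the "too old sample" errors begin on 2024-04-19.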
**logs of the Prometheus Agent**
ts=2024-04-19T10:37:15.381Z caller=db.go:621 level=info msg="series GC completed" duration=57.876035ms
ts=2024-04-19T12:37:15.440Z caller=db.go:621 level=info msg="series GC completed" duration=58.146495ms
ts=2024-04-19T12:37:15.440Z caller=checkpoint.go:100 level=info msg="Creating checkpoint" from_segment=108 to_segment=109 mint=1713529927000
ts=2024-04-19T12:37:18.083Z caller=db.go:691 level=info msg="WAL checkpoint complete" first=108 last=109 duration=2.701587989s
ts=2024-04-19T14:37:18.174Z caller=db.go:621 level=info msg="series GC completed" duration=87.806249ms
ts=2024-04-19T16:37:18.276Z caller=db.go:621 level=info msg="series GC completed" duration=99.576596ms
ts=2024-04-19T16:37:18.277Z caller=checkpoint.go:100 level=info msg="Creating checkpoint" from_segment=110 to_segment=111 mint=1713544323000
ts=2024-04-19T16:37:21.258Z caller=db.go:691 level=info msg="WAL checkpoint complete" first=110 last=111 duration=3.081670852s
ts=2024-04-19T18:37:21.330Z caller=db.go:621 level=info msg="series GC completed" duration=70.392311ms
ts=2024-04-19T20:33:26.517Z caller=dedupe.go:112 component=remote level=warn remote_name=prometheus-k8s-0 url=https://prometheus-k8s-0.ccos-monitoring:9091/api/v1/write msg="Failed to send batch, retrying" err="server returned HTTP status 500 Internal Server Error: too old sample"
ts=2024-04-19T20:34:29.714Z caller=dedupe.go:112 component=remote level=warn remote_name=prometheus-k8s-0 url=https://prometheus-k8s-0.ccos-monitoring:9091/api/v1/write msg="Failed to send batch, retrying" err="server returned HTTP status 500 Internal Server Error: too old sample"
ts=2024-04-19T20:35:30.113Z caller=dedupe.go:112 component=remote level=warn remote_name=prometheus-k8s-0 url=https://prometheus-k8s-0.ccos-monitoring:9091/api/v1/write msg="Failed to send batch, retrying" err="server returned HTTP status 500 Internal Server Error: too old sample"
ts=2024-04-19T20:36:30.478Z caller=dedupe.go:112 component=remote level=warn remote_name=prometheus-k8s-0 url=https://prometheus-k8s-0.ccos-monitoring:9091/api/v1/write msg="Failed to send batch, retrying" err="server returned HTTP status 500 Internal Server Error: too old sample"
ts=2024-04-19T20:37:21.427Z caller=db.go:621 level=info msg="series GC completed" duration=94.488464ms
ts=2024-04-19T20:37:21.428Z caller=checkpoint.go:100 level=info msg="Creating checkpoint" from_segment=112 to_segment=113 mint=1713558496000
ts=2024-04-19T20:37:24.536Z caller=db.go:691 level=info msg="WAL checkpoint complete" first=112 last=113 duration=3.203556221s
ts=2024-04-19T20:37:31.335Z caller=dedupe.go:112 component=remote level=warn remote_name=prometheus-k8s-0 url=https://prometheus-k8s-0.ccos-monitoring:9091/api/v1/write msg="Failed to send batch, retrying" err="server returned HTTP status 500 Internal Server Error: too old sample"
ts=2024-04-19T20:37:55.483Z caller=dedupe.go:112 component=remote level=warn remote_name=prometheus-k8s-0 url=https://prometheus-k8s-0.ccos-monitoring:9091/api/v1/write msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1713558806 minSendTimestamp=1713559055
ts=2024-04-19T20:38:05.482Z caller=dedupe.go:112 component=remote level=warn remote_name=prometheus-k8s-0 url=https://prometheus-k8s-0.ccos-monitoring:9091/api/v1/write msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1713558806 minSendTimestamp=1713559065
ts=2024-04-19T20:38:15.483Z caller=dedupe.go:112 component=remote level=warn remote_name=prometheus-k8s-0 url=https://prometheus-k8s-0.ccos-monitoring:9091/api/v1/write msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1713558806 minSendTimestamp=1713559075
ts=2024-04-19T20:38:25.484Z caller=dedupe.go:112 component=remote level=warn remote_name=prometheus-k8s-0 url=https://prometheus-k8s-0.ccos-monitoring:9091/api/v1/write msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1713558806 minSendTimestamp=1713559085
ts=2024-04-19T20:38:31.694Z caller=dedupe.go:112 component=remote level=warn remote_name=prometheus-k8s-0 url=https://prometheus-k8s-0.ccos-monitoring:9091/api/v1/write msg="Failed to send batch, retrying" err="server returned HTTP status 500 Internal Server Error: too old sample"
ts=2024-04-19T20:38:35.483Z caller=dedupe.go:112 component=remote level=warn remote_name=prometheus-k8s-0 url=https://prometheus-k8s-0.ccos-monitoring:9091/api/v1/write msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1713558806 minSendTimestamp=1713559095
ts=2024-04-19T20:38:45.483Z caller=dedupe.go:112 component=remote level=warn remote_name=prometheus-k8s-0 url=https://prometheus-k8s-0.ccos-monitoring:9091/api/v1/write msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1713558806 minSendTimestamp=1713559105
ts=2024-04-19T20:38:55.483Z caller=dedupe.go:112 component=remote level=warn remote_name=prometheus-k8s-0 url=https://prometheus-k8s-0.ccos-monitoring:9091/api/v1/write msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1713558806 minSendTimestamp=1713559115
......
ts=2024-04-23T07:23:47.309Z caller=dedupe.go:112 component=remote level=warn remote_name=prometheus-k8s-0 url=https://prometheus-k8s-0.ccos-monitoring:9091/api/v1/write msg="Failed to send batch, retrying" err="server returned HTTP status 500 Internal Server Error: too old sample"
ts=2024-04-23T07:24:47.769Z caller=dedupe.go:112 component=remote level=warn remote_name=prometheus-k8s-0 url=https://prometheus-k8s-0.ccos-monitoring:9091/api/v1/write msg="Failed to send batch, retrying" err="server returned HTTP status 500 Internal Server Error: too old sample"
ts=2024-04-23T07:25:48.331Z caller=dedupe.go:112 component=remote level=warn remote_name=prometheus-k8s-0 url=https://prometheus-k8s-0.ccos-monitoring:9091/api/v1/write msg="Failed to send batch, retrying" err="server returned HTTP status 500 Internal Server Error: too old sample"
ts=2024-04-23T07:26:48.895Z caller=dedupe.go:112 component=remote level=warn remote_name=prometheus-k8s-0 url=https://prometheus-k8s-0.ccos-monitoring:9091/api/v1/write msg="Failed to send batch, retrying" err="server returned HTTP status 500 Internal Server Error: too old sample"
ts=2024-04-23T07:27:49.299Z caller=dedupe.go:112 component=remote level=warn remote_name=prometheus-k8s-0 url=https://prometheus-k8s-0.ccos-monitoring:9091/api/v1/write msg="Failed to send batch, retrying" err="server returned HTTP status 500 Internal Server Error: too old sample"
ts=2024-04-23T07:28:49.735Z caller=dedupe.go:112 component=remote level=warn remote_name=prometheus-k8s-0 url=https://prometheus-k8s-0.ccos-monitoring:9091/api/v1/write msg="Failed to send batch, retrying" err="server returned HTTP status 500 Internal Server Error: too old sample"
ts=2024-04-23T07:29:50.300Z caller=dedupe.go:112 component=remote level=warn remote_name=prometheus-k8s-0 url=https://prometheus-k8s-0.ccos-monitoring:9091/api/v1/write msg="Failed to send batch, retrying" err="server returned HTTP status 500 Internal Server Error: too old sample"
ts=2024-04-23T07:30:51.184Z caller=dedupe.go:112 component=remote level=warn remote_name=prometheus-k8s-0 url=https://prometheus-k8s-0.ccos-monitoring:9091/api/v1/write msg="Failed to send batch, retrying" err="server returned HTTP status 500 Internal Server Error: too old sample"
System information
No response
Prometheus version
prometheus, version 2.47.0 (branch: HEAD, revision: efa34a5840661c29c2e362efa76bc3a70dccb335)
build user: root@4f2c12e526ab
build date: 20231002-15:09:56
go version: go1.20.8
platform: linux/amd64
tags: netgo,builtinassets,stringlabels
Prometheus configuration file
Alertmanager version
No response
Alertmanager configuration file
No response
Logs
No response