
Prometheus error: Out of order sample from remote write #12052

Closed
liufangpeng opened this issue Mar 3, 2023 · 8 comments

Comments

@liufangpeng

What did you do?

  prometheus:
    image: prom/prometheus-test5:latest
    container_name: prometheus
    hostname: prometheus
    restart: always
    environment:
      TZ: Asia/Shanghai
    ports:
      - "9090:9090"
    networks:
      - nightingale
    command:
      - "--enable-feature=remote-write-receiver"
      - "--query.lookback-delta=2m"
      - "--storage.tsdb.wal-compression"
      - "--log.level=debug"

What did you expect to see?

What did you see instead? Under which circumstances?

Right after Prometheus was deployed there was no error log output. After running for a while, the container log suddenly filled with large numbers of these messages, and no data could be queried in Prometheus, so it could not be used normally.

ts=2023-03-02T10:46:58.404Z caller=write_handler.go:109 level=error component=web msg="Out of order sample from remote write" err="duplicate sample for timestamp" series="{__name__=\"disk_inodes_used\", device=\"JuiceFS:juice\", fstype=\"fuse.juicefs\", ident=\"10.79.10.12\", mode=\"rw\", path=\"/var/lib/kubelet/pods/e3409740-980e-4389-aea8-503c2b8c17db/volumes/kubernetes.io~csi/pvc-1bd206e9-65e0-4e50-853b-4ca4d6c0abcb/mount\", product=\"devops-tool\", vpc=\"public\"}" timestamp=1677754012257
ts=2023-03-02T12:00:50.558Z caller=write_handler.go:109 level=error component=web msg="Out of order sample from remote write" err="duplicate sample for timestamp" series="{__name__=\"disk_inodes_used\", device=\"JuiceFS:juice\", fstype=\"fuse.juicefs\", ident=\"10.79.10.12\", mode=\"rw\", path=\"/var/lib/kubelet/pods/9875042f-5b64-4604-bfbf-19a384ce7611/volumes/kubernetes.io~csi/pvc-100430c4-605b-4e45-b72e-f332919f81ff/mount\", product=\"devops-tool\", vpc=\"public\"}" timestamp=1677758443489
ts=2023-03-02T12:23:37.274Z caller=write_handler.go:109 level=error component=web msg="Out of order sample from remote write" err="duplicate sample for timestamp" series="{__name__=\"disk_used\", device=\"JuiceFS:juice\", fstype=\"fuse.juicefs\", ident=\"10.79.10.12\", mode=\"rw\", path=\"/var/lib/kubelet/pods/9875042f-5b64-4604-bfbf-19a384ce7611/volumes/kubernetes.io~csi/pvc-100430c4-605b-4e45-b72e-f332919f81ff/mount\", product=\"devops-tool\", vpc=\"public\"}" timestamp=1677759810439
ts=2023-03-02T12:35:15.388Z caller=compact.go:519 level=info component=tsdb msg="write block" mint=1677749748962 maxt=1677751200000 ulid=01GTH536JJ37JVREHMQN5NK4J1 duration=13.034656377s
ts=2023-03-02T12:35:16.225Z caller=head.go:1219 level=info component=tsdb msg="Head GC completed" caller=truncateMemory duration=832.192859ms
ts=2023-03-02T12:35:16.225Z caller=checkpoint.go:100 level=info component=tsdb msg="Creating checkpoint" from_segment=0 to_segment=18 mint=1677751200000
ts=2023-03-02T12:35:47.832Z caller=head.go:1191 level=info component=tsdb msg="WAL checkpoint complete" first=0 last=18 duration=31.606785111s
ts=2023-03-02T12:59:36.405Z caller=compact.go:519 level=info component=tsdb msg="write block" mint=1677751200000 maxt=1677758400000 ulid=01GTH6FDZDDTERS40WB1J8ABBE duration=24.679446612s
ts=2023-03-02T12:59:37.031Z caller=head_read.go:131 level=debug component=tsdb msg="Looked up series not found"
ts=2023-03-02T12:59:37.031Z caller=head_read.go:131 level=debug component=tsdb msg="Looked up series not found"
ts=2023-03-02T12:59:37.031Z caller=head_read.go:131 level=debug component=tsdb msg="Looked up series not found"
ts=2023-03-02T12:59:37.031Z caller=head_read.go:131 level=debug component=tsdb msg="Looked up series not found"

System information

Linux 3.10.0-1160.15.2.el7.x86_64 x86_64

Prometheus version

prometheus, version 2.42.0 (branch: HEAD, revision: 225c61122d88b01d1f0eaaee0e05b6f3e0567ac0)
  build user:       root@c67d48967507
  build date:       20230201-07:53:32
  go version:       go1.19.5
  platform:         linux/amd64

Prometheus configuration file

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

  - job_name: 'n9e'
    file_sd_configs:
    - files:
      - targets.json

  - job_name: 'idm-mq'

Alertmanager version

Alertmanager configuration file

Logs

@bboreham
Member

bboreham commented Mar 3, 2023

Thank you for your report. I don't speak Chinese, but I asked Google Translate to convert "刚刚部署好Prometheus并无错误日志输出,运行一段时间之后突然容器日志里大批量的会刷这些日志,并且Prometheus里查询不到数据无法正常使用".

Prometheus has just been deployed and there is no error log output. After running for a period of time, suddenly a large number of these logs will be flushed in the container log, and the data cannot be queried in Prometheus and cannot be used normally.

ts=2023-03-02T10:46:58.404Z caller=write_handler.go:109 level=error component=web msg="Out of order sample from remote write" err="duplicate sample for timestamp" series="{__name__="disk_inodes_used", device="JuiceFS:juice", fstype="fuse.juicefs", ident="10.79.10.12", mode="rw", path="/var/lib/kubelet/pods/e3409740-980e-4389-aea8-503c2b8c17db/volumes/kubernetes.io~csi/pvc-1bd206e9-65e0-4e50-853b-4ca4d6c0abcb/mount", product="devops-tool", vpc="public"}" timestamp=1677754012257

That seems to be a problem with something sending bad data to your Prometheus.

ts=2023-03-02T12:00:50.558Z caller=write_handler.go:109 level=error component=web msg="Out of order sample from remote write" err="duplicate sample for timestamp" series="{__name__="disk_inodes_used", device="JuiceFS:juice", fstype="fuse.juicefs", ident="10.79.10.12", mode="rw", path="/var/lib/kubelet/pods/9875042f-5b64-4604-bfbf-19a384ce7611/volumes/kubernetes.io~csi/pvc-100430c4-605b-4e45-b72e-f332919f81ff/mount", product="devops-tool", vpc="public"}" timestamp=1677758443489

There is a gap of over an hour between these two log lines. Maybe the translation is bad - can you give more detail about when you got a large number of logs?

the data cannot be queried in Prometheus and cannot be used normally.

Do you get an error message? If so, please quote it exactly. If not, please explain what you do see.

@dhirmansyah

dhirmansyah commented Mar 4, 2023

Hi All,

I had the same issue as described above. After checking the error logs I found the following:

Prometheus log (sending side)

  1. When Prometheus sends samples via remote write, it gets an HTTP 400 Bad Request error:
Mar  4 20:19:58 cit01prom01 prometheus[45022]: ts=2023-03-04T13:19:58.522Z caller=dedupe.go:112 component=remote level=error remote_name=df99af url=https://datasource.xxx-it.co.id/api/v1/write msg="non-recoverable error" count=1500 exemplarCount=0 err="server returned HTTP status 400 Bad Request: out of order sample"
Mar  4 20:50:08 cit01prom01 prometheus[45022]: ts=2023-03-04T13:50:08.585Z caller=dedupe.go:112 component=remote level=error remote_name=df99af url=https://datasource.xxx-it.co.id/api/v1/write msg="non-recoverable error" count=1500 exemplarCount=0 err="server returned HTTP status 400 Bad Request: out of bounds"
Mar  4 20:50:08 cit01prom01 prometheus[45022]: ts=2023-03-04T13:50:08.623Z caller=dedupe.go:112 component=remote level=error remote_name=df99af url=https://datasource.xxx-it.co.id/api/v1/write msg="non-recoverable error" count=1500 exemplarCount=0 err="server returned HTTP status 400 Bad Request: out of order sample"
  2. When I query the metric prometheus_tsdb_out_of_order_samples_total, I get the following value:
prometheus_tsdb_out_of_order_samples_total{env="dev", instance="localhost:9090", job="prometheus", site="jkt-citlab"}  1384
  3. After restarting the Prometheus server, the error was gone.

Log from the Prometheus remote-write receiver

Mar  4 14:20:20 sgcitlab01prom02 prometheus[7394]: ts=2023-03-04T14:20:20.007Z caller=write_handler.go:109 level=error component=web msg="Out of order sample from remote write" err="out of bounds" series="{__name__=\"node_systemd_timer_last_trigger_seconds\", custid=\"xxxx-infra\", env=\"dev\", hostname=\"cit01owncld01\", instance=\"192.168.1.25:9100\", job=\"node\", name=\"fwupd-refresh.timer\", site=\"jkt-citlab\"}" timestamp=1677935995518
Mar  4 14:20:20 sgcitlab01prom02 prometheus[7394]: ts=2023-03-04T14:20:20.019Z caller=write_handler.go:109 level=error component=web msg="Out of order sample from remote write" err="out of bounds" series="{__name__=\"node_network_carrier_up_changes_total\", custid=\"mpm\", device=\"vethf8bc3f5\", env=\"dev\", hostname=\"cit01devmpm01\", instance=\"192.168.81.8:9100\", job=\"node\", site=\"jkt-citlab\"}" timestamp=1677935889392
Mar  4 14:20:20 sgcitlab01prom02 prometheus[7394]: ts=2023-03-04T14:20:20.019Z caller=write_handler.go:109 level=error component=web msg="Out of order sample from remote write" err="out of bounds" series="{__name__=\"process_max_fds\", custid=\"xxxx-infra\", env=\"dev\", hostname=\"cit01rprx01\", instance=\"192.168.76.17:9100\", job=\"node\", site=\"jkt-citlab\"}" timestamp=1677935892889
Mar  4 14:20:20 sgcitlab01prom02 prometheus[7394]: ts=2023-03-04T14:20:20.020Z caller=write_handler.go:109 level=error component=web msg="Out of order sample from remote write" err="out of bounds" series="{__name__=\"container_tasks_state\", custid=\"mpm\", env=\"dev\", hostname=\"cit01devmpm01\", id=\"/system.slice/system-getty.slice\", instance=\"192.168.81.8:8080\", job=\"docker_container\", site=\"jkt-citlab\", state=\"iowaiting\"}" timestamp=1677935895991
Mar  4 14:20:20 sgcitlab01prom02 prometheus[7394]: ts=2023-03-04T14:20:20.028Z caller=write_handler.go:109 level=error component=web msg="Out of order sample from remote write" err="out of bounds" series="{__name__=\"node_systemd_unit_state\", custid=\"xxxx-infra\", env=\"dev\", hostname=\"cit01post01\", instance=\"192.168.76.9:9100\", job=\"node\", name=\"multipathd.socket\", site=\"jkt-citlab\", state=\"activating\"}" timestamp=1677935991167
Mar  4 14:20:20 sgcitlab01prom02 prometheus[7394]: ts=2023-03-04T14:20:20.032Z caller=write_handler.go:109 level=error component=web msg="Out of order sample from remote write" err="out of bounds" series="{__name__=\"node_systemd_unit_state\", custid=\"xxxx-infra\", env=\"dev\", hostname=\"cit01teampass01\", instance=\"192.168.1.92:9100\", job=\"node\", name=\"rsyslog.service\", site=\"jkt-citlab\", state=\"deactivating\", type=\"notify\"}" timestamp=1677935892929
Mar  4 14:20:20 sgcitlab01prom02 prometheus[7394]: ts=2023-03-04T14:20:20.040Z caller=write_handler.go:109 level=error component=web msg="Out of order sample from remote write" err="out of bounds" series="{__name__=\"node_network_transmit_drop_total\", custid=\"mpm\", device=\"veth26b3b56\", env=\"dev\", hostname=\"cit01devmpm01\", instance=\"192.168.81.8:9100\", job=\"node\", site=\"jkt-citlab\"}" timestamp=1677935984392
Mar  4 14:20:20 sgcitlab01prom02 prometheus[7394]: ts=2023-03-04T14:20:20.044Z caller=write_handler.go:109 level=error component=web msg="Out of order sample from remote write" err="out of bounds" series="{__name__=\"node_disk_write_time_seconds_total\", custid=\"mpm\", device=\"sr0\", env=\"dev\", hostname=\"cit01prdmpm01\", instance=\"192.168.81.19:9100\", job=\"node\", site=\"jkt-citlab\"}" timestamp=1677935894166

I have read this information and checked that there are no duplicate targets; every Prometheus has its own separate labels.

I just want to make sure: is this error still related to that history?

thanks in advance

@roidelapluie
Member

These errors are mostly caused by timestamp inconsistencies in the data or issues with the data source itself.

It is important to note that time synchronization is critical. I would suggest making sure that your servers' clocks are synchronized correctly.

Additionally, make sure that you have set all external labels to be unique and consistent. If Prometheus is the sender, then ensure that all external labels used are unique across all instances.

If the sender is not Prometheus, then check the sender's configuration and ensure that it is sending correctly formatted data with timestamps in order.
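
For illustration, a minimal sketch of a sending Prometheus that sets unique external labels next to its remote_write block. This is not taken from any config in this thread; the label names, values, and receiver URL are hypothetical.

global:
  external_labels:
    cluster: dc-1      # hypothetical value; must differ between sending instances
    replica: prom-a    # hypothetical value; must differ between sending instances

remote_write:
  - url: http://receiver.example.com:9090/api/v1/write   # hypothetical receiver URL

Because external labels are attached to every sample a Prometheus forwards via remote write, two senders that share an identical label set can produce the same series with clashing timestamps on the receiver, which is exactly the kind of out-of-order/duplicate error shown above.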

@liufangpeng
Author

liufangpeng commented Mar 6, 2023

(Quoting @bboreham's reply above.)

The problem goes away after clearing the data and restarting, but after running for a while the logs flood again, and then the metrics can no longer be queried in Prometheus, so it cannot be used normally.
ts=2023-03-06T00:49:04.364Z caller=write_handler.go:109 level=error component=web msg="Out of order sample from remote write" err="out of bounds" series="{__name__=\"processes_paging\", busigroup=\"Ubuntu桌面云\", ident=\"Ubuntu桌面云-YF-U1804-0014-10.79.165.14\"}" timestamp=1678063742904
ts=2023-03-06T00:49:04.415Z caller=write_handler.go:109 level=error component=web msg="Out of order sample from remote write" err="out of bounds" series="{__name__=\"net_drop_in\", busigroup=\"Ubuntu桌面云\", ident=\"Ubuntu桌面云-T4-018-10.79.165.16\", interface=\"ens192\"}" timestamp=1678063714190
ts=2023-03-06T00:49:04.513Z caller=write_handler.go:109 level=error component=web msg="Out of order sample from remote write" err="out of bounds" series="{__name__=\"rest_client_request_duration_seconds_sum\", busigroup=\"devops\", env=\"office-prod\", ident=\"kubesphere-ofc-worker-4\", instance=\"127.0.0.1:10250\", job=\"categraf\", product=\"kubesphere\", url=\"https://127.0.0.1:8443/apis/authentication.k8s.io/v1/tokenreviews\", verb=\"POST\"}" timestamp=1678063737479
ts=2023-03-06T00:49:04.608Z caller=write_handler.go:109 level=error component=web msg="Out of order sample from remote write" err="out of bounds" series="{__name__=\"docker_container_cpu_throttling_throttled_time\", busigroup=\"devops\", container_id=\"f00910ad63888592df7d541ace198c054b1510447d58282ee05b7149b24ced04\", container_image=\"k8s.gcr.io/pause\", container_name=\"k8s_POD_cloud-iptables-manager-t7k84_kubeedge_153e8657-0710-4f49-81f8-af9175d6a3ea_0\", cpu=\"cpu-total\", env=\"rd-test\", ident=\"kubesphere-rdt-worker-5\", k8s_app=\"iptables-manager\", kubeedge=\"iptables-manager\", pod_template_generation=\"2\"}" timestamp=1678063742410
ts=2023-03-06T00:49:04.700Z caller=write_handler.go:109 level=error component=web msg="Out of order sample from remote write" err="out of bounds" series="{__name__=\"kubelet_pod_worker_start_duration_seconds_bucket\", busigroup=\"devops\", env=\"rd-prod\", ident=\"kubesphere-rd-worker-1\", instance=\"127.0.0.1:10250\", job=\"categraf\", le=\"+Inf\"}" timestamp=1678063742729
ts=2023-03-06T00:49:04.701Z caller=write_handler.go:109 level=error component=web msg="Out of order sample from remote write" err="out of bounds" series="{__name__=\"kubernetes_pod_network_tx_errors\", app=\"imagecheckv7\", busigroup=\"devops\", env=\"rd-test\", ident=\"kubesphere-rdt-worker-2\", namespace=\"haohan-dev\", node_name=\"kubesphere-rdt-worker-2\", pod_name=\"imagecheckv7-v1-774fd4d49b-gfq75\", pod_template_hash=\"774fd4d49b\", version=\"v1\"}" timestamp=1678063740831
ts=2023-03-06T00:49:04.790Z caller=write_handler.go:109 level=error component=web msg="Out of order sample from remote write" err="out of bounds" series="{__name__=\"container_fs_writes_total\", busigroup=\"devops\", container=\"zookeeper\", device=\"/dev/sdb1\", env=\"rd-test\", id=\"/kubepods/burstable/pode64b7c8c-3033-4253-8dd3-e668a3f00a6b/651ef49866487618ac8efdb678de3fca6d40ca97d717998b8e5ff64c418466a9\", ident=\"kubesphere-rdt-worker-1\", image=\"sha256:389dc01c4deb8ee3d879407bb1ccada0b4e8bafca855b22a6fced1f4f1c01a62\", instance=\"127.0.0.1:10250\", job=\"categraf\", name=\"k8s_zookeeper_zookeeper-cluster-1_zk-test_e64b7c8c-3033-4253-8dd3-e668a3f00a6b_0\", namespace=\"zk-test\", pod=\"zookeeper-cluster-1\"}" timestamp=1678063740388
ts=2023-03-06T00:49:04.937Z caller=write_handler.go:109 level=error component=web msg="Out of order sample from remote write" err="out of bounds" series="{__name__=\"container_network_transmit_bytes_total\", busigroup=\"devops\", container=\"POD\", env=\"rd-test\", id=\"/kubepods/burstable/poda75f7e3e-d42f-45a7-8bc8-1dccecc19ebb/2bb241c5b4101fedcb0abaa643f6cbea9d2d6c7b7ff6feba667d7fe95b4d695b\", ident=\"kubesphere-rdt-master-3\", image=\"k8s.gcr.io/pause:3.2\", instance=\"127.0.0.1:10250\", interface=\"cali7a824a8713f\", job=\"categraf\", name=\"k8s_POD_node-local-dns-xbtmr_kube-system_a75f7e3e-d42f-45a7-8bc8-1dccecc19ebb_2\", namespace=\"kube-system\", pod=\"node-local-dns-xbtmr\"}" timestamp=1678063741885
ts=2023-03-06T00:49:04.952Z caller=write_handler.go:109 level=error component=web msg="Out of order sample from remote write" err="out of bounds" series="{__name__=\"docker_container_mem_usage\", busigroup=\"devops\", container_id=\"2c4f8e6af5e12b72b23b07e3945f4f30909b2c05684aa1478510f5c1951d1809\", container_image=\"sha256\", container_name=\"k8s_categraf_nightingale-categraf-rxw94_devops_92ac93e8-bb2d-400c-8741-05faeaea5671_0\", env=\"rd-test\", ident=\"kubesphere-rdt-master-3\"}" timestamp=1678063739157
ts=2023-03-06T00:49:04.967Z caller=write_handler.go:109 level=error component=web msg="Out of order sample from remote write" err="out of bounds" series="{__name__=\"docker_container_mem_usage_percent\", busigroup=\"devops\", container_id=\"756fa5da1984b40d9d3aeb794b0e5b7d66fb27736fe915641d1fed3d1d7e3f1b\", container_image=\"sha256\", container_name=\"k8s_jfs-mount_juicefs-kubesphere-rdt-worker-1-pvc-20150c21-696b-4c18-a4d3-e075446ba49b-cyspxx_devops_4c31aac2-1a17-47ff-b35d-6dd764038f76_0\", env=\"rd-test\", ident=\"kubesphere-rdt-worker-1\"}" timestamp=1678063741204
ts=2023-03-06T00:49:04.971Z caller=write_handler.go:109 level=error component=web msg="Out of order sample from remote write" err="out of bounds" series="{__name__=\"docker_container_mem_max_usage\", busigroup=\"devops\", container_id=\"04916c48599f54e5f8a65ed9622d45f934a64c407521bc6bc1ab9ee81b5988e8\", container_image=\"sha256\", container_name=\"k8s_redis_redis-standalone-replicas-0_middleware-test_7a2deb23-0332-4ee6-adc0-8e28d9207ae9_3\", env=\"rd-test\", ident=\"kubesphere-rdt-worker-2\"}" timestamp=1678063742529
ts=2023-03-06T00:49:05.008Z caller=write_handler.go:109 level=error component=web msg="Out of order sample from remote write" err="out of bounds" series="{__name__=\"sockstat_used\", ident=\"监控节点-ecs-prometheus-10.79.15.146\"}" timestamp=1678063727125
ts=2023-03-06T00:49:05.008Z caller=write_handler.go:109 level=error component=web msg="Out of order sample from remote write" err="out of bounds" series="{__name__=\"docker_container_cpu_usage_system\", busigroup=\"devops\", container_id=\"d7c2e02e70211d671b5e50850a12bc1e47d188fb03eca349688224ba5bf0de38\", container_image=\"sha256\", container_name=\"k8s_postgresql_postgresql-0_middleware-prod_2bbf9a05-d707-4da2-b42a-13e660f38896_1\", cpu=\"cpu-total\", env=\"rd-test\", ident=\"kubesphere-rdt-worker-2\"}" timestamp=1678063742529
ts=2023-03-06T00:49:05.163Z caller=write_handler.go:109 level=error component=web msg="Out of order sample from remote write" err="out of bounds" series="{__name__=\"disk_total\", busigroup=\"devops\", device=\"vda1\", env=\"rd-prod\", fstype=\"ext4\", ident=\"kubesphere-rd-worker-2\", mode=\"rw\", path=\"/\"}" timestamp=1678063739739

@liufangpeng
Author

By the way, would enabling out_of_order_time_window solve this problem? In which file should it be enabled - prometheus.yml?

@roidelapluie
Member

roidelapluie commented Mar 6, 2023

Enabling out_of_order_time_window may provide some relief but it's not certain that it will resolve the problem completely. It would be better to try and identify the root cause of the issue instead of relying on this.

@liufangpeng
Author

liufangpeng commented Mar 6, 2023

(Quoting @roidelapluie's reply above.)

From what I can see, the timestamps of the collected data do arrive out of order, and I haven't found the specific cause yet. In which configuration file should out_of_order_time_window be configured?

@roidelapluie
Member

To enable the out_of_order_time_window, you will have to add it under the storage > tsdb configuration section in your Prometheus config file.

storage:
  tsdb:
    out_of_order_time_window: 2m 
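
A note on what this does, paraphrasing the upstream TSDB documentation rather than anything stated in this thread: with a 2m window, the receiving Prometheus should accept remote-write samples whose timestamps lag the newest data in the TSDB head by roughly up to two minutes, while anything older is still rejected. At the time of this thread the option was documented as experimental. After editing the file, running promtool check config prometheus.yml is a quick way to validate it before restarting Prometheus.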

prometheus locked and limited conversation to collaborators Mar 6, 2023
roidelapluie converted this issue into discussion #12064 Mar 6, 2023
