Prometheus does not recognize HELP and TYPE for OpenMetrics counters #13944

Open
link2xt opened this issue Apr 17, 2024 · 8 comments

link2xt commented Apr 17, 2024

What did you do?

I pointed Prometheus at an OpenMetrics endpoint that outputs data like this:

$ curl http://192.168.10.2:9001/metrics -v
*   Trying 192.168.10.2:9001...
* Connected to 192.168.10.2 (192.168.10.2) port 9001 (#0)
> GET /metrics HTTP/1.1
> Host: 192.168.10.2:9001
> User-Agent: curl/7.81.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< content-length: 561
< content-type: application/openmetrics-text;charset=utf-8;version=1.0.0
< date: Wed, 17 Apr 2024 10:56:28 GMT
<
# HELP direct_notifications Number of direct notifications.
# TYPE direct_notifications counter
direct_notifications_total 1
# HELP heartbeat_notifications Number of heartbeat notifications.
# TYPE heartbeat_notifications counter
heartbeat_notifications_total 123
# HELP heartbeat_registrations Number of heartbeat registrations.
# TYPE heartbeat_registrations counter
heartbeat_registrations_total 456
# HELP heartbeat_token_count Number of tokens registered for heartbeat notifications.
# TYPE heartbeat_token_count gauge
heartbeat_token_count 789
# EOF
* Connection #0 to host 192.168.10.2 left intact

The endpoint is provided by https://github.com/deltachat/notifiers, which uses https://github.com/prometheus/client_rust.

What did you expect to see?

I expected Prometheus to display the "counter" type hint and the help line "Number of direct notifications." However, Prometheus ignores HELP and TYPE for direct_notifications.

Apparently it expected the response to contain:

# HELP direct_notifications_total Number of direct notifications.
# TYPE direct_notifications_total counter
direct_notifications_total 1

but this is not correct according to the OpenMetrics specification. This difference is documented in https://opentelemetry.io/docs/specs/otel/compatibility/prometheus_and_openmetrics/

What did you see instead? Under which circumstances?

I expected the OpenMetrics parser to be used according to the content type and to recognize HELP and TYPE for the direct_notifications_total counter.

System information

Linux 5.15.0-102-generic x86_64

Prometheus version

prometheus, version 2.51.2 (branch: HEAD, revision: b4c0ab52c3e9b940ab803581ddae9b3d9a452337)
  build user:       root@b63f02a423d9
  build date:       20240410-14:05:54
  go version:       go1.22.2
  platform:         linux/amd64
  tags:             netgo,builtinassets,stringlabels

Prometheus configuration file

scrape_configs:
...
  - job_name: "notifications"
    scrape_interval: 30s
    static_configs:
      - targets: ['192.168.10.2:9001']
link2xt (Author) commented Apr 17, 2024

Related, another OpenMetrics counter incompatibility: #6541

beorn7 (Member) commented Apr 19, 2024

Could you verify what protocol is actually used in the Prometheus scrape? (I believe this particular aspect of OpenMetrics should be implemented correctly in Prometheus, and my suspicion here is that Prometheus, for some reason, thinks it is scraping the classic text format, or it might even negotiate protobuf, maybe because native histograms are enabled…)

link2xt (Author) commented Apr 20, 2024

How can I verify which protocol is used? It is not printed in the debug logs.
Note that the server only outputs content-type: application/openmetrics-text;charset=utf-8;version=1.0.0; this is the only format it supports:
https://github.com/deltachat/notifiers/blob/2bcedc685df371777779198e390eabb52ad89f12/src/metrics.rs#L84-L92
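
One quick way to double-check is to explicitly ask the server for the classic Prometheus text format and look at the Content-Type of the response (a sketch; text/plain;version=0.0.4 is the media type of the classic format):

$ curl -v -H 'Accept: text/plain;version=0.0.4' http://192.168.10.2:9001/metrics -o /dev/null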

I changed the scrape config as follows, ran systemctl reload prometheus, and checked in the log that the config was reloaded successfully, but it did not fix the problem:

  - job_name: "notifications"
    scrape_interval: 30s
    scrape_protocols: ["OpenMetricsText1.0.0"]
    static_configs:
      - targets: ['192.168.10.2:9001']
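
To confirm that the reloaded configuration really contains the scrape_protocols setting, the running config can be dumped from the status API (a sketch, assuming Prometheus listens on localhost:9090 and jq is installed):

$ curl -s http://localhost:9090/api/v1/status/config | jq -r .data.yaml | grep -A2 scrape_protocols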

beorn7 (Member) commented May 2, 2024

How can I verify which protocol is used?

For ground truth, I would use a packet sniffer like tcpdump or Wireshark.
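
For example, something like this prints the scrape traffic in plain text, so both the Accept header Prometheus sends and the Content-Type the target answers with are visible (assuming the exporter still listens on port 9001 and the capture runs on a host that sees the traffic):

$ tcpdump -i any -A -s 0 'tcp port 9001'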

beorn7 (Member) commented May 2, 2024

I have reproduced the issue now.

tl;dr: @link2xt is right. The UI isn't properly creating a type hint for counters ingested via OpenMetrics.

This is a UI issue. The UI queries the /api/v1/metadata endpoint, and for counters ingested via OpenMetrics it returns something like this for a counter that creates a metric request_processing_seconds within Prometheus:

    "request_processing_seconds": [
      {
        "type": "counter",
        "unit": "",
        "help": "Time spent processing request"
      }

The UI in this case is only looking for an entry request_processing_seconds_total, which doesn't exist in the metadata response.
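
This is easy to see by querying the metadata API directly (assuming a Prometheus instance on localhost:9090 and the metric names from the example above):

$ curl -s 'http://localhost:9090/api/v1/metadata?metric=request_processing_seconds'
$ curl -s 'http://localhost:9090/api/v1/metadata?metric=request_processing_seconds_total'

The first query returns the counter's metadata, while the second, the name the UI actually looks up, returns nothing.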

First question to OM experts @RichiH and @SuperQ: Is the metadata endpoint behaving as expected here?

And if it is, this would be an issue to be fixed by our frontend experts (@Nexucis and @juliusv, maybe).

SuperQ (Member) commented May 2, 2024

I suppose we should make the metadata endpoint match the expected Prometheus name when exposing OM metrics output for counters.

beorn7 (Member) commented May 2, 2024

Some musings about the question of whether the metadata API is behaving correctly: I think it's really problematic that the exposition format now determines how a counter is represented in the metadata API response. I get the rationale behind OM's decision to change the "true name" of a counter metric foo_total to just foo. This follows in the footsteps of a summary called just foo even though it also contains metrics called foo_sum and foo_count, and of a (classic) histogram called just foo that implies metrics foo_bucket, foo_sum, and foo_count, while not even containing a metric called just foo.

However, with native histograms as the histogram of choice, classic histograms might just become a thing of the past. Even summaries, if we still want to support them, could eventually be represented by "native summaries" (no concrete plans, just reasonable speculation). Then there would be no footsteps to follow anymore, and we could "go back" to calling a counter foo_total just foo_total everywhere. This would also remove the still unresolved collision between go_memstats_alloc_bytes and go_memstats_alloc_bytes_total, see prometheus/client_golang#829.

beorn7 (Member) commented May 2, 2024

I suppose we should make the metadata endpoint match the expected Prometheus name when exposing OM metrics output for counters.

So you would say we should change the example output above to say request_processing_seconds_total instead of request_processing_seconds?

This would work if we do it at the level of ingesting metadata. We cannot really do it after the fact in the metadata API itself because, at that point, we no longer know which format was used to ingest the metric.
