exemplars support #11

Open
clux opened this issue Apr 4, 2021 · 4 comments · May be fixed by #12

Comments

@clux
Member

clux commented Apr 4, 2021

We have Loki -> Tempo support so people can discover bad reconcile traces from a controller's logs. However, it would be much easier to do this based on exemplars in the tail end of its new histogram.

This currently isn't working. Here's a WIP issue.

I have a hacky implementation of exemplars in tikv/rust-prometheus#395.
With that in use on master, the controller outputs:

# HELP foo_controller_handled_events handled events
# TYPE foo_controller_handled_events counter
foo_controller_handled_events 3
# HELP foo_controller_reconcile_duration_seconds The duration of reconcile to complete in seconds
# TYPE foo_controller_reconcile_duration_seconds histogram
foo_controller_reconcile_duration_seconds_bucket{le="0.01"} 0
foo_controller_reconcile_duration_seconds_bucket{le="0.1"} 0
foo_controller_reconcile_duration_seconds_bucket{le="0.25"} 0
foo_controller_reconcile_duration_seconds_bucket{le="0.5"} 0
foo_controller_reconcile_duration_seconds_bucket{le="1"} 0
foo_controller_reconcile_duration_seconds_bucket{le="5"} 0
foo_controller_reconcile_duration_seconds_bucket{le="15"} 3 # {trace_id="27c2e480c02d586c98934828324eeb9a"} 9 1617533722.954
foo_controller_reconcile_duration_seconds_bucket{le="60"} 3
foo_controller_reconcile_duration_seconds_bucket{le="+Inf"} 3
foo_controller_reconcile_duration_seconds_sum 25
foo_controller_reconcile_duration_seconds_count 3

which SHOULD be in line with the OpenMetrics spec on exemplars;
it even matches the exemplar example there.
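
As a quick sanity check of the bucket line above against the shape the OpenMetrics spec describes, here is a minimal Python sketch (a throwaway script, not part of the Rust controller). The names `parse_sample` and `check_exemplar` are made up for illustration, and the regex is a simplification that ignores escaped quotes and braces inside label values. The two checks assume the spec's rules that an exemplar's label names and values total at most 128 characters, and that a bucket exemplar's value lies within the bucket's `le`.

```python
import re

# Approximate shape of a sample line with an optional exemplar:
#   name{labels} value [timestamp] [# {exemplar_labels} exemplar_value [timestamp]]
# Simplification: label values containing '}' or escaped quotes are not handled.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r' (?P<value>\S+)'
    r'(?: (?P<ts>-?[0-9.]+))?'
    r'(?: # \{(?P<ex_labels>[^}]*)\} (?P<ex_value>\S+)(?: (?P<ex_ts>\S+))?)?$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_labels(text):
    """Turn 'a="1",b="2"' into {'a': '1', 'b': '2'}."""
    return dict(LABEL_RE.findall(text or ""))


def parse_sample(line):
    """Parse one sample line; 'exemplar' is None when the line has none."""
    m = SAMPLE_RE.match(line)
    if m is None:
        raise ValueError("not a sample line: %r" % line)
    sample = {
        "name": m["name"],
        "labels": parse_labels(m["labels"]),
        "value": float(m["value"]),
        "exemplar": None,
    }
    if m["ex_labels"] is not None:
        sample["exemplar"] = {
            "labels": parse_labels(m["ex_labels"]),
            "value": float(m["ex_value"]),
            "timestamp": float(m["ex_ts"]) if m["ex_ts"] else None,
        }
    return sample


def check_exemplar(sample):
    """Return a list of problems found with the sample's exemplar."""
    problems = []
    ex = sample["exemplar"]
    if ex is None:
        return problems
    size = sum(len(k) + len(v) for k, v in ex["labels"].items())
    if size > 128:
        problems.append("exemplar labels are %d characters, limit is 128" % size)
    le = sample["labels"].get("le")
    if le is not None and ex["value"] > float(le):
        problems.append("exemplar value %s is above le=%s" % (ex["value"], le))
    return problems


line = ('foo_controller_reconcile_duration_seconds_bucket{le="15"} 3 '
        '# {trace_id="27c2e480c02d586c98934828324eeb9a"} 9 1617533722.954')
sample = parse_sample(line)
print(sample["exemplar"])
print(check_exemplar(sample))
```

Under those assumptions the `le="15"` line passes both checks, so the line itself looks well-formed; this says nothing about whether a given scraper will accept it.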

promtool 2.26 does not give good info on this (but then, I'm not sure it has support yet; exemplars are experimental thus far):

kubectl port-forward svc/foo-controller 8080:80
curl 0.0.0.0:8080/metrics -sSL | ./promtool check metrics
error while linting: text format parsing error in line 12: expected integer as timestamp, got "#"

but it looks like the Grafana Agent (0.13) also fails to scrape it:

kubectl port-forward -n monitoring grafana-agent-5gkqg 8000:80
curl http://0.0.0.0:8000/agent/api/v1/targets | jq
...
      "last_scrape": "2021-04-04T10:40:08.843113131Z",
      "scrape_duration_ms": 7,
      "scrape_error": "expected timestamp or new record, got \"MNAME\""

So we are probably blocked upstream on the scraper not understanding the comment hash (`#`) that introduces the exemplar.

Image that SHOULD work: clux/controller:0.9.3

@clux
Member Author

clux commented Apr 4, 2021

Asked in the Grafana Agent Slack.

@clux
Member Author

clux commented Apr 4, 2021

The Grafana Agent changelog implies agent 0.13 is on Prometheus 2.25, and 2.26 was released literally yesterday with the exemplar PR merged. From the looks of the PR, it includes the necessary changes to the scraper, so I will probably have to wait for the agent to pick it up, or try to run a headless Prometheus myself.

EDIT: agent testing is not possible for a while because remote_write support is missing for exemplars, and Grafana Cloud will need exemplar support in Cortex.

@clux
Member Author

clux commented Apr 12, 2021

Based on prometheus/prometheus#8707, it's possible that exemplar scraping does not work in Prometheus 2.26. I could not get it to work at any rate. If it's meant to work, I'll open an issue.

@roidelapluie

error while linting: text format parsing error in line 12: expected integer as timestamp, got "#"

I am not sure that `promtool check metrics` checks the OpenMetrics format; it might just do the Prometheus text format.
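
To make the format distinction concrete, here is a rough Python sketch of how one might classify what an endpoint is serving. It assumes (not confirmed in this thread) that a scraper picks its parser from the response `Content-Type`, with `application/openmetrics-text` selecting the OpenMetrics parser, and that an OpenMetrics body ends with a `# EOF` line as the spec requires. The helper names `classify_exposition` and `diagnose` are hypothetical.

```python
OPENMETRICS = "application/openmetrics-text"


def classify_exposition(content_type, body):
    """Summarise which format a /metrics response claims and what it contains."""
    # Drop parameters such as "; version=0.0.4" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    lines = body.splitlines()
    # An exemplar shows up as " # {" on a sample line (not on a comment line).
    has_exemplar = any(" # {" in l for l in lines if not l.startswith("#"))
    has_eof = bool(lines) and lines[-1] == "# EOF"
    return {
        "media_type": media_type,
        "has_exemplar": has_exemplar,
        "has_eof": has_eof,
    }


def diagnose(content_type, body):
    """Return a short, human-readable guess at why a scrape might fail."""
    info = classify_exposition(content_type, body)
    if info["has_exemplar"] and info["media_type"] != OPENMETRICS:
        # A Prometheus text-format parser would expect a timestamp after the
        # value and meet '#' instead.
        return "exemplars served as %s, not %s" % (info["media_type"], OPENMETRICS)
    if info["media_type"] == OPENMETRICS and not info["has_eof"]:
        return "declared OpenMetrics but missing the final '# EOF' line"
    return "ok"


body = 'foo_bucket{le="15"} 3 # {trace_id="abc"} 9 1617533722.954\n'
print(diagnose("text/plain; version=0.0.4", body))
print(diagnose(OPENMETRICS + "; version=1.0.0; charset=utf-8", body + "# EOF\n"))
```

Feeding it the headers and body from `curl -i` against the controller's `/metrics` endpoint would show whether the exemplar lines are going out under a plain-text content type, which would be consistent with the "expected integer as timestamp" error above.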

@clux clux linked a pull request Jun 12, 2021 that will close this issue