Simple prometheus metrics #125

Open
anthr76 opened this issue Jul 11, 2022 · 9 comments · Fixed by #307
Labels
enhancement (New feature or request) · help wanted (Extra attention is needed) · v2 (v2 release)

Comments

anthr76 commented Jul 11, 2022

Is your feature request related to a problem? Please describe.
I'd like to observe some minimal metrics.

Describe the solution you'd like
Expose Prometheus metrics.

Describe alternatives you've considered
Looking for a Rust library to possibly plug into.

Additional context
Add any other context or screenshots about the feature request here if needed.

joseluisq added the enhancement and v2 labels on Jul 11, 2022
joseluisq added the help wanted label on Jul 20, 2022
pl4nty commented Apr 9, 2023

The tracing Prometheus integration seems to have stalled (tokio-rs/tracing#29), but there's an OpenTelemetry integration that supports OTLP metrics with a feature flag. It looks like it'd be a few lines in logger.rs, so I'll do a first pass.

pl4nty commented Apr 10, 2023

Famous last words... I've implemented a stdout exporter, but the OTLP exporter requires tokio for transport (via tonic). I've never written Rust before, so I'm not sure how to implement that safely, since the existing tokio runtime builder has error handling that relies on tracing init. Maybe init stdout tracing, then add the OTLP layer later?
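
For reference, a minimal sketch (not SWS code) of what the OTLP wiring typically looks like when added to a `tracing` subscriber stack. It assumes the `opentelemetry`, `opentelemetry-otlp` (with the `tonic` transport feature) and `tracing-opentelemetry` crates; builder names differ between crate versions, and the `install_batch` call is exactly where the Tokio runtime is needed:

```rust
use tracing_subscriber::prelude::*;

fn init_tracing() -> Result<(), Box<dyn std::error::Error>> {
    // The batch span processor is spawned onto a Tokio runtime, so this can
    // only run once a runtime exists (or inside #[tokio::main]).
    let tracer = opentelemetry_otlp::new_pipeline()
        .tracing()
        .with_exporter(opentelemetry_otlp::new_exporter().tonic())
        .install_batch(opentelemetry::runtime::Tokio)?;

    tracing_subscriber::registry()
        // Keep the existing stdout/fmt layer...
        .with(tracing_subscriber::fmt::layer())
        // ...and add an OpenTelemetry layer that forwards spans to the exporter.
        .with(tracing_opentelemetry::layer().with_tracer(tracer))
        .init();

    Ok(())
}
```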

@joseluisq (Collaborator)

> Maybe init stdout tracing, then add the OTLP layer later?

I'm not familiar with OTLP, so if a stdout exporter is fine for the purpose and for Prometheus usage, then we could go with it.

On the other hand, if feature-completeness is wanted, I was even thinking we could introduce the feature under an --experimental- prefixed CLI flag so users can improve it later. But for that I will need help from a permanent contributor who makes sure the feature gets finished.

So don't hesitate to open a PR or a draft if you like, so others can jump in.

pl4nty commented Dec 18, 2023

Had another attempt at this today. Prometheus scrapes text metrics from an endpoint, so I've added /metrics with fields from tokio-metrics-collector. But I'm not sure this is ready for a PR because it uses Tokio RuntimeMetrics, which requires tokio_unstable.

Unfortunately the OTel crate didn't have default fields, and the first-party tokio-metrics is stateful so isn't suitable for Prometheus. It's also worth considering whether we want any additional metrics.

Current metrics
# HELP tokio_budget_forced_yield_count Returns the number of times that tasks have been forced to yield back to the scheduler after exhausting their task budgets.
# TYPE tokio_budget_forced_yield_count counter
tokio_budget_forced_yield_count 0
# HELP tokio_elapsed Total amount of time elapsed since observing runtime metrics.
# TYPE tokio_elapsed counter
tokio_elapsed 3.739249635
# HELP tokio_injection_queue_depth The number of tasks currently scheduled in the runtime’s injection queue.
# TYPE tokio_injection_queue_depth gauge
tokio_injection_queue_depth 0
# HELP tokio_io_driver_ready_count Returns the number of ready events processed by the runtime’s I/O driver.
# TYPE tokio_io_driver_ready_count counter
tokio_io_driver_ready_count 7
# HELP tokio_num_remote_schedules The number of tasks scheduled from outside of the runtime.
# TYPE tokio_num_remote_schedules counter
tokio_num_remote_schedules 2
# HELP tokio_total_busy_duration The amount of time worker threads were busy.
# TYPE tokio_total_busy_duration counter
tokio_total_busy_duration 0.000231982
# HELP tokio_total_local_queue_depth The total number of tasks currently scheduled in workers’ local queues.
# TYPE tokio_total_local_queue_depth gauge
tokio_total_local_queue_depth 0
# HELP tokio_total_local_schedule_count The number of tasks scheduled from worker threads.
# TYPE tokio_total_local_schedule_count counter
tokio_total_local_schedule_count 1
# HELP tokio_total_noop_count The number of times worker threads unparked but performed no work before parking again.
# TYPE tokio_total_noop_count counter
tokio_total_noop_count 9
# HELP tokio_total_overflow_count The number of times worker threads saturated their local queues.
# TYPE tokio_total_overflow_count counter
tokio_total_overflow_count 0
# HELP tokio_total_park_count The number of times worker threads parked.
# TYPE tokio_total_park_count counter
tokio_total_park_count 10
# HELP tokio_total_polls_count The number of tasks that have been polled across all worker threads.
# TYPE tokio_total_polls_count counter
tokio_total_polls_count 1
# HELP tokio_total_steal_operations The number of times worker threads stole tasks from another worker thread.
# TYPE tokio_total_steal_operations counter
tokio_total_steal_operations 0
# HELP tokio_total_steal_count The number of tasks worker threads stole from another worker thread.
# TYPE tokio_total_steal_count counter
tokio_total_steal_count 0
# HELP tokio_workers_count The number of worker threads used by the runtime.
# TYPE tokio_workers_count gauge
tokio_workers_count 8
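
For reference, the core of this approach boils down to roughly the following sketch (not the actual SWS code, and crate APIs may differ by version): register the `tokio-metrics-collector` runtime collector with the `prometheus` crate's default registry, then render the text exposition format whenever `/metrics` is scraped. The `RuntimeMetrics` still require building with `RUSTFLAGS="--cfg tokio_unstable"`.

```rust
#[tokio::main]
async fn main() {
    // Register the global Tokio runtime collector once at startup
    // (it has to be called from inside the runtime it observes).
    prometheus::default_registry()
        .register(Box::new(tokio_metrics_collector::default_runtime_collector()))
        .expect("failed to register tokio runtime collector");

    // What a GET /metrics handler would return: the Prometheus text format.
    let families = prometheus::default_registry().gather();
    let body = prometheus::TextEncoder::new()
        .encode_to_string(&families)
        .expect("failed to encode metrics");
    println!("{body}");
}
```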

I've added an OTLP trace exporter too. Not sure I know enough to get it merge-ready though - it's using #[tokio::main] (is this ok?) and outputs unformatted errors.

@joseluisq (Collaborator)

> I've added an OTLP trace exporter too. Not sure I know enough to get it merge-ready though - it's using #[tokio::main] (is this ok?) and outputs unformatted errors.

@pl4nty Maybe you could even start a draft PR just to see how it would look and to exchange ideas.

Just as an anecdote: some time ago I had a conversation about the Prometheus feature in SWS with one of the tracing crate maintainers, and he said that it could work but he was not really sure.

In any case, I could even try to reach out to the tracing crate folks on Discord to help us out.

@joseluisq (Collaborator)

And about the feature: it is not yet specified which metrics have to be exported, so it would also be good to describe which metrics are expected.

pl4nty commented Dec 26, 2023

I'll send a PR for metrics soon. OTLP is proving more challenging - I've fixed the error formatting, but testing against a server fails when initialised from logger.rs with `cannot send message to batch processor as the channel is closed`, so it's likely Tokio-related?

I can init in server.rs after the runtime is built, but can't find a way to init the stdout layer in logger.rs then add the OTLP layer later. Maybe these issues are relevant?
tokio-rs/tracing#2499
tokio-rs/tracing#1629
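
One possible workaround (just a sketch, untested against SWS, and the type parameters may need adjusting for the crate versions in use) is `tracing-subscriber`'s `reload` layer: register an empty, reloadable slot for the OTLP layer in logger.rs, then fill it in from server.rs once the runtime is up.

```rust
use tracing_subscriber::{prelude::*, reload, Registry};

// The concrete layer type that will eventually occupy the reloadable slot.
type OtelLayer =
    tracing_opentelemetry::OpenTelemetryLayer<Registry, opentelemetry::sdk::trace::Tracer>;

// In logger.rs: init the stdout layer plus an empty slot for OTLP.
fn init_logging() -> reload::Handle<Option<OtelLayer>, Registry> {
    let (otel_slot, handle) = reload::Layer::new(None::<OtelLayer>);
    tracing_subscriber::registry()
        .with(otel_slot)
        .with(tracing_subscriber::fmt::layer())
        .init();
    handle
}

// In server.rs, after the Tokio runtime has been built.
fn attach_otlp(
    handle: &reload::Handle<Option<OtelLayer>, Registry>,
) -> Result<(), Box<dyn std::error::Error>> {
    let tracer = opentelemetry_otlp::new_pipeline()
        .tracing()
        .with_exporter(opentelemetry_otlp::new_exporter().tonic())
        .install_batch(opentelemetry::runtime::Tokio)?;
    // Swap the empty slot for a real OpenTelemetry layer.
    handle.reload(Some(tracing_opentelemetry::layer().with_tracer(tracer)))?;
    Ok(())
}
```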

@joseluisq (Collaborator)

@pl4nty I left some comments on #296. Let's continue the discussion there.

@joseluisq (Collaborator)

Experimental Tokio runtime metrics for Prometheus were added as a first stage in #307. However, we would like to keep enhancing this feature to support server metrics such as incoming requests, connected clients, response codes and response duration in the near future.
Feel free to contribute!
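
For anyone picking this up, a hedged sketch of what such server metrics could look like with the `prometheus` crate. The metric names, labels and values below are hypothetical and only illustrate the idea, not an agreed design:

```rust
use prometheus::{register_histogram_vec, register_int_counter_vec, register_int_gauge};

fn main() {
    // Hypothetical metric names; not part of any agreed SWS design.
    let requests = register_int_counter_vec!(
        "sws_http_requests_total",
        "Incoming HTTP requests by method and response status code.",
        &["method", "status"]
    )
    .unwrap();
    let clients = register_int_gauge!(
        "sws_connected_clients",
        "Number of currently connected clients."
    )
    .unwrap();
    let duration = register_histogram_vec!(
        "sws_http_request_duration_seconds",
        "Response duration in seconds, labelled by method.",
        &["method"]
    )
    .unwrap();

    // Example updates a request handler would make:
    requests.with_label_values(&["GET", "200"]).inc();
    clients.set(1);
    duration.with_label_values(&["GET"]).observe(0.012);

    // These land in the same /metrics output as the runtime metrics above.
    let text = prometheus::TextEncoder::new()
        .encode_to_string(&prometheus::default_registry().gather())
        .unwrap();
    println!("{text}");
}
```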

joseluisq added the help wanted label on Feb 4, 2024