
Add openmetrics and exemplars support #482

Closed
wants to merge 2 commits into from

@voltbit (Contributor) commented Nov 26, 2021


Overview

  • Added support for the OpenMetrics standard
  • Added support for automatically providing exemplars populated with trace information from OpenTelemetry

The main purpose of the PR is to enable the use of exemplars in Node.js codebases.

Relevant resources:

Grafana exemplars
Prometheus exemplars
OpenMetrics spec
Prometheus format

Design

The OpenMetrics standard is very close to the original Prometheus format. To keep the library as backwards compatible as possible, the default format is unchanged (Prometheus) and exemplars are disabled by default.

The new features can be toggled:

  • The format at the registry level (prometheus/openmetrics)
  • The exemplars at the metric level

Each registry instance has an attribute (contentType) that decides the format.
The two possible formats are defined by the constants OPENMETRICS_CONTENT_TYPE and PROMETHEUS_CONTENT_TYPE, which contain the corresponding HTTP content types.
Future versions should default to the 1.0.0 version of the OpenMetrics format.
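A minimal sketch of the registry-level format switch. The two constant strings are assumptions (the PR names the constants but their values are not shown here), and SketchRegistry is a hypothetical stand-in for the library's Registry. Every change goes through one setter, as a review comment on setContentType later suggests, and unknown values are rejected with the more descriptive error message proposed in review.

```javascript
'use strict';

// Assumed values for illustration only; check the library's exports.
const PROMETHEUS_CONTENT_TYPE = 'text/plain; version=0.0.4; charset=utf-8';
const OPENMETRICS_CONTENT_TYPE =
  'application/openmetrics-text; version=1.0.0; charset=utf-8';

// Hypothetical registry showing only the format-selection logic.
class SketchRegistry {
  constructor(regContentType = PROMETHEUS_CONTENT_TYPE) {
    // Prometheus stays the default, so existing users see no change.
    this.setContentType(regContentType);
  }

  // Single entry point for changing the format, so validation lives in one place.
  setContentType(regContentType) {
    if (
      regContentType !== PROMETHEUS_CONTENT_TYPE &&
      regContentType !== OPENMETRICS_CONTENT_TYPE
    ) {
      throw new TypeError(`Content type ${regContentType} is unsupported`);
    }
    this._contentType = regContentType;
  }

  get contentType() {
    return this._contentType;
  }
}
```

A server handler would then read `registry.contentType` to set the HTTP Content-Type header and to pick the serializer.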

Each metric has a flag for enabling exemplars. The flag is placed on the Metric superclass for simplicity, but of the currently implemented metric types only histograms and counters can have exemplars.

The biggest change to the code is the creation of separate functions for Counter increment and Histogram observe. Because these functions need to support a third optional parameter (exemplar labels), I changed the way parameters are passed to them: instead of plain (labels, value), users provide a single object of the form {labels, value, exemplarLabels}.
The change should not impact existing users, but users who want to use exemplars will need to use the new call format.
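One way to picture the "separate functions" design, as a sketch: the counter picks its inc implementation once, at construction, so the existing positional calls pay no extra per-call check, and the object form is only available when exemplars are enabled (matching the later review comment that inc({value, labels, exemplarLabels}) only works on metrics created with exemplars). The enableExemplars option name comes from the review thread; SketchCounter and its internals are hypothetical, not the library's Counter.

```javascript
'use strict';

// Hypothetical counter illustrating the two call styles; not the library's code.
class SketchCounter {
  constructor({ enableExemplars = false } = {}) {
    this.values = new Map(); // serialized labels -> running total
    this.exemplars = new Map(); // serialized labels -> latest exemplar
    // Choose the implementation once so the legacy path has no extra branching.
    this.inc = enableExemplars ? this._incWithExemplar : this._incLegacy;
  }

  // Legacy style: inc(), inc(value), inc(labels), inc(labels, value).
  _incLegacy(labels, value) {
    if (labels === undefined || typeof labels === 'number') {
      value = labels;
      labels = {};
    }
    this._add(labels, value === undefined ? 1 : value);
  }

  // Exemplar style: inc({ labels, value, exemplarLabels }).
  _incWithExemplar({ labels = {}, value = 1, exemplarLabels = {} } = {}) {
    const key = this._add(labels, value);
    this.exemplars.set(key, {
      labelSet: exemplarLabels,
      value,
      timestamp: Date.now() / 1000, // seconds
    });
  }

  // Stable key from sorted label pairs, then accumulate.
  _add(labels, value) {
    const key = JSON.stringify(Object.entries(labels).sort());
    this.values.set(key, (this.values.get(key) || 0) + value);
    return key;
  }
}
```

Usage: `new SketchCounter().inc({ method: 'GET' }, 2)` keeps working as before, while `new SketchCounter({ enableExemplars: true }).inc({ labels: { method: 'GET' }, value: 2, exemplarLabels: { traceId: 'abc' } })` records an exemplar as well.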

Exemplar object

Timestamp: the time when the exemplar was created.
Reference from the Golang client: https://github.com/prometheus/client_golang/blob/1b145cad6847a692bd07e872d64b7102d33213c6/prometheus/histogram.go#L432.

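There is a hard limit of 128 UTF-8 characters on exemplar length (per the OpenMetrics spec, the limit applies to the combined length of the exemplar's label names and values).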
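A small sketch of the exemplar object described above: a label set, the observed value, and a creation timestamp, with the length limit enforced at creation time. createExemplar is a hypothetical helper, not the library's API; storing the timestamp in seconds is an assumption based on the unit OpenMetrics uses for timestamps in its text format.

```javascript
'use strict';

// OpenMetrics: label names plus values of an exemplar's label set must not
// exceed 128 UTF-8 code points combined.
const EXEMPLAR_MAX_CODE_POINTS = 128;

// Hypothetical helper; not part of prom-client.
function createExemplar(labelSet, value) {
  let length = 0;
  for (const [name, labelValue] of Object.entries(labelSet)) {
    // Spreading a string iterates by code point, not by UTF-16 code unit.
    length += [...name].length + [...String(labelValue)].length;
  }
  if (length > EXEMPLAR_MAX_CODE_POINTS) {
    throw new RangeError(
      `Exemplar label set is ${length} code points; the limit is ${EXEMPLAR_MAX_CODE_POINTS}`,
    );
  }
  // Timestamp: when the exemplar was created, in seconds.
  return { labelSet, value, timestamp: Date.now() / 1000 };
}
```

Counting code points rather than `string.length` matters for non-ASCII label values, since `'😀'.length` is 2 in JavaScript but it is a single code point.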

The labels used for out-of-the-box traces are traceId and spanId. This naming simply feels more idiomatic for JavaScript to me; there is no other reason for the choice. The Go implementation seems to use traceID, and the Java implementation uses trace_id. The label used for exemplars can be changed in Grafana.
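A dependency-free sketch of turning trace context into the traceId/spanId exemplar labels. traceExemplarLabels is a hypothetical helper; in application code the span context would typically come from @opentelemetry/api (for example `trace.getSpan(context.active())`, then `span.spanContext()`), but here it is passed in directly so the example runs without that package. Skipping the all-zero trace id follows OpenTelemetry's convention for an invalid trace.

```javascript
'use strict';

// OpenTelemetry represents "no valid trace" with an all-zero 32-hex trace id.
const INVALID_TRACE_ID = '0'.repeat(32);

// Hypothetical helper: map a span context to exemplar labels.
function traceExemplarLabels(spanContext) {
  if (!spanContext || spanContext.traceId === INVALID_TRACE_ID) {
    // No active trace: record the sample without exemplar labels.
    return {};
  }
  // camelCase names as chosen in this PR (Go uses traceID, Java trace_id).
  return { traceId: spanContext.traceId, spanId: spanContext.spanId };
}
```

Since the label names live in one place, switching to trace_id for consistency with other clients would be a one-line change, and Grafana can be pointed at whichever name is used.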

Counters in OpenMetrics

Counters have a breaking change in the form of an enforced _total suffix; it is no longer just a convention. Examples:

Prometheus

# HELP mycounter help
# TYPE mycounter counter
mycounter 0

# HELP mycounter2_total help
# TYPE mycounter2_total counter
mycounter2_total 0

OpenMetrics

# HELP mycounter help
# TYPE mycounter counter
mycounter_total 0

Prometheus ignores the comments related to the name and type, but the sample name changes too, which has the potential to break dashboards, alerts, etc. The current implementation follows the same approach as the Java implementation:

https://github.com/prometheus/client_java/blob/master/simpleclient/src/main/java/io/prometheus/client/Counter.java#L72-L108

However, instead of applying the suffix at the level of the Counter object, this implementation applies the change in the Registry object. The disadvantage is that the code is less elegant. The advantage is that the change is not breaking in any way for existing users: the _total suffix is only enforced by OpenMetrics registries, not Prometheus registries.

In the future, when OpenMetrics becomes more widely adopted, the behaviour can be moved inside the Counter object and made mandatory.
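A sketch of the registry-level approach: the counter keeps the name the user gave it, and only the OpenMetrics serializer enforces `_total`. formatCounter and render are hypothetical helpers. Stripping a user-supplied `_total` from the family name is an assumption modelled on the Java client referenced above; the trailing `# EOF` line is required by the OpenMetrics text format.

```javascript
'use strict';

const PROMETHEUS = 'prometheus';
const OPENMETRICS = 'openmetrics';

// Hypothetical per-counter serializer; the counter itself is never renamed.
function formatCounter({ name, help, value }, format) {
  if (format === PROMETHEUS) {
    // Prometheus format: use the name exactly as registered.
    return `# HELP ${name} ${help}\n# TYPE ${name} counter\n${name} ${value}\n`;
  }
  // OpenMetrics: family name without _total, sample name always with it.
  const family = name.endsWith('_total') ? name.slice(0, -'_total'.length) : name;
  return `# HELP ${family} ${help}\n# TYPE ${family} counter\n${family}_total ${value}\n`;
}

function render(counters, format) {
  const body = counters.map((c) => formatCounter(c, format)).join('');
  // The OpenMetrics exposition must end with an EOF marker.
  return format === OPENMETRICS ? `${body}# EOF\n` : body;
}
```

For `{ name: 'mycounter', help: 'help', value: 0 }` this reproduces the two outputs shown above, so a dashboard querying `mycounter` keeps working as long as the registry stays in Prometheus mode.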


Benchmarks

Benchmark tests were not changed. They use the default registry type (Prometheus) and no exemplars, so they measure the impact on current users of the library. I ran 4 tests (results in the gists below). The highest impact was on the registry benchmark, with a ~10-15% performance hit.

⚠ registry ➭ getMetricsAsJSON#1 with 64 is 5.345% acceptably slower.
⚠ registry ➭ getMetricsAsJSON#2 with 8 is 3.076% acceptably slower.
⚠ registry ➭ getMetricsAsJSON#2 with 4 and 2 with 2 is 4.063% acceptably slower.
✓ registry ➭ getMetricsAsJSON#2 with 2 and 2 with 4 is 0.1468% faster.
⚠ registry ➭ getMetricsAsJSON#6 with 2 is 4.137% acceptably slower.
✗ registry ➭ metrics#1 with 64 is 11.50% slower.
✓ registry ➭ metrics#2 with 8 is 0.2174% faster.
⚠ registry ➭ metrics#2 with 4 and 2 with 2 is 1.046% acceptably slower.
⚠ registry ➭ metrics#2 with 2 and 2 with 4 is 3.413% acceptably slower.
✗ registry ➭ metrics#6 with 2 is 15.03% slower.
⚠ histogram ➭ observe#1 with 64 is 0.2735% acceptably slower.
⚠ histogram ➭ observe#2 with 8 is 0.5325% acceptably slower.
⚠ histogram ➭ observe#2 with 4 and 2 with 2 is 0.09087% acceptably slower.
⚠ histogram ➭ observe#2 with 2 and 2 with 4 is 0.3243% acceptably slower.
⚠ histogram ➭ observe#6 with 2 is 0.6302% acceptably slower.
✓ gauge ➭ inc is 16.96% faster.
⚠ gauge ➭ inc with labels is 1.991% acceptably slower.
⚠ summary ➭ observe#1 with 64 is 2.261% acceptably slower.
✓ summary ➭ observe#2 with 8 is 2.166% faster.
✓ summary ➭ observe#2 with 4 and 2 with 2 is 0.4552% faster.
⚠ summary ➭ observe#2 with 2 and 2 with 4 is 1.407% acceptably slower.
⚠ summary ➭ observe#6 with 2 is 1.116% acceptably slower.

https://gist.github.com/voltbit/55bfdafccb5a0458d0b2aff9703dae43
https://gist.github.com/voltbit/1e1097e6400638334e11da52fefcd5d4
https://gist.github.com/voltbit/41828df848a7132c1aad196414ea2d69
https://gist.github.com/voltbit/70539929453a2b95d4f9ac2df6707a9b

TODO

  • Complete test coverage of the new features
  • Performance tests
  • Add better examples and readme info
  • Strategy for registry merge when there are different registry formats
    • decided to treat merging two different types of registries as undefined behaviour; users should always use the same type (Prometheus or OpenMetrics) when merging
  • Handling the _total suffix on counters

Not implemented

  • Support for the _created suffix on any metrics

@SimenB (Collaborator) commented Nov 26, 2021

Very exciting, thanks for working on this!

@@ -91,6 +92,8 @@ setInterval(() => {

server.get('/metrics', async (req, res) => {
try {
// register.contentType = Registry.OPENMETRICS_CONTENT_TYPE;
register.contentType = Registry.PROMETHEUS_CONTENT_TYPE;
Collaborator

should go through setContentType I guess?

Contributor Author

Thanks, yeah there is a lot of inconsistency in how that flag is set, I'll iron them out.

@voltbit voltbit force-pushed the add-openmetrics-support branch 5 times, most recently from bb10a72 to 6ae459f Compare December 3, 2021 10:49
@voltbit voltbit force-pushed the add-openmetrics-support branch 4 times, most recently from e9492ce to e9f6c44 Compare January 12, 2022 13:47
@voltbit voltbit marked this pull request as ready for review January 12, 2022 13:48
@voltbit (Contributor Author) commented Jan 13, 2022

Hi @zbjornson, could you please trigger the tests again? The PR should be ready for review :).

@zbjornson (Collaborator)

CI is green! I'll try to review this this weekend and hopefully @SimenB and/or @siimon can also review soon.

@shyimo commented Jan 23, 2022

thanks @voltbit & @zbjornson! we are excited about this feature!

@shyimo commented Feb 23, 2022

Hi @zbjornson.
Any news on that?

@SimenB (Collaborator) left a comment

skimmed through, looks good! Lots of code, but existing tests seem to pass, so safe enough?

CHANGELOG.md Outdated
Comment on lines 19 to 21
- feat: new option for calling `observe()` and `inc()` methods on the histogram
and counter metric types that can be passed an object of format
`{labels: a, value: b, exemplarLabels: c}`
Collaborator

should this be a separate PR?

@voltbit (Contributor Author) May 20, 2022

Right now the implementation is quite tightly coupled to the exemplars feature. I thought about making this more generic from the beginning, but I was worried about the performance impact. I tried to provide 100% backwards compatibility and minimal performance impact with the change.

Right now the call with inc({value, labels, exemplarLabels}) can only be done if the metric was created with exemplars enabled.

If I make this call more generic (allow inc() and observe() to be called with an object), I would need to differentiate between inc({value, labels, exemplarLabels}) and the call that already exists, inc({labelNames}). This might not be that bad, but it implies a check on the keys of the object every time inc/observe is used. For the sake of consistency I would also need to add it to all metrics, not just counter and histogram.

I am not sure which is the better approach; let me know if you have feedback about this.

So the alternative to what we have now in the PR is: implement an object type check on all metric types and allow them to receive a single object containing all the relevant data for inc/observe operations. This might be useful in the long run, but it may just as well be a feature that slows down the library for years until we need more fields, like we do with exemplars (which are kind of a fringe feature anyway).

Collaborator

One thing we could do is drop support for passing labels (inc({labelNames}), like you mention). That way you either pass a single value or an object. I don't think typeof arg === 'number' adds much overhead, and then the object shape would be what we want.

Would require a major version of course, but that's not an issue. We wanna drop old Node versions anyways
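A sketch of the argument handling under the reviewer's proposal: once the labels-only object form is dropped, a single `typeof` check separates the two remaining call shapes. normalizeIncArgs is a hypothetical helper, and treating a missing value as 1 is an assumption carried over from how inc() behaves today.

```javascript
'use strict';

// Hypothetical normalizer for the proposed inc() signature:
//   inc()                                   -> increment by 1
//   inc(value)                              -> increment by value
//   inc({ labels, value, exemplarLabels })  -> full object form
function normalizeIncArgs(arg) {
  if (arg === undefined) {
    return { labels: {}, value: 1, exemplarLabels: undefined };
  }
  if (typeof arg === 'number') {
    return { labels: {}, value: arg, exemplarLabels: undefined };
  }
  // Any object is now unambiguously the full options object.
  const { labels = {}, value = 1, exemplarLabels } = arg;
  return { labels, value, exemplarLabels };
}
```

The trade-off is the one noted above: an existing call such as `inc({ method: 'GET' })` would now be read as an options object with no labels, so the change is breaking and belongs in a major version.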

README.md Outdated
Comment on lines +448 to +449
Merging registries of different types is undefined. The user needs to make sure
all used registries have the same type (Prometheus or OpenMetrics versions).
Collaborator

again, can we throw a useful error?

Contributor Author

Collaborator

Docs should be updated to say it throws, no? Not undefined behavior
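A sketch of a merge that throws on mixed formats instead of leaving the result undefined, as requested in this thread. mergeRegistries is hypothetical, and the registries here are plain objects with a contentType string and a metrics Map rather than the library's Registry instances; the duplicate-name check is an illustrative extra, not a claim about the library's behaviour.

```javascript
'use strict';

// Hypothetical merge over { contentType, metrics: Map } objects.
function mergeRegistries(registries) {
  const types = new Set(registries.map((r) => r.contentType));
  // Fail fast with a descriptive error rather than producing mixed output.
  if (types.size > 1) {
    throw new Error(
      `Registries of different types cannot be merged: ${[...types].join(', ')}`,
    );
  }
  const merged = { contentType: [...types][0], metrics: new Map() };
  for (const reg of registries) {
    for (const [name, metric] of reg.metrics) {
      if (merged.metrics.has(name)) {
        throw new Error(`Metric ${name} is present in more than one registry`);
      }
      merged.metrics.set(name, metric);
    }
  }
  return merged;
}
```

Checking the types before copying anything means a failed merge has no side effects, which keeps the "throws" contract easy to document.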

index.d.ts Outdated
index.d.ts Outdated
lib/counter.js Outdated
test/metrics/maxFileDescriptorsTest.js Outdated
@dnutels commented Apr 24, 2022

Hi guys. Are there, by chance, any updates on how this is progressing and an estimate, hopefully?

@voltbit (Contributor Author) commented Apr 24, 2022

Hi @dnutels, I will start working on this again in a couple of weeks. I will implement the requested changes around mid-May, but I can't work on it earlier.

@voltbit voltbit marked this pull request as draft May 12, 2022 10:26
@voltbit voltbit force-pushed the add-openmetrics-support branch 4 times, most recently from d16917e to 5e10b87 Compare May 18, 2022 08:45
@voltbit (Contributor Author) commented May 18, 2022

New runs for benchmarks with the latest changes.


Summary:

✗ registry ➭ getMetricsAsJSON#1 with 64 is 11.43% slower.
✗ registry ➭ getMetricsAsJSON#2 with 8 is 155.0% slower.
✓ registry ➭ getMetricsAsJSON#2 with 4 and 2 with 2 is 58.48% faster.
✗ registry ➭ getMetricsAsJSON#2 with 2 and 2 with 4 is 23.80% slower.
⚠ registry ➭ getMetricsAsJSON#6 with 2 is 6.297% acceptably slower.
⚠ registry ➭ metrics#1 with 64 is 5.984% acceptably slower.
⚠ registry ➭ metrics#2 with 8 is 1.545% acceptably slower.
⚠ registry ➭ metrics#2 with 4 and 2 with 2 is 0.09495% acceptably slower.
✓ registry ➭ metrics#2 with 2 and 2 with 4 is 35.38% faster.
⚠ registry ➭ metrics#6 with 2 is 3.828% acceptably slower.
✓ histogram ➭ observe#1 with 64 is 0.2496% faster.
⚠ histogram ➭ observe#2 with 8 is 0.8472% acceptably slower.
⚠ histogram ➭ observe#2 with 4 and 2 with 2 is 0.7701% acceptably slower.
✓ histogram ➭ observe#2 with 2 and 2 with 4 is 3.590% faster.
⚠ histogram ➭ observe#6 with 2 is 3.784% acceptably slower.
✓ gauge ➭ inc is 2.087% faster.
✓ gauge ➭ inc with labels is 0.2869% faster.
✓ summary ➭ observe#1 with 64 is 3.924% faster.
✓ summary ➭ observe#2 with 8 is 0.2697% faster.
✓ summary ➭ observe#2 with 4 and 2 with 2 is 1.130% faster.
✓ summary ➭ observe#2 with 2 and 2 with 4 is 0.4797% faster.
✓ summary ➭ observe#6 with 2 is 1.389% faster.

Summary:

✓ registry ➭ getMetricsAsJSON#1 with 64 is 1.990% faster.
✓ registry ➭ getMetricsAsJSON#2 with 8 is 321.0% faster.
⚠ registry ➭ getMetricsAsJSON#2 with 4 and 2 with 2 is 8.586% acceptably slower.
✓ registry ➭ getMetricsAsJSON#2 with 2 and 2 with 4 is 13.94% faster.
✗ registry ➭ getMetricsAsJSON#6 with 2 is 16.18% slower.
✓ registry ➭ metrics#1 with 64 is 20.45% faster.
⚠ registry ➭ metrics#2 with 8 is 0.8526% acceptably slower.
✓ registry ➭ metrics#2 with 4 and 2 with 2 is 5.139% faster.
⚠ registry ➭ metrics#2 with 2 and 2 with 4 is 6.645% acceptably slower.
✓ registry ➭ metrics#6 with 2 is 0.7693% faster.
⚠ histogram ➭ observe#1 with 64 is 3.102% acceptably slower.
✓ histogram ➭ observe#2 with 8 is 1.538% faster.
✓ histogram ➭ observe#2 with 4 and 2 with 2 is 0.4875% faster.
✓ histogram ➭ observe#2 with 2 and 2 with 4 is 0.7254% faster.
✓ histogram ➭ observe#6 with 2 is 2.165% faster.
✓ gauge ➭ inc is 22.42% faster.
⚠ gauge ➭ inc with labels is 1.116% acceptably slower.
✓ summary ➭ observe#1 with 64 is 2.565% faster.
⚠ summary ➭ observe#2 with 8 is 0.2203% acceptably slower.
⚠ summary ➭ observe#2 with 4 and 2 with 2 is 0.08091% acceptably slower.
✓ summary ➭ observe#2 with 2 and 2 with 4 is 1.113% faster.
✓ summary ➭ observe#6 with 2 is 1.459% faster.

Summary:

✗ registry ➭ getMetricsAsJSON#1 with 64 is 12.81% slower.
✗ registry ➭ getMetricsAsJSON#2 with 8 is 563.2% slower.
✓ registry ➭ getMetricsAsJSON#2 with 4 and 2 with 2 is 7.191% faster.
⚠ registry ➭ getMetricsAsJSON#2 with 2 and 2 with 4 is 8.622% acceptably slower.
✗ registry ➭ getMetricsAsJSON#6 with 2 is 13.17% slower.
✗ registry ➭ metrics#1 with 64 is 25.55% slower.
⚠ registry ➭ metrics#2 with 8 is 2.849% acceptably slower.
⚠ registry ➭ metrics#2 with 4 and 2 with 2 is 3.033% acceptably slower.
⚠ registry ➭ metrics#2 with 2 and 2 with 4 is 2.256% acceptably slower.
⚠ registry ➭ metrics#6 with 2 is 4.078% acceptably slower.
⚠ histogram ➭ observe#1 with 64 is 1.686% acceptably slower.
✓ histogram ➭ observe#2 with 8 is 0.3382% faster.
⚠ histogram ➭ observe#2 with 4 and 2 with 2 is 0.2078% acceptably slower.
⚠ histogram ➭ observe#2 with 2 and 2 with 4 is 6.005% acceptably slower.
⚠ histogram ➭ observe#6 with 2 is 2.634% acceptably slower.
✓ gauge ➭ inc is 12.39% faster.
✓ gauge ➭ inc with labels is 2.485% faster.
✓ summary ➭ observe#1 with 64 is 9374% faster.
✓ summary ➭ observe#2 with 8 is 2.167% faster.
✗ summary ➭ observe#2 with 4 and 2 with 2 is 19.97% slower.
⚠ summary ➭ observe#2 with 2 and 2 with 4 is 1.552% acceptably slower.
⚠ summary ➭ observe#6 with 2 is 0.3165% acceptably slower.

@voltbit voltbit force-pushed the add-openmetrics-support branch 2 times, most recently from d24e4fc to ebdbc91 Compare May 19, 2022 11:41
@voltbit voltbit marked this pull request as ready for review May 20, 2022 08:22
@voltbit voltbit requested a review from SimenB May 20, 2022 08:23
@shyimo commented Jun 7, 2022

Hi @zbjornson, any news regarding the PR?

@hitwill commented Jun 22, 2022

Bumping for interest

@vjsamuel

Bumping for interest again :)

@isaac-elvt

Bumping for interest as well, this would save my organization so much pain!

@xal3xhx commented Oct 1, 2022

bump for interest. i really need this

@shyimo commented Oct 12, 2022

bump for interest. really important task

@ejba commented Oct 27, 2022

bumping for interest!

@SimenB (Collaborator) left a comment

this generally LGTM. Would love another set of eyes, tho.

Thanks for a fantastic contribution, @voltbit!

@@ -0,0 +1,66 @@
import * as prom from '../index';
Collaborator

I don't think we need both a JS and a TS example - could you remove this one?

@@ -150,16 +185,29 @@ interface MetricConfiguration<T extends string> {
name: string;
help: string;
labelNames?: T[] | readonly T[];
registers?: Registry[];
registers?: Registry<PrometheusContentType | OpenMetricsContentType>[];
Collaborator

Suggested change
registers?: Registry<PrometheusContentType | OpenMetricsContentType>[];
registers?: (Registry<PrometheusContentType> | Registry<OpenMetricsContentType>)[];

right? A registry itself cannot contain either

@@ -84,19 +84,30 @@ class AggregatorRegistry extends Registry {
});
}

get contentType() {
Collaborator

is this needed? won't the class just inherit from super?
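On the reviewer's question, a general JavaScript note rather than an analysis of AggregatorRegistry (its full class body is not visible here): a getter defined on the superclass is inherited as-is, so redeclaring it is unnecessary unless the subclass defines its own accessor for the same property. If a subclass overrides only the setter, it creates a new accessor on its own prototype with no getter, which hides the inherited one. The class names below are hypothetical.

```javascript
'use strict';

class BaseRegistry {
  constructor() {
    this._contentType = 'text/plain';
  }
  get contentType() {
    return this._contentType;
  }
  set contentType(value) {
    this._contentType = value;
  }
}

// Inherits both accessors unchanged: reading works without redeclaring.
class PlainSubclass extends BaseRegistry {}

// Overrides only the setter: the subclass prototype now has an accessor
// whose getter is undefined, so reading contentType yields undefined.
class SetterOnlySubclass extends BaseRegistry {
  set contentType(value) {
    super.contentType = value; // delegates to the parent setter
  }
}
```

So the getter is only needed if the class also declares a `set contentType`; otherwise it can be dropped.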

@@ -27,14 +43,21 @@ class Histogram extends Metric {
return acc;
}, {});

this.bucketExemplars = this.upperBounds.reduce((acc, upperBound) => {
Collaborator

should this be created even if config.enableExemplars isn't true?
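A sketch of allocating per-bucket exemplar storage only when exemplars are enabled, as the reviewer suggests. SketchHistogram is hypothetical and tracks only non-cumulative bucket counts plus the latest exemplar per bucket; appending an explicit `+Inf` bound and keeping only the most recent exemplar per bucket are illustrative choices, not a description of the PR's code.

```javascript
'use strict';

// Hypothetical histogram showing only bucket and exemplar bookkeeping.
class SketchHistogram {
  constructor({ buckets, enableExemplars = false }) {
    // Sorted upper bounds plus an explicit +Inf bucket.
    this.upperBounds = [...buckets].sort((a, b) => a - b).concat(Infinity);
    this.bucketCounts = this.upperBounds.map(() => 0);
    // Only pay for exemplar storage when the feature is on.
    this.bucketExemplars = enableExemplars ? new Map() : null;
  }

  observe(value, exemplarLabels) {
    // First bucket whose upper bound covers the value (+Inf always matches).
    const index = this.upperBounds.findIndex((bound) => value <= bound);
    this.bucketCounts[index] += 1;
    if (this.bucketExemplars && exemplarLabels) {
      // Keep the latest exemplar for the bucket that received the value.
      this.bucketExemplars.set(this.upperBounds[index], {
        labelSet: exemplarLabels,
        value,
        timestamp: Date.now() / 1000, // seconds
      });
    }
  }
}
```

With exemplars disabled, `bucketExemplars` stays `null` and the only extra cost in observe() is a single falsy check.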

package.json Outdated
@SimenB (Collaborator) commented Oct 27, 2022

@zbjornson @siimon PTAL 🙂

@voltbit (Contributor Author) commented Oct 27, 2022

Thanks for all the feedback and all the interest shown! I'll rebase and check the comments as soon as possible (this weekend most likely).

@ejba commented Oct 31, 2022

@voltbit is there anything we can do to help you?

@voltbit voltbit marked this pull request as draft November 1, 2022 15:58
- Added support for OpenMetrics format, including exemplars
- Added support for exemplars with OpenTelemetry tracing data on
  default metrics
- Added the option of passing params as one object to observe() and inc()
  methods
@@ -1,6 +1,8 @@
'use strict';

const OtelApi = require('@opentelemetry/api');
Collaborator

this should be added to package.json

regContentType !== Registry.PROMETHEUS_CONTENT_TYPE &&
regContentType !== Registry.OPENMETRICS_CONTENT_TYPE
) {
throw new TypeError('Content type unsupported');
Collaborator

Suggested change
throw new TypeError('Content type unsupported');
throw new TypeError(`Content type ${regContentType} is unsupported`);

@voltbit (Contributor Author) commented Nov 14, 2022

hi @shyimo & @ejba & other contributors
I am not sure I will get the time to work on this again before the holiday season. If this work is urgent for you and you want to pick up the task feel free to do so.


karlodwyer added a commit to karlodwyer/prom-client that referenced this pull request Mar 3, 2023
karlodwyer added a commit to karlodwyer/prom-client that referenced this pull request Mar 3, 2023
@SimenB (Collaborator) commented Mar 6, 2023

Superseded by #544

@SimenB SimenB closed this Mar 6, 2023
@SimenB (Collaborator) commented Mar 9, 2023
