Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: align compliancy to Prometheus naming convention #284

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

izonder
Copy link

@izonder izonder commented Aug 27, 2019

Fixes #283

@dswarbrick
Copy link

There seems to be some confusion about how to expose a "good" Prometheus metric in this library. Take this example:

# HELP nodejs_active_handles Number of active libuv handles grouped by handle type. Every handle type is C++ class name.
# TYPE nodejs_active_handles gauge
nodejs_active_handles{type="WriteStream",component="billing-dc-traffic",worker="0"} 2
nodejs_active_handles{type="Server",component="billing-dc-traffic",worker="0"} 1
nodejs_active_handles{type="Timer",component="billing-dc-traffic",worker="0"} 2
nodejs_active_handles{type="Socket",component="billing-dc-traffic",worker="0"} 1

# HELP nodejs_active_handles_total Total number of active handles.
# TYPE nodejs_active_handles_total gauge
nodejs_active_handles_total{component="billing-dc-traffic",worker="0"} 6

Aside from the fact that the _total suffix in a the name of non-counter metric will trigger a warning about violating best practice naming conventions, the mere existence of this nodejs_active_handles_total is something of a Prometheus anti-pattern.

If I wanted a total of active handles (by component & worker), I could simply use the following promql query:

sum without (type) (nodejs_active_handles)

@SimenB
Copy link
Collaborator

SimenB commented Aug 29, 2019

@izonder CI is broken, which is why I haven't reviewed this yet.


@dswarbrick that was to avoid a breaking change, see discussion in #260. If there are other examples, we'd be happy to fix them 🙂

Merging this PR is also a breaking change, so we can clean up all names at once.

@izonder
Copy link
Author

izonder commented Sep 18, 2019

@SimenB please pay your attention there are some troubles with Node.js v6.x.x environment in Travis: https://travis-ci.org/siimon/prom-client/jobs/586419309

Seems the problem is out of the task scope.

@markmsmith
Copy link

markmsmith commented Nov 22, 2019

Are CI issues still blocking this PR? I was kind of hoping all the renaming would settle before we roll out prometheus and have to deal with updating queries, alerts and graphs from these breaking changes.

@zbjornson
Copy link
Collaborator

The latest CI failures are real test failures that need to be addressed before this can be merged.

I haven't had a chance to look at this in depth though, considering #284 (comment) and the desire to clean up all metric names at once.

@WOLFinBULLcity
Copy link

WOLFinBULLcity commented Feb 20, 2020

It looks like the failed tests are potentially as simple as just needing to account for the addition of the timestamp property: https://travis-ci.org/siimon/prom-client/jobs/615513387?utm_medium=notification&utm_source=github_status

FAIL  test/clusterTest.js
  ● AggregatorRegistry › AggregatorRegistry.aggregate() › uses `aggregate` method defined for nodejs_eventloop_lag_seconds
    expect(received).toEqual(expected) // deep equality
    - Expected
    + Received
    @@ -4,9 +4,10 @@
        "name": "nodejs_eventloop_lag_seconds",
        "type": "gauge",
        "values": Array [
          Object {
            "labels": Object {},
    +       "timestamp": 1502075832298,
            "value": 0.0085,
          },
        ],
      }
      203 | 				.getSingleMetric('nodejs_eventloop_lag_seconds')
      204 | 				.get();
    > 205 | 			expect(ell).toEqual({
          | 			            ^
      206 | 				help: 'Lag of event loop in seconds.',
      207 | 				name: 'nodejs_eventloop_lag_seconds',
      208 | 				type: 'gauge',
      at Object.toEqual (test/clusterTest.js:205:16)

@sam-github
Copy link
Contributor

I just checked this out, rebased it against master, and it passed. The last commit adds timestamp support, and all timestampe support is unneeded. So, I pushed that commit off my branch, and retested, and it passed.... that shows the last commit lacked unit tests! :-)

But, since it should be removed, it doesn't matter.

I didn't actually look at the output and run it pass the promtool checker, though, so I've no comment on the substance of the PR yet.


const NODEJS_ACTIVE_HANDLES = 'nodejs_active_handles';
const NODEJS_ACTIVE_HANDLES_TOTAL = 'nodejs_active_handles_total';
const NODEJS_ACTIVE_HANDLES_TOTAL_NUMBER = 'nodejs_active_handles_total_number';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The core problem here is that promtool thinks that anything called *_total should be COUNT, right, not a GAUGE? Would renaming to nodejs_active_handles also address this?

Copy link
Author

@izonder izonder Feb 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every renaming is very easy to check with built-in prometheus "linter". Probably, it even could be a good idea just to add the linting as a job to CI pipeline

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need any help with this? I've love to pay it forward

Dmitry Morgachev added 2 commits April 14, 2021 18:35
…iimon-master

# Conflicts:
#	lib/metrics/processHandles.js
#	lib/metrics/processRequests.js
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Compliancy the default metrics with Prometheus Naming Convention
8 participants