Very long NodeJS event loop lag #543

Open
shappir opened this issue Mar 1, 2023 · 30 comments

@shappir
Contributor

shappir commented Mar 1, 2023

We are using prom-client with NestJS (together with @willsoto/nestjs-prometheus). We have several custom metrics - in particular histograms. As a result, the overall metrics size grows up to 8MB. This results in the NodeJS event loop lag getting to be as large as 1 second!

What we see:

  • Reducing the amount of data collected, e.g. by removing some metrics, improves the situation (however we would like to keep the metrics we have)
  • Explicitly invoking register.resetMetrics() causes the lag to go down almost to 0. It then starts increasing again
  • CPU usage is constant at around 15%
  • CPU profile shows long time spent in Node internal network functions, such as getPeerCertificate and destroySSL

Has anyone encountered this sort of behavior? Any suggestions?

@ngavalas
Contributor

ngavalas commented Mar 8, 2023

I've noticed this as well. There are long, blocking calls when converting the metrics to the string that gets output to the client, though my CPU traces squarely place the blame on the userland code and not the Node internal network stack. Can you share your trace?

I saw you landed a PR that improved this a bit (thanks); I'm benchmarking another fix on top of yours to see if it helps. Would love an update if you figure this out.

@zbjornson
Collaborator

Is something inflating the number of metrics, e.g. using request paths as label values? What types of metrics are you using?

14.2.0 has #540 that might help if you're using summaries.

#480 is another known perf issue with summaries, and I started a fix in #486.

@ngavalas
Contributor

ngavalas commented Mar 8, 2023

@zbjornson From my traces, the other major offender when we have a slow call is rendering histograms. In particular, this function seems quite slow: https://github.com/siimon/prom-client/blob/master/lib/histogram.js#L210-L217

I believe we could improve the performance of this by internally having a concept of "shared labels" across an entire set of buckets, so that we don't need to copy all the labels N times. In particular, histogram.get() would return this set of shared labels and we'd merge each entry's specific labels in on top of that.

For histograms, this means we only build { le: upperBound } per bucket and everything else is shared. I would need to go poke around in the code a bit and see what else this would affect, but it seems like a relatively straightforward win.
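
To make that concrete, here's a rough sketch of what I have in mind (the shapes and helper names are mine, not prom-client's actual internals):

```js
// Rough sketch of the shared-labels idea; shapes and names are illustrative only.

// The histogram returns one shared label object plus a tiny per-bucket entry
// that only carries `le`, instead of copying every label onto every bucket.
function getBucketValues(sharedLabels, upperBounds, counts) {
  return {
    sharedLabels, // e.g. { method: 'GET', route: '/users' }
    buckets: upperBounds.map((le) => ({
      labels: { le },
      value: counts[le] || 0,
    })),
  };
}

// The serializer merges the shared labels back in lazily, once per output line.
function renderBucketLine(name, sharedLabels, bucket) {
  const merged = { ...sharedLabels, ...bucket.labels };
  const labelStr = Object.entries(merged)
    .map(([k, v]) => `${k}="${v}"`)
    .join(',');
  return `${name}_bucket{${labelStr}} ${bucket.value}`;
}

// Example:
// renderBucketLine('http_req_duration', { method: 'GET' }, { labels: { le: 0.5 }, value: 12 })
//   => 'http_req_duration_bucket{method="GET",le="0.5"} 12'
```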

@ngavalas
Contributor

ngavalas commented Mar 8, 2023

Okay so from what I can tell, <metric>.get() is only called in the Registry, so this is a pretty self-contained change. I'll benchmark it but I expect this to be a win in practice for us.

@nflaig
Contributor

nflaig commented Mar 8, 2023

@ngavalas <metric>.get() is an external API and can be consumed by anyone who wants to get a single metric's value.

@ngavalas
Contributor

ngavalas commented Mar 8, 2023

Ah. Thoughts on exposing an internal version for getMetricAsPrometheusString and having the existing get flatten out the labels so external callers are undisturbed? Happy to put up the PR this afternoon to start the discussion.
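
Something like this, roughly (getCompact is a made-up name, just to illustrate the split; the public get() keeps flattening so nothing changes for external callers):

```js
// Hypothetical sketch only; `getCompact()` is an invented name, not an existing API.
class HistogramSketch {
  // Internal form used by the registry's Prometheus-string rendering:
  // shared labels once, plus minimal per-bucket entries.
  getCompact() {
    return this._compact; // { sharedLabels, buckets: [{ labels: { le }, value }, ...] }
  }

  // Public API stays as-is: every value carries the full, flattened label set,
  // so external consumers of get() see no difference.
  async get() {
    const { sharedLabels, buckets } = this.getCompact();
    return {
      values: buckets.map((b) => ({
        value: b.value,
        labels: { ...sharedLabels, ...b.labels },
      })),
    };
  }
}
```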

@zbjornson
Collaborator

@ngavalas can you provide a reproduction case (either a full example that can be profiled, or a benchmark in the benchmarks suite) please?

@ngavalas
Contributor

ngavalas commented Mar 8, 2023

Yeah, I'll work on a repro; I've only observed this in production so I'll need to try to figure out which labels are causing issues for histograms. I can share a snippet of a v8 CPU profile as well in a few.

@ngavalas
Contributor

ngavalas commented Mar 8, 2023

@zbjornson FWIW, I think the existing registry benchmarks capture the problem fairly well. I have my proposed change working locally, passing all tests, and...

### Results:

┌────────────────────────────────────────┬────────────────────┬────────────────────┐
│ registry                               │ current            │ prom-client@latest │
├────────────────────────────────────────┼────────────────────┼────────────────────┤
│ getMetricsAsJSON#1 with 64             │ 3747.0556369320916 │ 2000.6075672702098 │
├────────────────────────────────────────┼────────────────────┼────────────────────┤
│ getMetricsAsJSON#2 with 8              │ 3458.670353309119  │ 1674.8824925227625 │
├────────────────────────────────────────┼────────────────────┼────────────────────┤
│ getMetricsAsJSON#2 with 4 and 2 with 2 │ 3660.534565081909  │ 1608.8334436915613 │
├────────────────────────────────────────┼────────────────────┼────────────────────┤
│ getMetricsAsJSON#2 with 2 and 2 with 4 │ 3330.496324702882  │ 452.3735183797067  │
├────────────────────────────────────────┼────────────────────┼────────────────────┤
│ getMetricsAsJSON#6 with 2              │ 3081.6146752509944 │ 1982.9467290709053 │
└────────────────────────────────────────┴────────────────────┴────────────────────┘

### Summary:

✓ registry ➭ getMetricsAsJSON#1 with 64 is 87.30% faster.
✓ registry ➭ getMetricsAsJSON#2 with 8 is 106.5% faster.
✓ registry ➭ getMetricsAsJSON#2 with 4 and 2 with 2 is 127.5% faster.
✓ registry ➭ getMetricsAsJSON#2 with 2 and 2 with 4 is 636.2% faster.
✓ registry ➭ getMetricsAsJSON#6 with 2 is 55.41% faster.

@ngavalas
Contributor

ngavalas commented Mar 8, 2023

I can put up the PR if you want to discuss?

@zbjornson
Collaborator

Awesome. A PR would be great!

@ngavalas
Contributor

ngavalas commented Mar 8, 2023

@zbjornson One more possible improvement idea: can we skip labels with empty values at serialization time? As far as I can tell, Prometheus doesn't differentiate between {foo=""} and {}; empty labels are the same as missing.

Apparently, we tried to keep the label set consistent across all writes to a given metric specifically because of this code: https://github.com/siimon/prom-client/blob/master/lib/validation.js#L17-L27. It was complaining if the labels didn't match up.

This means that, in practice, we end up serializing a large number of completely empty labels on most metrics, which is a huge waste of serialization time and prom<->node transfer bytes. If we can just skip serializing them on the library side, all would be well. I can move this to a separate issue if you want, but it's part of our histogram performance problem and I'm happy to put up the PR for that as well.

@SuperQ

SuperQ commented Mar 9, 2023

@ngavalas Prometheus doesn't differentiate at query time, but it does require label consistency at scrape time.

OK:

foo{bar=""} 1
foo{bar="baz"} 1

Invalid:

foo{} 1
foo{bar="baz"} 1

@shappir
Contributor Author

shappir commented Mar 9, 2023

I'm working on another PR that dramatically improves the serialization of histograms when only a few entries have changed since the previous serialization. I think that would be a common scenario when there are more than a few labels / label values. This is accomplished by retaining the serialized representation (string) per entry from the previous invocation of getMetricAsPrometheusString inside the histogram. An entry's cached representation is discarded when a new value is observed for it.
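
A minimal sketch of the idea (illustrative only, not the actual PR code):

```js
// Illustrative sketch of the per-entry string cache, not the real implementation.
class EntryStringCache {
  constructor() {
    this._lines = new Map(); // label-hash -> previously rendered exposition line
  }

  // Called from observe()/inc() when an entry's value changes.
  invalidate(labelHash) {
    this._lines.delete(labelHash);
  }

  // Called during getMetricAsPrometheusString(): reuse the cached line when the
  // entry is unchanged, otherwise pay the formatting cost once and cache it.
  render(labelHash, formatLine) {
    let line = this._lines.get(labelHash);
    if (line === undefined) {
      line = formatLine();
      this._lines.set(labelHash, line);
    }
    return line;
  }
}
```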

Had to change the benchmark for registry to modify some values, otherwise it ran so fast that the benchmarking utility blew up :-P

Should I create the PR on top of master or next?

@ngavalas
Contributor

ngavalas commented Mar 9, 2023

@SuperQ Facepalm. Do you happen to know where it says that in the Prometheus docs? The reason I ask is that I need to know whether the label ORDER also matters, since that's a detail I wasn't sure about in my PR.

@shappir That also sounds like a good idea. I wonder if both our changes are needed or if one or the other is fast enough. I’ll try out your approach once the PR is up!

@shappir
Contributor Author

shappir commented Mar 9, 2023

My approach is only beneficial if most of the entries in a histogram haven't changed. Specifically, if none have changed, the time to generate the Prometheus string for the histogram goes down to almost zero. But if most have changed there is little or no benefit.

Asking again: on top of master or next?

@ngavalas
Contributor

ngavalas commented Mar 9, 2023

@shappir it looks like next merged into master yesterday (for example, this is in master now: a19a1ad), so probably master?

@SuperQ

SuperQ commented Mar 10, 2023

@ngavalas I don't think label sets are called out in the docs. Ordering within a metric family doesn't matter, though.

I think consistent label name sets were implied in Prometheus, but we formalized it in the OpenMetrics spec.

@shappir
Contributor Author

shappir commented Mar 12, 2023

I implemented the caching mechanism: #555

I changed the registry metrics benchmark to update an eighth of the histogram for each iteration, otherwise it was too fast :-) With this update, the performance improvement is in the range of 100% - 200%.

@ngavalas
Contributor

FYI, I deployed the new pre-release version that has my fix in it and I'm seeing the metric render time drop from 500ms+ to ~40ms. We can probably squeeze some more juice out of this, but we've made huge progress for the histogram-heavy workload we have in real life.

@shappir
Contributor Author

shappir commented Mar 15, 2023

I really like what you did - it's a great idea. Hope to see it officially released soon so that I can check the impact on our production servers. (And see if metrics response generation is indeed the bottleneck - I know the issue is caused by Prometheus integration but I'm not yet sure as to the cause, as I indicated in my original comment.)

I am still checking if I can implement my optimization on top of yours as I think it can provide noticeable added benefit.

@shappir
Contributor Author

shappir commented Mar 15, 2023

I did create a PR which is mostly code cleanup after the shared labels modification, and also some minor optimizations: #559

@shappir
Contributor Author

shappir commented Mar 27, 2023

Turns out that the value reported by nodejs_eventloop_lag_seconds is not the average Node event loop lag. Rather, it's the time required to compute and send the metrics response. This is why it only goes up and never comes down as more and more data is collected. This should be fixed. See this issue for details.
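
As I understand it, the gauge is sampled with a setImmediate delta at collection time, so the callback can only fire once the scrape's own synchronous serialization work is done. A simplified sketch (not the library's exact code):

```js
// Simplified sketch of a setImmediate-based lag sample taken during a scrape
// (a paraphrase of the behaviour described above, not prom-client's exact code).
function sampleEventLoopLag(gauge) {
  const start = process.hrtime.bigint();
  setImmediate(() => {
    // This callback runs only after the current synchronous work has completed
    // (including the metrics serialization triggered by the scrape itself), so the
    // reported "lag" grows with the cost of building the response.
    const lagSeconds = Number(process.hrtime.bigint() - start) / 1e9;
    gauge.set(lagSeconds);
  });
}
```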

@SimenB
Collaborator

SimenB commented Oct 13, 2023

v15 is out with both of your improvements now (thanks a bunch!). Are we ok now, or should we keep this issue open?

@shappir
Contributor Author

shappir commented Oct 15, 2023

> v15 is out with both of your improvements now

This is great news! We have lots of graphs monitoring the performance of services using prom-client. We will upgrade and I will share the results. I think that will provide good feedback on where we are.

Currently the only major performance improvement I can think of that hasn't been done yet is to cache the string representation of each entry, in addition to the actual values. This way each entry that hasn't changed (which would be most when cardinality is high) could reuse it instead of computing it again.

The obvious downside is that memory consumption would remain high after metrics have been reported.

@SimenB
Collaborator

SimenB commented Oct 16, 2023

Great stuff, looking forward to it!

Caching the strings seems like something we can probably do behind an option or some runtime toggle? Or possibly only if a metric has a cardinality of something larger than some n.
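
Something like this, perhaps (cacheSerializedValues is a made-up option name, only to illustrate the toggle; the rest is the normal Histogram constructor):

```js
// Hypothetical opt-in shape; `cacheSerializedValues` is an invented option name,
// used here only to illustrate the idea of a per-metric toggle.
const client = require('prom-client');

const httpDuration = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.05, 0.1, 0.25, 0.5, 1, 2.5, 5],
  // cacheSerializedValues: true, // hypothetical: trade memory for faster scrapes
});
```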

@shappir
Contributor Author

shappir commented Oct 19, 2023

Here is the graph. As you can see, the time to generate the prom-client response is still lengthy (it grows up to ~400ms) but is reduced by 60% relative to the previous version. No other changes were made to the system.
Screenshot 2023-10-19 at 13 19 41

Adding an option may be a good idea. Using cardinality to determine this behavior isn't applicable in my opinion:

  1. Low cardinality would benefit less, but would also consume little memory
  2. High cardinality would benefit the most, but would also consume a lot of memory

@SimenB
Collaborator

SimenB commented Oct 27, 2023

#594 might help as well

@shappir
Contributor Author

shappir commented Nov 13, 2023

Here are the results several weeks later. Since upgrading to 1.5 at the end of September, the maximum response time to the Prometheus service has been reduced by 30%.
Screenshot 2023-11-13 at 13 30 12

@SimenB
Collaborator

SimenB commented Jan 3, 2024
