Replies: 4 comments 2 replies
-
Is this causing actual performance issues? We should only run relabeling once per metric, even when scraped multiple times. Additionally, Prometheus regexes are optimized for prefix patterns: #7453. However, there might be interest in including matching directly in client libraries (client_java and client_python apparently can do that), so you could use a list in
cc @fstab for more context
-
I just went through the exercise of actually building the regular expression, so we'll see if there are performance implications. You could say that there is a performance issue around precision with my example. I have
This is the regular expression:
If you examine the regex in more depth it is entirely prefix matching, but that is cumbersome to deduce on account of the regex syntax. Really the best way for me to move forward with a regex-based solution is just to have a list of the individual metrics I want and the group prefixes. Then join them together with
I think I understand the trick behind #7453. During metric queries if people were not adding
I can see how that works with some small regexes for label matching.
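As a rough illustration of the "keep a list, then join it" approach, here is a minimal Go sketch. It assumes the pieces are joined with `|` (as the proposal text describes) and that the result is fully anchored, the way Prometheus relabel regexes are. The function name `buildKeepRegex` and all metric names are hypothetical, not taken from my real config.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// buildKeepRegex joins exact metric names and subsystem prefixes into a single
// alternation. Hypothetical helper, for illustration only.
func buildKeepRegex(exact, prefixes []string) *regexp.Regexp {
	alts := make([]string, 0, len(exact)+len(prefixes))
	for _, name := range exact {
		// Escape the name so it is matched literally.
		alts = append(alts, regexp.QuoteMeta(name))
	}
	for _, p := range prefixes {
		// A prefix entry matches the prefix followed by anything.
		alts = append(alts, regexp.QuoteMeta(p)+".*")
	}
	// Anchor both ends so a partial match does not count.
	return regexp.MustCompile("^(?:" + strings.Join(alts, "|") + ")$")
}

func main() {
	re := buildKeepRegex(
		[]string{"http_requests_total", "process_cpu_seconds_total"}, // hypothetical names
		[]string{"go_gc_"}, // hypothetical subsystem prefix
	)
	for _, m := range []string{"http_requests_total", "go_gc_duration_seconds", "http_request_size_bytes"} {
		fmt.Println(m, re.MatchString(m))
	}
}
```

Keeping the two lists as the source of truth and generating the regex from them at least makes the intent readable, even if the generated expression is not.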
-
Maybe that's a good case to fix the exporter, too.
-
Another idea is to use a list:
-
Proposal
Use case. Why is this important?
When I onboard a new Prometheus exporter into a system, my teammates and I review the exposed metrics to determine which are valuable to retain and which should be dropped.
Prometheus metric names are usually prefixed with a "subsystem" that the metric belongs to. I find that during the metric selection process I usually select 2-3 metrics from each subsystem. The disparate subsystems with varying numbers of child metrics make efficient regular expressions very difficult in my experience, and I assume that most users in my situation concatenate all the metric names into a single regex using `|`. I expect this concatenated "monster regex" to be both time and space inefficient.
In my own experience the Go regex engine compiles `|` patterns into a loop over each pattern. So if we have to evaluate that loop over `N` metric names and there are `M` `|` patterns, then the algorithm is `O(M*N)` to see if we should keep or drop a metric. I also expect `O(M)` memory to be used to hold each of the patterns to match.
Set inclusion would require upfront time to compile the set data structure but should be much more efficient per comparison.
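To make the set inclusion idea concrete, here is a minimal Go sketch of exact-name matching with a hash set. The type `keepSet` and the metric names are hypothetical. The point is only that the set is built once and each keep/drop decision is then a single map lookup (hashing the name) rather than a pass over `M` alternatives.

```go
package main

import "fmt"

// keepSet is a hypothetical allow-list of metric names, built once up front.
type keepSet map[string]struct{}

// newKeepSet pays the one-time cost of building the set.
func newKeepSet(names []string) keepSet {
	s := make(keepSet, len(names))
	for _, n := range names {
		s[n] = struct{}{}
	}
	return s
}

// keep reports whether the metric name is in the allow-list.
func (s keepSet) keep(metric string) bool {
	_, ok := s[metric]
	return ok
}

func main() {
	// Hypothetical metric names picked from several subsystems.
	set := newKeepSet([]string{
		"http_requests_total",
		"process_cpu_seconds_total",
		"go_goroutines",
	})
	for _, m := range []string{"http_requests_total", "http_request_size_bytes"} {
		fmt.Println(m, set.keep(m))
	}
}
```

This only covers exact names, so it would not replace prefix patterns on its own.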
Let's look at an example scrape config from Prometheus Trainings
Regex works excellently because we want to keep all the metrics of these two subsystems, and drop the rest. What if there are many subsystems of `http` and you only want to select a handful of metrics?
A more ergonomic (maybe) and performant (maybe) solution would be to use "prefix set inclusion" via a trie.
My actual case is even more extreme, as I have about 15 different subsystems each containing one or two metrics that I want to keep. In my situation I would prefer just to explicitly list all the metrics I would like to keep and have Prometheus compile it into an efficient prefix matching structure.
In my mind the situation here is similar to pattern matching URLs in a web framework or load balancer. Prefix matching is preferred due to efficiencies, but regex is great to have for captures and complex patterns.