-
Notifications
You must be signed in to change notification settings - Fork 8.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tsdb: check for context cancel before regex matching postings #14096
tsdb: check for context cancel before regex matching postings #14096
Conversation
Regex matching can be heavy if the regex takes a lot of cycles to evaluate and we can get stuck evaluating postings for a long time without this fix. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
This comment was marked as outdated.
This comment was marked as outdated.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
We could probably have more sophisticated generated tests, but these show the intent at least. |
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (modulo a couple of nits)
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a small question
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, LGTM! (modulo @pracucci comments)
// checkContextEveryNIterations is used in some tight loops to check if the context is done. | ||
checkContextEveryNIterations = 100 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be 1024
- I feel like
100
is too often. How often do we want to check? I'd say every 10ms or even 100ms would be enough. For 10ms, this would mean that each check takes 100 microseconds, and that sounds like a lot (do we have numbers?) - Can we make it a power of two like
1024
? We don't really care if it's every 1000 or every 1024 (or 100 vs 128), but it's way cheaper for the CPU to do the latter operation. (I did the benchmarks, and there's a noticeable difference when switching to 128. Interestingly, same difference as checkingcount == count>>7<<7
but for some reason, checkingcount & (1<<7-1) == 0
is less performant)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also happy to follow up later with a separate PR and benchmarks that would support this comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed that checking every 100ms is enough.
We didn't have timings; I guessed that 100 was right.
The case that prompted this PR had a 7KB regexp with a 40-deep stack.
Also the CPU was throttled because it was well over its request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a favorite in this race, we've seen 1000 be effective in a test we did. @bboreham are you ok changing to 1024?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that we put out the fire in our own infra, I would like to see one timing of one comparable case before changing this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's merge with 100 and I'll bring proper benchmarks to update the value later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow up: #14118
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
grafana/mimir-prometheus@cf682ab Relevant PRs: prometheus/prometheus#14096 prometheus/prometheus#14068 prometheus/prometheus#13669 Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Followup to prometheus#14096 Unfortunately the previous PR introduced this bug by not releasing the lock before returning. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* Update mimir-prometheus to e5b85c151ba8 grafana/mimir-prometheus@e5b85c1 prometheus/prometheus#14103 prometheus/prometheus#14096 prometheus/prometheus#14068 prometheus/prometheus#13669 Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* Update mimir-prometheus to e5b85c151ba8 grafana/mimir-prometheus@e5b85c1 prometheus/prometheus#14103 prometheus/prometheus#14096 prometheus/prometheus#14068 prometheus/prometheus#13669 Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> (cherry picked from commit 307567a)
* Update mimir-prometheus to e5b85c151ba8 grafana/mimir-prometheus@e5b85c1 prometheus/prometheus#14103 prometheus/prometheus#14096 prometheus/prometheus#14068 prometheus/prometheus#13669 Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> (cherry picked from commit 307567a) Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
…heus#14096) * tsdb: check for context cancel before regex matching postings Regex matching can be heavy if the regex takes a lot of cycles to evaluate and we can get stuck evaluating postings for a long time without this fix. The constant checkContextEveryNIterations=100 may be changed later. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> Signed-off-by: Paulo Dias <paulodias.gm@gmail.com>
Followup to prometheus#14096 Unfortunately the previous PR introduced this bug by not releasing the lock before returning. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> Signed-off-by: Paulo Dias <paulodias.gm@gmail.com>
Quote label name in matchers when needed When the label name of a matcher contains non-standard characters, like a dot, or starts with a digit, it should be quoted. If it's not quoted, then `VectorSelector.String()` isn't a valid PromQL. Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> Signed-off-by: Paulo Dias <paulodias.gm@gmail.com> Optimize Matcher.String() Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> Signed-off-by: Paulo Dias <paulodias.gm@gmail.com> Use bytes.Buffer from stack buf in Matcher.String() Also removed the growing until there's a benchmark for that. Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> Signed-off-by: Paulo Dias <paulodias.gm@gmail.com> tsdb: check for context cancel before regex matching postings (prometheus#14096) * tsdb: check for context cancel before regex matching postings Regex matching can be heavy if the regex takes a lot of cycles to evaluate and we can get stuck evaluating postings for a long time without this fix. The constant checkContextEveryNIterations=100 may be changed later. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> Signed-off-by: Paulo Dias <paulodias.gm@gmail.com> tsdb/index/postings: fix missing lock unlock Followup to prometheus#14096 Unfortunately the previous PR introduced this bug by not releasing the lock before returning. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> Signed-off-by: Paulo Dias <paulodias.gm@gmail.com> fix: rebase bugfix: allow opting-out of multi-cluster setups Allow users to opt-out of the multi-cluster setup for Prometheus dashboard, in environments where it isn't applicable. Refer: prometheus#13180. Signed-off-by: Pranshu Srivastava <rexagod@gmail.com> Signed-off-by: Paulo Dias <paulodias.gm@gmail.com>
Follow up on prometheus#14096 As promised, I bring a benchmark, which shows a very small improvement if context is checked every 128 iterations of label instead of every 100. It's much easier for a computer to check modulo 128 than modulo 100. This is a very small 0-2% improvement but I'd say this is one of the hottest paths of the app so this is still relevant. Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Follow up on #14096 As promised, I bring a benchmark, which shows a very small improvement if context is checked every 128 iterations of label instead of every 100. It's much easier for a computer to check modulo 128 than modulo 100. This is a very small 0-2% improvement but I'd say this is one of the hottest paths of the app so this is still relevant. Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Update mimir-prometheus to e5b85c151ba8 grafana/mimir-prometheus@e5b85c1 prometheus/prometheus#14103 prometheus/prometheus#14096 prometheus/prometheus#14068 prometheus/prometheus#13669 Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Regex matching can be heavy if the regex takes a lot of cycles to evaluate and we can get stuck evaluating postings for a long time without this fix.
This PR adds a missing unit test for checking that the index reader honors context cancel when doing
r.traversePostingOffsets
iteration.