Warn when targets relabelled to same labels #9589
4fb8205
to
20348b8
Compare
Thanks! Just a few comments.
This indeed does not belong to the discovery manager but to the scrape manager.
thanks @LeviHarrison, addressed your comments.
@roidelapluie the logic is currently in scrape manager's
20348b8
to
edb4410
Compare
discovery/targetgroup/targetgroup.go
Outdated
@@ -20,7 +20,7 @@ import (
 	"github.com/prometheus/common/model"
 )

-// Group is a set of targets with a common label set(production , test, staging etc.).
+// Group is a set of targets with a common label set(production, test, staging etc.).
If the comment is already being changed...
-// Group is a set of targets with a common label set(production, test, staging etc.).
+// Group is a set of targets with a common label set (production, test, staging etc).
scrape/manager.go
Outdated
activeTargets := make(map[uint64]*Target)
for _, scrapePool := range m.scrapePools {
	for _, target := range scrapePool.activeTargets {
		if t, ok := activeTargets[target.hash()]; ok {
Instead of calculating the hash again, we can use the key of scrapePool.activeTargets.
scrape/manager.go
Outdated
for _, scrapePool := range m.scrapePools {
	for _, target := range scrapePool.activeTargets {
		if t, ok := activeTargets[target.hash()]; ok {
			level.Warn(m.logger).Log("msg", "Found targets with same labels after relabelling", "target", t, "target", target)
The issue here is that the identifiers for the targets (t and target) are a 64-bit integer and a URL; the former is not very helpful, and the two do not match.
I'm partial to just including the URL of the target, and because both will be the same we only need one, but I'll turn to @roidelapluie for a second opinion.
Actually, my comment is wrong. Both of these come out to be the URL of the target, and since that's the same, we only need one.
I think @roidelapluie is affirming the location in this PR is correct.
Could you please also add a quick test for this?
Picking this up during our bug scrub. @darshanime are you still up to adding a test?
Discussed again at the bug scrub; seems like a useful change. @LeviHarrison since you looked through it could you add a test please?
3cb631e
to
4e8c000
Compare
Could extract the new check to its own function - reload is getting a bit long.
scrape/manager.go
Outdated
for _, target := range scrapePool.activeTargets {
	if t, ok := activeTargets[target.labels.Hash()]; ok {
		level.Warn(m.logger).Log(
			"msg", "Found targets with same labels after relabelling",
I wonder if this should print the full set of labels? (target.labels.String())
scrape/manager.go
Outdated
activeTargets := make(map[uint64]*Target)
for _, scrapePool := range m.scrapePools {
	for _, target := range scrapePool.activeTargets {
		if t, ok := activeTargets[target.labels.Hash()]; ok {
Nit: I think we should return early.
t, ok := activeTargets[target.labels.Hash()]
if !ok {
	continue
}
... log here ...
updated, TFR!
EDIT: that's probably not going to work, as you're looking for dups across sLoops.
ran the benchmark:
d42b893
to
07410e7
Compare
Please add
e55df3e
to
5f0bcc6
Compare
Signed-off-by: darshanime <deathbullet@gmail.com>
5f0bcc6
to
9d8d802
Compare
@bboreham, i have created a new benchmark; as expected the operation isn't memory intensive for 10k targets...
When posting benchmark results it is traditional to give the before/after comparison.
However when I tried to run the benchmark against the code "before", it hung.
This is because the benchmark does more work when it runs faster, which makes it an invalid benchmark.
discovery/targetgroup/targetgroup.go
Outdated
@@ -20,7 +20,7 @@ import (
 	"github.com/prometheus/common/model"
 )

-// Group is a set of targets with a common label set(production , test, staging etc.).
+// Group is a set of targets with a common label set(production, test, staging etc).
Change seems unrelated.
Yes, unrelated but entirely routine and uncontroversial. Are you suggesting I remove it?
Yes. When I look back over the history of a file I want to see changes labeled with the reason they were made.
For me it is routine and uncontroversial to put distinct changes in different PRs.
Okay, fair enough. Will also remove the new lines I added elsewhere in the PR.
scrape/manager_test.go
Outdated
	activeTargets: map[uint64]*Target{},
}

for i := 0; i < b.N; i++ {
It is essential to do the same amount of work each time the benchmark is run, so varying the number of targets with b.N is wrong.
We are running the reload function 10k times, each time with 10k targets via go test -bench=BenchmarkManagerReload -benchmem -run=- -count 6 -benchtime=10000x. This is similar to this benchmark.
Do you think it would be better to hardcode the #targets 10k instead?
> This is similar to this benchmark.
Another invalid benchmark.
> Do you think it would be better to hardcode the #targets 10k instead?
Yes.
m.scrapePools["default"] = sp

m.reload()
require.Contains(t, output, "Found targets with same labels after relabelling")
Does this test do any relabeling?
Nope, we manually add 2 targets with the same label sets and assert that the log output contains the desired warning.
scrape/manager.go
Outdated
lHash := target.labels.Hash()
t, ok := activeTargets[lHash]
There is some risk that two sets of labels will hash to the same value; it would be safer to make the map key labels.Bytes().
Ah, xxHash is non-cryptographic, TIL.
We have a tradeoff here between the inconvenience caused by a (rare) spurious warn log and a bit more memory usage. Made the change, let me know if you change your mind.
Benchmark after using labels.Bytes() as key
$ go test -bench=BenchmarkManagerReload -benchmem -run=- -count 6 -benchtime=10000x
goos: darwin
goarch: arm64
pkg: github.com/prometheus/prometheus/scrape
BenchmarkManagerReload-10 10000 842551 ns/op 698718 B/op 10003 allocs/op
BenchmarkManagerReload-10 10000 844213 ns/op 698716 B/op 10003 allocs/op
BenchmarkManagerReload-10 10000 853768 ns/op 698716 B/op 10003 allocs/op
BenchmarkManagerReload-10 10000 842443 ns/op 698716 B/op 10003 allocs/op
BenchmarkManagerReload-10 10000 841944 ns/op 698716 B/op 10003 allocs/op
BenchmarkManagerReload-10 10000 870490 ns/op 698716 B/op 10003 allocs/op
PASS
ok github.com/prometheus/prometheus/scrape 52.022s
comparison with original implementation of using Hash
goos: darwin
goarch: arm64
pkg: github.com/prometheus/prometheus/scrape
│ /tmp/old │ /tmp/new │
│ sec/op │ sec/op vs base │
ManagerReload-10 745.3µ ± 2% 838.8µ ± 1% +12.54% (p=0.002 n=6)
│ /tmp/old │ /tmp/new │
│ B/op │ B/op vs base │
ManagerReload-10 312.0Ki ± 0% 682.3Ki ± 0% +118.67% (p=0.002 n=6)
│ /tmp/old │ /tmp/new │
│ allocs/op │ allocs/op vs base │
ManagerReload-10 3.000 ± 0% 10003.000 ± 0% +333333.33% (p=0.002 n=6)
You are right about the delta being the traditional way to showcase benchmark results, but (as you found out too) without this patch the loop does so little work that the numbers don't register at all (they are mostly all 0s). How do you wish to proceed from here? IMO, the "benchmark" shows that the patch only adds a single, inexpensive pass through the target set, so it did its work. I propose we delete the benchmark altogether now that we know the patch doesn't do anything super expensive.
Signed-off-by: darshanime <deathbullet@gmail.com>
Write a valid benchmark that does the same amount of work per iteration.
Signed-off-by: darshanime <deathbullet@gmail.com>
As I mentioned earlier, the benchmark without this patch is not interesting. Added it here nonetheless after hardcoding the target set size to 10k. lmk if I got your request wrong. Without this patch:
With this patch
Delta
Signed-off-by: darshanime <deathbullet@gmail.com>
I think it's interesting to know what the delta is. About 0.8ms for 10K targets. On a large Kubernetes cluster, with changes happening all the time, that might be significant. I'm not sure what the true baseline would be. 20ns is sufficiently small that it makes me check what the benchmark does in the 'before' version, which is nothing at all.
closes #5136
Signed-off-by: darshanime deathbullet@gmail.com