Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid potential overhead of ThreadLocal.remove in LogbackMetrics #3952

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

spmason
Copy link

@spmason spmason commented Jul 3, 2023

In one of our apps we're seeing that over time the size of the ThreadLocalMap underpinning the ThreadLocal class gets large (due to other libraries overusing/abusing ThreadLocals). When this happens ThreadLocal.remove can start to become a performance bottleneck as it does some expungeStaleEntry calls that iterate over the whole map.

Because logger calls can be so frequent (even if they're suppressed this code is still called) the TurboFilter gets called a lot, and thus so does ThreadLocal.remove. Therefore Micrometer ends up slowing down all our code.

I agree that technically this isn't Micrometer's fault, but the potential for it to happen in any application is still there and MM should really do what it can to avoid becoming a bottleneck.

So this PR changes LogbackMetrics to remove the need for it to call ThreadLocal.remove and thus the potential for this bug. It does this by not clearing the ThreadLocal after use and instead setting it to a 'holder' class that has its own boolean flag that gets flipped on/off inside ignoreMetrics.

Running the LogbackBenchmarks this seems to be possibly faster than the regular code even when ThreadLocals aren't overloaded, so it seems like a decent improvement overall

Of course the tradeoff is that now micrometer 'pollutes' all threads with an extraneous ThreadLocal value that it never cleans up. However:

  1. If a thread doesn't log(?) then it won't get a ThreadLocal. If it does then the chances are it will log again (soon?) so the ThreadLocal will be needed again then anyway.
  2. It's not a big object, one boolean
  3. It should be cleaned up when the threads die

In developing this fix I did try a couple of other solutions:

  1. Just call ThreadLocal.set(false) instead of remove. Unfortunately set does some similar cleanup to remove so the issue would still be there
  2. Use an AtomicBoolean instead of a custom class to hold the flag. This wasn't performant enough as AtomicBoolean does a load of locking etc as it's meant to be used across threads - as these flags are explicitly only used on a single thread that locking is unnecessary overhead
  3. Use a boolean[] as the flag holder. This seems a bit too ugly to me

Note that there is a similar issue to this one that might have helped this situation (#3891) as most of the time this is happening for me on some suppressed logger.debug calls. However even without that fix this would still be an issue for every non-suppressed logger call that my app made.

In one of our apps we're seeing that over time the size of the ThreadLocalMap underpinning the ThreadLocal class gets large (due to other libraries overusing/abusing ThreadLocals). When this happens `ThreadLocal.remove` can start to become a performance bottleneck as it does some `expungeStaleEntry` calls that iterate over the whole map.

Because logger calls can be so frequent (even if they're suppressed this code is still called) the `TurboFilter` gets called _a lot_, and thus so does `ThreadLocal.remove`. Therefore Micrometer ends up slowing down all our code.

I agree that technically this isn't Micrometer's fault, but the potential for it to happen in any application is still there and MM should really do what it can to avoid becoming a bottleneck.

So this PR changes `LogbackMetrics` to remove the need for it to call `ThreadLocal.remove` and thus the potential for this bug. It does this by _not clearing the ThreadLocal after use_ and instead setting it to a 'holder' class that has its own boolean flag that gets flipped on/off inside `ignoreMetrics`.

Running the LogbackBenchmarks this seems to be possibly faster than the regular code even when ThreadLocals aren't overloaded, so it seems like a decent improvement overall

Of course the tradeoff is that now MM 'pollutes' all threads with an extraneous ThreadLocal that it never cleans up. However:

1. If a thread doesn't log(?) then it won't get a ThreadLocal. If it does then the chances are it will log again (soon?) so the ThreadLocal will be needed again then anyway.
2. It's not a big object, one boolean
3. It should be cleaned up when the threads die

In developing this fix I did try a couple of other solutions:
1. Just call `ThreadLocal.set(false)` instead of `remove`. Unfortunately `set` does some similar cleanup to `remove` so the issue would still be there
2. Use an `AtomicBoolean` instead of a custom class to hold the flag. This wasn't performant enough as `AtomicBoolean` does a load of locking etc as it's meant to be used across threads - as these flags are explicitly only used on a single thread that locking is unnecessary overhead
3. Use a boolean[] as the flag holder. This seems a bit too ugly to me

Note that there is a similar issue to this one that might have helped me here (micrometer-metrics#3891) as most of the time this is happening for me on some supressed `logger.debug` calls. However this would still be an issue for every non-suppressed logger call that my app made, without this fix.
@pivotal-cla
Copy link

@spmason Please sign the Contributor License Agreement!

Click here to manually synchronize the status of this Pull Request.

See the FAQ for frequently asked questions.

@pivotal-cla
Copy link

@spmason Thank you for signing the Contributor License Agreement!

@shakuzen shakuzen added performance Issues related to general performance module: micrometer-core An issue that is related to our core module enhancement A general enhancement labels Jul 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A general enhancement module: micrometer-core An issue that is related to our core module performance Issues related to general performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants