Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid potential overhead of ThreadLocal.remove in LogbackMetrics #3952

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Commits on Jun 28, 2023

  1. Avoid potential overhead of ThreadLocal.remove in LogbackMetrics

    In one of our apps we're seeing that over time the size of the ThreadLocalMap underpinning the ThreadLocal class gets large (due to other libraries overusing/abusing ThreadLocals). When this happens `ThreadLocal.remove` can start to become a performance bottleneck as it does some `expungeStaleEntry` calls that iterate over the whole map.
    
    Because logger calls can be so frequent (even if they're suppressed this code is still called) the `TurboFilter` gets called _a lot_, and thus so does `ThreadLocal.remove`. Therefore Micrometer ends up slowing down all our code.
    
    I agree that technically this isn't Micrometer's fault, but the potential for it to happen in any application is still there and MM should really do what it can to avoid becoming a bottleneck.
    
    So this PR changes `LogbackMetrics` to remove the need for it to call `ThreadLocal.remove` and thus the potential for this bug. It does this by _not clearing the ThreadLocal after use_ and instead setting it to a 'holder' class that has its own boolean flag that gets flipped on/off inside `ignoreMetrics`.
    
    Running the LogbackBenchmarks this seems to be possibly faster than the regular code even when ThreadLocals aren't overloaded, so it seems like a decent improvement overall
    
    Of course the tradeoff is that now MM 'pollutes' all threads with an extraneous ThreadLocal that it never cleans up. However:
    
    1. If a thread doesn't log(?) then it won't get a ThreadLocal. If it does then the chances are it will log again (soon?) so the ThreadLocal will be needed again then anyway.
    2. It's not a big object, one boolean
    3. It should be cleaned up when the threads die
    
    In developing this fix I did try a couple of other solutions:
    1. Just call `ThreadLocal.set(false)` instead of `remove`. Unfortunately `set` does some similar cleanup to `remove` so the issue would still be there
    2. Use an `AtomicBoolean` instead of a custom class to hold the flag. This wasn't performant enough as `AtomicBoolean` does a load of locking etc as it's meant to be used across threads - as these flags are explicitly only used on a single thread that locking is unnecessary overhead
    3. Use a boolean[] as the flag holder. This seems a bit too ugly to me
    
    Note that there is a similar issue to this one that might have helped me here (micrometer-metrics#3891) as most of the time this is happening for me on some supressed `logger.debug` calls. However this would still be an issue for every non-suppressed logger call that my app made, without this fix.
    Steve Mason committed Jun 28, 2023
    Configuration menu
    Copy the full SHA
    fe9e944 View commit details
    Browse the repository at this point in the history