New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LeakCanary should surface distinct leak paths to the same instance as separate leaks #2502
Comments
Or did LeakCanary deprioritize the most relevant leak path because it misinterpreted the situation? In this case the path included a WeakHashMap, and the leaked value was inside one of the map values. Only the keys of a WeakHashMap are weak, not the values. Does LeakCanary deprioritize a specific path just because it goes through a WeakHashMap in some way? |
Could you share your specific example, what the leaky code looked like, what paths LeakCanary was showing you and which path turned out to be useful? LeakCanary shows a single path because, if an instance is held due to a single bug where we forgot to clear a field then you're generally guaranteed that all possible paths will go through that field. So we just pick one, and we pick the shortest to remove noise. Now, if there are multiple fields causing the leak, then you get only one path, fix it, find the leak again with another path, fix it, etc. LeakCanary ignores WeakReferences because WeakReferences do not prevent garbage collection, they don't cause leaks, unless you constantly call |
The leaky code was in androidx.core.hardware.display.DisplayManagerCompat: https://github.com/aosp-mirror/platform_frameworks_support/blob/a9ac247af2afd4115c3eb6d16c05bc92737d6305/compat/src/main/java/androidx/core/hardware/display/DisplayManagerCompat.java#L34-L35 The problem here was that the values of a WeakHashMap transitively referenced the keys (android.content.Context), causing the Contexts to leak. I was using DisplayManagerCompat from a MediaBrowserServiceCompat instance, causing the Service/Context to leak. However, the actual leak only appeared after excluding two other patterns that were not real leaks. Once I threw out DisplayManagerCompat there was no more leak. I created an inaccurate bug report against MediaBrowserServiceCompat years ago because of this. It was ignored, possibly because of the invalid leak trace. I was only aware of the first trace below at the time. Here are the LeakCanary logs, in order of successive pattern exclusions: Pseudo-leak no. 1:
After excluding this pattern I saw pseudo-leak no. 2:
After excluding this pattern I saw the actual leak:
After excluding this pattern, I saw yet another pseudo-leak:
|
Oh, and actually, those pseudo-leaks did not actually cause delayed finalization, finalization behavior was normal after addressing the real leak. I don't know what caused the pseudo-leaks to show up, maybe some special ART machinery for Services/Binders/Messengers/Handlers? Or whatever strange behavior might be tied to the "global variable in native code" instances. |
Thanks, that's super helpful. Let me reason through this, I'll try to restate what you wrote, let me know if I got this wrong. Here's the reasoning:
This is actually a common leak pattern in the Android framework: binders are held until the other side runs garbage collection and finalizes the binder on its end, no matter what we're doing with service lifecycle (that's because binders are used beyond services, the only way to be certain the other side won't call is when it doesn't have that pointer). The proper way to handler this is for the service code to finish its handler in onDestroy, or set the reference from the handler to the service to null in onDestroy. The first and second leak traces are identical, the problem is that MediaBrowserServiceCompat$ServiceHandler has several direct and indirect references to the service. The last leak is the exact same issue elsewhere, ExtraSession is a stub that won't be GCed until the other side has run its own GC https://github.com/aosp-mirror/platform_frameworks_support/blob/a9ac247af2afd4115c3eb6d16c05bc92737d6305/media/src/main/java/android/support/v4/media/session/MediaSessionCompat.java#L3413 So these 2 leaks are binder related leaks, and the leaks stay in place until a GC runs in the calling process, and you have no control over that. They really should be fixed. SummaryYou actually found 3 leaks in the framework code, 2 of which are transient and depend on GCs in other processes. You should file issues with Google. Based on that, the problem statement is slightly different: you don't actually want to see all leak paths as there as a lot more than just the 5 you found (you only found 5 by excluding specific references). What you really need is to see all distinct leak paths, where "distinct" only applies to a unique sub path that's made of suspect references. In other words, ideally LeakCanary would have presented you with exactly 3 distinct leaks, since there are 3 logic bugs, even though a single service instance was leaking in the end. Thinking about multi leaksIf there are N logic errors causing N bad references but all N end up leaking the same instance, we ideally should see N leaks. The core problem is we don't know which references in a path are bad. We already have logic to figure which references are definitely not bad, from which we deduce the maybe bad. We use this once we've found a path, not as we traverse the graph. So this would require a new decision point: currently, when we find an object that we've already visited, we ignore it, as we know we've already computed a better / shortest path to it. Here we'd need to approach this slightly differently and consider whether we just found a distinct / valuable new path to that object, or not. Keeping in my that "object" here isn't just instances we're tracking as leaking but really any object in the heap. Or maybe any object marked as leaking. Which requires evaluating whether objects are leaking during graph traversal, which could be a bit more expensive, even with some caching. => if I run into an object that I've already ran into, and that object is known as "leaking", then I compare my current path with the last path, but only the "unknown" parts of it (ignoring the "non leaking parts at the root). If the 2 paths are entirely distinct then we know there are distinct leaks and the new path should be stored as one of the alternative leaky paths. This isn't all good yet because we don't actually keep track of the leaky paths while they're building, they'll all just stored in the priority queue. So you can't easily store this, but you now sorta need to keep track of the wonky objects that have multiple possible leaky paths. ConclusionThis would likely require deep changes to the path finding algorithms, but it could yield a significant improvement in that LeakCanary could now surface all actual leak paths. |
Thank you for the elaboration. Sorry for my delayed response, I got swamped. Your restated reasoning looks sound. I did create an issue for the DisplayManagerCompat problem a month ago; I assume the cache is supposed to offer a performance boost: https://issuetracker.google.com/issues/279625765 I want to create issues for the other leaks, too, but I will need to look at your analysis more closely. I'd say the issues that I file against the framework are commonly ignored and closed, so I usually regret spending the time. I do think it's worth investing in the more advanced leak analysis that you are proposing: I only found the DisplayManagerCompat problem 6 years after I was first affected by it, and really only because I decided to allocate a boatload of time on leak analysis in this app; and in the end I still only found it because I got lucky with somewhat randomly going down the road of excluding leak paths. |
I've added your analysis to the Google issue I already had open for this, fingers crossed for a fix: https://issuetracker.google.com/issues/37137738 |
ooo looks like there's traction that's great! |
@pyricau any bandwidth for chiming in on Google's fix? |
I happened into this thread and was wondering how does one "exclude a pattern"? Im having a similar issue and suspect maybe the leak report is choosing the wrong leak reference path see thread androidx/media#1059 (comment) |
Problem description
I was tracking down an Android Service leak, but the leak trace that LeakCanary gave me was not particularly helpful. It was only when I finally added my own instanceFieldLeak descriptors that I found the actual leak. (And I had to exclude not just one but two other paths before LeakCanary finally turned up the meaningful one.). There was a total of 4 distinct reference paths that I had to turn up one by one by adding instanceFieldLeak descriptors one at a time. 3 of them did not matter quite so much, as they simply caused delayed finalization. However, at heap dump time, all 4 were still present, and LeakCanary somewhat arbitrarily picked one that was not helpful.
Potential solutions
Add an option to display all leak reference paths, at the cost of log clutter.
The text was updated successfully, but these errors were encountered: