Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Creating AnalysisResult objects is expensive due to IdentityHashMap copies #5172

Closed
msridhar opened this issue Jun 21, 2022 · 4 comments · Fixed by #5178
Closed

Creating AnalysisResult objects is expensive due to IdentityHashMap copies #5172

msridhar opened this issue Jun 21, 2022 · 4 comments · Fixed by #5178
Labels

Comments

@msridhar
Copy link
Contributor

I've been looking into profiling NullAway's dataflow analysis more carefully with a micro-benchmark to find unnecessary overheads. I created the following synthetic benchmark (may be useful for Nullness Checker as well), and I profiled repeatedly running NullAway on the benchmark in a loop:

https://github.com/msridhar/NullAway/blob/manu-dflow-bench-test/nullaway/src/test/resources/com/uber/nullaway/testdata/DFlowBench.java

I observed that on this benchmark, nearly half the execution time is spent creating IdentityHashMap objects, as part of constructing the final AnalysisResult after dataflow analysis completes. There are several IdentityHashMap fields inside AnalysisResult. I see that the contents of the maps get mutated inside the AnalysisResult.combine() method, but not elsewhere that I can see (it's possible subclasses are allowed to mutate the contents). Depending on the intended invariants, I am wondering if we could rewrite this code to lazily create copies of IdentityHashMaps (like only when combining results) rather than doing it eagerly, maybe using Collections.unmodifiableMap() to catch any unintended mutation without copying entire maps. I'm happy to put up a PR if we can agree on an approach.

@mernst
Copy link
Member

mernst commented Jun 21, 2022

This sounds good to me. Thanks for the suggestion. To avoid making empty IdentityHashMaps, fields could be made nullable. For lazy copy-constructors, at use points there would need to be a boolean variable or a test against the run-time class to determine whether it was necessary to actually make the copy.

@msridhar
Copy link
Contributor Author

My plan for fixing this was as follows:

  • Change the IdentityHashMap fields of AnalysisResult (like this one) to be non-final and of type Map.
  • Rather than eagerly copying incoming maps, initially assign these fields those maps wrapped in Collections.unmodifiableMap(). Keep a boolean flag to track that these maps have not yet been copied.
  • In combine(), if the maps have not yet been copied, make copies at that point, and set the flag.

I think this works at a high level, but it's ugly to change the declared type of the fields to Map; we need to then remember that when copying, the new Map implementation needs to be an IdentityHashMap. In fact, in prototyping the above, I found it won't even compile since Error Prone warns on mixing Map and IdentityHashMap:

https://errorprone.info/bugpattern/IdentityHashMapUsage

I think what we may want here is a method unmodifiableIdentityHashMap() that returns a sub-class of IdentityHashMap where mutator methods throw an error, just like unmodifiableMap(). Otherwise, if someone introduces an unexpected mutation on one of these maps in the future, it may be very difficult to debug.

I was hoping to get feedback on this proposal, in case someone sees a simpler solution. Thanks!

@mernst
Copy link
Member

mernst commented Jun 25, 2022

This sounds good to me.

It sounds like you don't think the creation of the empty IdentityHashMaps is a problem. (I don't know whether it is. The code could use a singleton for unmodifiable ones.)

A quick search didn't turn up anyone else who had implemented something like unmodifiableIdentityHashMap().

@msridhar
Copy link
Contributor Author

This sounds good to me.

It sounds like you don't think the creation of the empty IdentityHashMaps is a problem. (I don't know whether it is. The code could use a singleton for unmodifiable ones.)

Yeah, in the profiles, the main issue was calls to putAll() which were doing a lot of copying. I don't think empty IdentityHashMaps are a performance issue, at least I haven't seen it.

A quick search didn't turn up anyone else who had implemented something like unmodifiableIdentityHashMap().

I didn't find anything either. The solution is still a bit ugly, as we will have to subclass IdentityHashMap and then ignore the inherited state and just delegate. But I think unmodifiable wrappers will be the least disruptive way to address the problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants