Creating `AnalysisResult` objects is expensive due to `IdentityHashMap` copies #5172

msridhar · 2022-06-21T18:56:37Z

I've been looking into profiling NullAway's dataflow analysis more carefully with a micro-benchmark to find unnecessary overheads. I created the following synthetic benchmark (may be useful for Nullness Checker as well), and I profiled repeatedly running NullAway on the benchmark in a loop:

https://github.com/msridhar/NullAway/blob/manu-dflow-bench-test/nullaway/src/test/resources/com/uber/nullaway/testdata/DFlowBench.java

I observed that on this benchmark, nearly half the execution time is spent creating IdentityHashMap objects, as part of constructing the final AnalysisResult after dataflow analysis completes. There are several IdentityHashMap fields inside AnalysisResult. I see that the contents of the maps get mutated inside the AnalysisResult.combine() method, but not elsewhere that I can see (it's possible subclasses are allowed to mutate the contents). Depending on the intended invariants, I am wondering if we could rewrite this code to lazily create copies of IdentityHashMaps (like only when combining results) rather than doing it eagerly, maybe using Collections.unmodifiableMap() to catch any unintended mutation without copying entire maps. I'm happy to put up a PR if we can agree on an approach.

The text was updated successfully, but these errors were encountered:

mernst · 2022-06-21T22:42:42Z

This sounds good to me. Thanks for the suggestion. To avoid making empty IdentityHashMaps, fields could be made nullable. For lazy copy-constructors, at use points there would need to be a boolean variable or a test against the run-time class to determine whether it was necessary to actually make the copy.

msridhar · 2022-06-24T23:20:22Z

My plan for fixing this was as follows:

Change the IdentityHashMap fields of AnalysisResult (like this one) to be non-final and of type Map.
Rather than eagerly copying incoming maps, initially assign these fields those maps wrapped in Collections.unmodifiableMap(). Keep a boolean flag to track that these maps have not yet been copied.
In combine(), if the maps have not yet been copied, make copies at that point, and set the flag.

I think this works at a high level, but it's ugly to change the declared type of the fields to Map; we need to then remember that when copying, the new Map implementation needs to be an IdentityHashMap. In fact, in prototyping the above, I found it won't even compile since Error Prone warns on mixing Map and IdentityHashMap:

https://errorprone.info/bugpattern/IdentityHashMapUsage

I think what we may want here is a method unmodifiableIdentityHashMap() that returns a sub-class of IdentityHashMap where mutator methods throw an error, just like unmodifiableMap(). Otherwise, if someone introduces an unexpected mutation on one of these maps in the future, it may be very difficult to debug.

I was hoping to get feedback on this proposal, in case someone sees a simpler solution. Thanks!

mernst · 2022-06-25T03:05:48Z

This sounds good to me.

It sounds like you don't think the creation of the empty IdentityHashMaps is a problem. (I don't know whether it is. The code could use a singleton for unmodifiable ones.)

A quick search didn't turn up anyone else who had implemented something like unmodifiableIdentityHashMap().

msridhar · 2022-06-25T16:34:44Z

This sounds good to me.

It sounds like you don't think the creation of the empty IdentityHashMaps is a problem. (I don't know whether it is. The code could use a singleton for unmodifiable ones.)

Yeah, in the profiles, the main issue was calls to putAll() which were doing a lot of copying. I don't think empty IdentityHashMaps are a performance issue, at least I haven't seen it.

A quick search didn't turn up anyone else who had implemented something like unmodifiableIdentityHashMap().

I didn't find anything either. The solution is still a bit ugly, as we will have to subclass IdentityHashMap and then ignore the inherited state and just delegate. But I think unmodifiable wrappers will be the least disruptive way to address the problem.

msridhar added the Dataflow label Jun 21, 2022

msridhar mentioned this issue Jun 27, 2022

Copy maps lazily in AnalysisResult #5178

Merged

mernst closed this as completed in #5178 Jun 28, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Creating `AnalysisResult` objects is expensive due to `IdentityHashMap` copies #5172

Creating `AnalysisResult` objects is expensive due to `IdentityHashMap` copies #5172

msridhar commented Jun 21, 2022

mernst commented Jun 21, 2022

msridhar commented Jun 24, 2022

mernst commented Jun 25, 2022

msridhar commented Jun 25, 2022

Creating AnalysisResult objects is expensive due to IdentityHashMap copies #5172

Creating AnalysisResult objects is expensive due to IdentityHashMap copies #5172

Comments

msridhar commented Jun 21, 2022

mernst commented Jun 21, 2022

msridhar commented Jun 24, 2022

mernst commented Jun 25, 2022

msridhar commented Jun 25, 2022

Creating `AnalysisResult` objects is expensive due to `IdentityHashMap` copies #5172

Creating `AnalysisResult` objects is expensive due to `IdentityHashMap` copies #5172