Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/2005 large composite disposable perf #2092

Merged
merged 4 commits into from Apr 29, 2024

Conversation

idg10
Copy link
Collaborator

@idg10 idg10 commented Mar 12, 2024

Addresses #2005

This change makes the CompositeDisposable far more efficient when it is dealing with large numbers (thousands) of IDisposable objects. This matters in "fan out/in" scenarios where items are split into large numbers of groups with GroupBy, and then later recombined (usually with some per-group processing having been applied) with either SelectMany or Merge, because these operators use CompositeDisposable to keep track of their subscriptions to upstream observables (the ones that emerge by GroupBy in the scenario being described here).

Fan out/in is an important technique in Rx.NET, as described at https://introtorx.com/chapters/transformation-of-sequences#the-significance-of-selectmany and https://introtorx.com/chapters/partitioning so we want this to work well even with large numbers of groups.

Before, CompositeDisposable used a List<IDisposable> internally, which meant that its Remove method had to perform a linear scan of the list to find the item to remove. This resulted in $O(N^2)$ performance (where $N$ is the number of groups being merged) when the source finally calls OnCompleted. If you had over 100,000 groups, this could end up taking seconds to complete!

With this change, the CompositeDisposable continues to use the same strategy as before for small and moderate sized lists because there's no observable performance problem with that, and it is likely to be more efficient for small collections than a strategy optimized for larger collections. And awful lot of CompositeDisposable instances end up with only a handful of items, so it's important that it continues to perform well in that scenario.

But with large numbers of disposables, it switches to a different strategy in which it uses a dictionary to track disposables.

We use a dictionary instead of the more obvious HashSet<IDisposable> because CompositeDisposable has always allowed you add the same disposable instance multiple times. Although it would not be a good idea to depend on this, we've never prevented it, and since this is a public type, some Rx users might depend on it. So the dictionary has an int value keeping track of how many times each particular disposable has been added. We expect this to be 1 in most cases in practice.

As the discussion at #2005 (comment) shows, we've added benchmarks that verify that this does address the performance problem described. We've also checked that this doesn't cause any significant change in performance for any other benchmarks. (That's important because CompositeDisposable is widely used within Rx.NET, so changes of this kind have the potential to have a broad negative impact. We wanted to make sure we hadn't made normal operation much slower just to improve this one particular and relatively unusual scenario.)

Once the number of elements in a CompositeDisposable gets into the thousands, Remove became very expensive because it performed a linear search. This change causes CompositeDisposable to switch into an alternate mode using a dictionary to enable hash-based lookup once it grows over a particular size.

With under 1024 elements, the behaviour is as before. The number of fields remains the same so this should not change the memory footprint of small CompositeDisposables. However, it does require it to check which of the two modes it is in, so there will likely be a small performance penalty for small CompositeDisposables, so we will need to see if this changes any benchmarks in a significant way before merging this change.
Not necessary with the dictionary. We did it for the list purely because we deliberately allowed the list to become peppered with nulls to avoid having to resize it on every call to Remove.
@idg10 idg10 marked this pull request as ready for review April 29, 2024 09:27
@idg10 idg10 merged commit 8da262d into main Apr 29, 2024
7 checks passed
@idg10 idg10 deleted the feature/2005-large-composite-disposable-perf branch April 29, 2024 19:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants