Improved Map Performance #940
Comments
Was this measured against FSharp.Core 5.0.0 or 4.7.2? Thanks!
Hey, TBH I'm not entirely sure which version these benchmarks ran on, so I did a quick rerun on my notebook (different CPU, only for count=10) ensuring that
Hey, benchmarks are still running but I managed to optimize
Again, these numbers are from my notebook:
@krauthaufen possibly the BCL Array.Sort over two arrays (keys, values) would help you even more here https://docs.microsoft.com/en-us/dotnet/api/system.array.sort?view=net-5.0#System_Array_Sort_System_Array_System_Array_System_Int32_System_Int32_
Cool, I'll definitely investigate that. I'm also working on improving things for very small counts since the array seems to be slower for approx.
Fingers crossed you persist and this gets in. I recall seeing PRs with significant collection optimizations that have gradually lost momentum and fizzled out.
Update, the
Hey @NinoFloris, I tried using the BCL sort you posted, but it's actually a tiny bit slower. Maybe that's because I need to maintain two arrays instead of one and lookups might be less cache-local. It's interesting to see that the BCL code somehow manages to allocate less garbage but is nonetheless slower.
I just found that all of these sorts are non-stable which means that my logic for duplicates won't work (which my tests sadly missed). I'm currently thinking about ways to work around that problem. A quick test with Aardvark's TimSort shows that a stable sort might be a little slower, but nonetheless the speedup will be huge.
That's huge.
@krauthaufen I also wonder if a Concestor dictionary could replace the Map? apparently it was at least 4 times faster (a few years ago). Is that something worth considering?
Hey, yep, I've seen those and AFAIK they're included in FSharp.Core >= 5.0.0, right? The Concestor dictionary could be worth considering, but my experience shows that in the end a simple balanced binary tree always outperforms other comparison-based data structures. I'm definitely not against considering it; it's just a different kind of monster with other problems. I think that for such a low-level data structure there is simply no super-elegant, super-fast solution. Regarding the problem from above (duplicates in

Current state: I think I'd like to take a little more time to improve that, but I also think that the speedup is still huge (see table below) for

Override problem: Maybe someone has a better idea? Any ideas welcome.
The stable property is needed for correctly resolving conflicts. The problem is that stable sorts (such as my simple mergesort) tend to be slower than e.g. quicksort. This can be seen here:
Things I tried so far:
Things that could be done to improve the stable-sorting a little:
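To make the stability requirement concrete, here is a minimal sketch (illustrative only, not the PR's actual merge code) of merging two sorted runs while letting the later entry win on duplicate keys:

```fsharp
// Illustrative sketch: merge two sorted runs of (key, value) pairs.
// On equal keys the element from the right (later) run wins, which is
// exactly the "last Add overrides" semantics a Map needs. A non-stable
// sort can reorder equal keys and silently break this.
let mergeKeepLast (cmp: 'K -> 'K -> int) (l: ('K * 'V)[]) (r: ('K * 'V)[]) =
    let res = ResizeArray<'K * 'V>(l.Length + r.Length)
    let mutable i = 0
    let mutable j = 0
    while i < l.Length && j < r.Length do
        let c = cmp (fst l.[i]) (fst r.[j])
        if c < 0 then res.Add l.[i]; i <- i + 1
        elif c > 0 then res.Add r.[j]; j <- j + 1
        else
            res.Add r.[j] // duplicate key: keep only the later entry
            i <- i + 1
            j <- j + 1
    while i < l.Length do res.Add l.[i]; i <- i + 1
    while j < r.Length do res.Add r.[j]; j <- j + 1
    res.ToArray()

// mergeKeepLast compare [| 1, "a"; 3, "b" |] [| 1, "x" |]
// yields [| 1, "x"; 3, "b" |]
```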
Hey, I just updated the benchmarks.
@krauthaufen have you tried to run: https://github.com/buybackoff/fsharp-benchmarks/blob/master/FSharpCoreOptim/Bench.fs I'd also like to test for small key/value (int * int) vs. larger key/value (int * some_reftype), (int * large_valuetype) etc. wrt
Hey, I will run the benchmarks on other key/value types soon, but the ones you mentioned seem to be "unfair" since they only test adding/removing sorted keys. Mine actually look quite similar but with randomized data. Obviously this creates a little jitter in the runtimes, but nonetheless I think it is "fairer".
Fair point. I brought that up because it contains somewhat richer patterns (for all stuff in the map, access by key and then ignore, etc.). Your trees are mostly balanced so it'll probably do a good job in those tests.
Hmm, I don't understand: do you want to mimic the behavior of keeping the last entry of the same key? Otherwise stability doesn't affect dedup. Another idea is to allocate the temp array of actual map nodes so you don't have to create the nodes again after the sort -- not sure if it's a good idea because of another level of indirection, though (memory efficiency ++, speed --).
I'd also like to mention the runtime profiles -- I'm running the benchmark now and it looks like it's only testing the workstation concurrent GC scenario. We should perhaps also test with the config option
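For reference, BenchmarkDotNet can pin the GC flavour per benchmark class via its job mutator attributes; a sketch (the `GcServer`/`GcConcurrent` attributes are BenchmarkDotNet's, the benchmark body is illustrative):

```fsharp
open BenchmarkDotNet.Attributes

// Run this class's jobs under server GC with background GC disabled,
// mirroring the "gcServer" profile discussed in this thread.
[<GcServer(true); GcConcurrent(false)>]
type MapOfArrayBenchmark() =
    [<Benchmark>]
    member _.OfArray() =
        Map.ofArray [| for i in 1 .. 100 -> i, i |]
```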
Here's the workstation GC result for count = 100 on my machine:
Looks like we've got similar specs :)
Nice to see that I was not hallucinating my benchmark results 😋
Precisely. That's why stable sorting is crucial here and my simple
Not sure if this can help; I'd need to allocate leaves for everything and then combine them together in inner nodes, etc. During this process I need to allocate 2 auxiliary
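The leaves-then-inner-nodes construction being discussed can be sketched as follows (hypothetical `Tree` type, not the actual MapNew representation): once the input is sorted and deduplicated, a perfectly balanced tree falls out of a simple divide-and-conquer pass with no further comparisons.

```fsharp
// Hypothetical tree type, for illustration only.
type Tree<'K, 'V> =
    | Empty
    | Node of Tree<'K, 'V> * 'K * 'V * Tree<'K, 'V>

// Build a balanced tree from a sorted, duplicate-free slice in O(n):
// the middle element becomes the root, the halves become the subtrees.
let rec ofSortedSlice (arr: ('K * 'V)[]) lo hi =
    if lo > hi then Empty
    else
        let mid = lo + (hi - lo) / 2
        let k, v = arr.[mid]
        Node(ofSortedSlice arr lo (mid - 1), k, v, ofSortedSlice arr (mid + 1) hi)
```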
The gcServer profile:

```
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.685 (2004/?/20H1)
Intel Core i7-6700K CPU 4.00GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.101
  [Host]     : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT DEBUG
  Job-HPVMQI : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT
Server=True
```
Conclusion: even more improvements with the server profile!
In `Sorting.fs` at line 221 (`while sortedLengthDbl <= src.Length do`), consider changing line 224 to:

```fsharp
if sortedLength < src.Length then
    let cnt = mergeSeqHandleDuplicates cmp 0 sortedLength sortedLength src dst
    struct(dst, cnt)
else
```
This was actually intentional since it saves iterating the array once and saves a few percent.
I'm currently focusing on getting ofList/ofSeq/ofArray faster for very small counts (benchmarks currently running). The code was a bit painful to write but I think it might be worth it. |
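A fast path for tiny inputs might look roughly like this (purely illustrative; the actual MapNew code differs): a stable insertion sort avoids the general merge-sort machinery, and because equal keys never swap past each other it preserves the "later duplicate wins" order for a subsequent dedup pass.

```fsharp
open System.Collections.Generic

// Stable, in-place insertion sort over (key, value) pairs. For very
// small arrays this is cheaper than a full merge sort, and stability
// keeps duplicates in insertion order so the last Add can win.
let insertionSortByKey (cmp: IComparer<'K>) (arr: ('K * 'V)[]) =
    for i in 1 .. arr.Length - 1 do
        let cur = arr.[i]
        let mutable j = i - 1
        while j >= 0 && cmp.Compare(fst arr.[j], fst cur) > 0 do
            arr.[j + 1] <- arr.[j]
            j <- j - 1
        arr.[j + 1] <- cur
```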
Hey, my optimizations for super-small counts seem to work decently. For counts in
Hmm, if so why not force the condition
Like krauthaufen/MapNew#2?
Good idea, thanks.
How do we maintain compatibility with the serialization libraries? |
Hey, I just did some benchmarks with
However the
A possible explanation for this is that, given an expensive (boxing) comparison, the new implementation of
The serialization thing is really a little tricky, since I have no idea how these serializations look internally.
A quick check revealed that Json.Net does:

```fsharp
let m = MapNew.ofArray [| 1, 2; 3, 4; 5, 6 |]
let json = Json.Net.JsonNet.Serialize(m)
printfn "%s" json // => ["1":2,"3":4,"5":6]
```

Maybe due to my
Quite good then! It matches the results of the old map.
Amazing work here! For serialization I believe the contract is a

Is this feasible?

```fsharp
[<System.NonSerialized>]
// This type is logically immutable. This field is only mutated during deserialization.
let mutable comparer = comparer

[<System.NonSerialized>]
// This type is logically immutable. This field is only mutated during deserialization.
let mutable tree = tree

// WARNING: The compiled name of this field may never be changed because it is part of the logical
// WARNING: permanent serialization format for this type.
let mutable serializedData = null

[<System.Runtime.Serialization.OnSerializingAttribute>]
member __.OnSerializing(context: System.Runtime.Serialization.StreamingContext) =
    ignore context
    serializedData <- MapTree.toArray tree |> Array.map (fun (k, v) -> KeyValuePair(k, v))

[<System.Runtime.Serialization.OnDeserializedAttribute>]
member __.OnDeserialized(context: System.Runtime.Serialization.StreamingContext) =
    ignore context
    comparer <- LanguagePrimitives.FastGenericComparer<'Key>
    tree <- serializedData |> Array.map (fun kvp -> kvp.Key, kvp.Value) |> MapTree.ofArray comparer
    serializedData <- null
```
Hey, we could certainly do that. Btw I finally started creating a Fable app for visualizing benchmark results and it's just amazing how simple that is 😁 I'll add the serialization stuff once I'm done with my visualization.
The optimizer is aware of this path and will try to de-virtualize the call. See:
I think we can definitely do better than the most generic comparer in some cases (e.g. structs, non-null stuff) -- we can add them to the optimizer as separate comparison de-virtualization cases. Edit: for well-known C# types (Guid etc.) we can directly add them to the fast generic comparer table, see
Hey, the virtual call is only one part of the problem here, I think. Here is a
So for structs we effectively pay 2 boxings and 1 virtual call (and maybe a tuple allocation if the compiler doesn't optimize it). I have no idea how this is avoided in
Btw the same goes for custom structs and
But I think that's a different issue.
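A concrete illustration of the cost being described (types chosen for illustration only): the polymorphic `compare` goes through `IComparable`, boxing both struct operands on every call, whereas a cached `Comparer<'T>.Default` can be specialized by the JIT for value types.

```fsharp
open System
open System.Collections.Generic

// Generic structural comparison: for a struct like DateTime this goes
// through the IComparable path and boxes both operands per call.
let slowCompare (a: DateTime) (b: DateTime) = compare a b

// Cached BCL comparer: still an interface call, but the JIT specializes
// Comparer<T>.Default for value types, so nothing is boxed.
let private cmp = Comparer<DateTime>.Default
let fastCompare (a: DateTime) (b: DateTime) = cmp.Compare(a, b)
```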
If properly optimized it won't go into this path. The optimizer will rewrite the AST.
Regarding the Map implementation:
The preliminary benchmark results (small iteration counts) can be seen here.
Fantastic visualization 😄
This is absolutely fantastic, thank you for all that work @krauthaufen!
@krauthaufen maybe I can help with the optimizer if this suggestion is approved. We can then set up a branch and work with
@krauthaufen If this pans out, it will be absurdly useful. Thanks for all of your hard work, this is exciting, and educational for all of us here on the thread!
@krauthaufen let's take this forward into dotnet/fsharp as a PR to both map and set (or separate PRs). Note that the compiler also uses, effectively, a copy of these data structures internally where the same optimizations would likely have to be made. But it's not too bad.
Hey, will do tomorrow. Cool that people are interested in this. I hope the set implementation will show similar speedups. Cheers
Just curious - is it considered a breaking change to move away from an AVL tree? I remember some discussion in the past about this; I'm guessing it probably is, but I remember people being receptive to changing that at the time. The reason I ask is that from my experiments and coding a custom map for work purposes you can get significantly more benefit from other data structures (e.g. a HAMT). Unfortunately in my work I was forced to roll my own since the F# map was too slow; it would have been nice to have a fast, friendly implementation out of the box. Performance aside, a fast immutable HashMap implementation that doesn't need the 'comparison' constraint would have helped significantly in some previous projects where I didn't have control of the key type (e.g. C# sourced objects). The benefits can be quite substantial - I was seeing performance improvements often between 3 and 10x for small and large collections respectively, e.g. from https://github.com/mvkara/fsharp-hashcollections with more detailed benchmarks for a simple `tryFind` operation.
Switching the data structure under the covers isn't a breaking change, no. At least not one we really consider here. We're free to change internals at any point, and if someone is able to use reflection to depend on that then that's a risk we're assuming they are willingly taking. That said, the churn factor for something like that is quite high, and changes that aren't incremental are much more difficult to accept.
Hey, after doing yet another map implementation (
When maintaining a count per inner-node (allowing for
However note that the
If you're still interested in the (now smaller) improvements I can create a new PR (or adapt this one). Note that I have something in mind for keeping the overall count (not per inner node but globally per map) that should be quite efficient.
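The trade-off under discussion can be made concrete with a sketch (hypothetical type, not the actual implementation): storing a subtree count in every inner node makes `count` O(1) and enables O(log n) indexed access, at the cost of one extra field per node, which is what slows the other operations down.

```fsharp
// Hypothetical counted node: one extra int per inner node.
type CountedTree<'K, 'V> =
    | E
    | N of CountedTree<'K, 'V> * 'K * 'V * CountedTree<'K, 'V> * int

let count t =
    match t with
    | E -> 0
    | N (_, _, _, _, c) -> c

// Smart constructor keeps the cached count consistent, so asking for
// the map's size becomes O(1) instead of a full traversal.
let node l k v r = N (l, k, v, r, count l + count r + 1)
```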
@mvkara I think changing the
However I think adding a separate immutable
Closing this out as there is significant progress made in dotnet/fsharp and discussions going on in there.
Improved Map Performance
I recently started a new `Map<'Key, 'Value>` implementation for `FSharp.Data.Adaptive` that uses subtyping and virtual calls instead of discriminated unions and pattern matching for representing the tree. After a few hours it became clear that this implementation performs much better, and so I decided to start a discussion here.

So here's my repo containing the implementation, several tests and benchmarks. Note that I will likely add more combinators there and I don't expect all of it to be merged into FSharp.Core, but I think F# shouldn't miss out on the potential speedup.

repo

Also note that the same technique should be applied to `Set` when merging this. I'd also like to mention that the abstract data type is still the AVL-like tree as in FSharp.Core today; the only difference is really how operations are implemented.
I would be very happy to help with integrating this into FSharp.Core and I'm of course open to suggestions.
Pros and Cons
The advantages of making this adjustment to F# are significant performance improvements.
The disadvantage of making this adjustment to F# is slightly less readable code, although the original Map implementation isn't exactly readable either.
Extra information
Estimated cost (XS, S, M, L, XL, XXL): L
Related suggestions: (put links to related suggestions here)
Affidavit (please submit!)
Please tick this by placing a cross in the box:
Please tick all that apply:
For Readers
If you would like to see this issue implemented, please click the 👍 emoji on this issue. These counts are used to generally order the suggestions by engagement.