Improved Map Performance #940
Comments
Was this measured against FSharp.Core 5.0.0 or 4.7.2? Thanks!
Hey, TBH I'm not entirely sure which version these benchmarks ran on, so I did a quick rerun on my notebook (different CPU, only for count=10) ensuring that
Hey, benchmarks are still running but I managed to optimize
Again, these numbers are from my notebook:
@krauthaufen possibly the BCL Array.Sort over two arrays (keys, values) would help you even more here https://docs.microsoft.com/en-us/dotnet/api/system.array.sort?view=net-5.0#System_Array_Sort_System_Array_System_Array_System_Int32_System_Int32_
Cool, I'll definitely investigate that. I'm also working on improving things for very small counts since the array seems to be slower for approx.
Fingers crossed you persist and this gets in. I recall seeing PRs with significant collection optimizations that have gradually lost momentum and fizzled out.
Update, the
Hey @NinoFloris, I tried using the BCL sort you posted, but it's actually a tiny bit slower. Maybe that's because I need to maintain two arrays instead of one and lookups might be less cache-local. It's interesting to see that the BCL code somehow manages to allocate less garbage but is nonetheless slower.
I just found that all of these sorts are non-stable which means that my logic for duplicates won't work (which my tests sadly missed). I'm currently thinking about ways to work around that problem. A quick test with Aardvark's TimSort shows that a stable sort might be a little slower, but nonetheless the speedup will be huge.
That's huge.
@krauthaufen I also wonder if a Concestor dictionary could replace the Map? apparently it was at least 4 times faster (a few years ago). Is that something worth considering?
Hey, yep, I've seen those and AFAIK they're included in FSharp.Core >= 5.0.0, right? The Concestor dictionary could be worth considering, but my experience shows that in the end a simple balanced binary tree always outperforms other comparison-based data structures. I'm definitely not against considering it; it's just a different kind of monster with other problems. I think that for such a low-level data structure there is simply no super-elegant, super-fast solution. Regarding the problem from above (duplicates in

Current state: I think I'd like to take a little more time to improve that, but I also think that the speedup is still huge (see table below) for

Override problem: Maybe someone has a better idea? Any ideas welcome.
The stable property is needed for correctly resolving conflicts. The problem is that stable sorts (such as my simple mergesort) tend to be slower than e.g. quicksort. This can be seen here:
Things I tried so far:
Things that could be done to improve the stable-sorting a little:
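To make the stability requirement concrete, here is a minimal sketch (illustrative only, not the PR's actual merge code) of merging two sorted runs while letting the later entry win on duplicate keys:

```fsharp
// Illustrative sketch: merge two sorted runs of (key, value) pairs.
// On equal keys the element from the right (later) run wins, which is
// exactly the "last Add overrides" semantics a Map needs. A non-stable
// sort can reorder equal keys and silently break this.
let mergeKeepLast (cmp: 'K -> 'K -> int) (l: ('K * 'V)[]) (r: ('K * 'V)[]) =
    let res = ResizeArray<'K * 'V>(l.Length + r.Length)
    let mutable i = 0
    let mutable j = 0
    while i < l.Length && j < r.Length do
        let c = cmp (fst l.[i]) (fst r.[j])
        if c < 0 then res.Add l.[i]; i <- i + 1
        elif c > 0 then res.Add r.[j]; j <- j + 1
        else
            res.Add r.[j] // duplicate key: keep only the later entry
            i <- i + 1
            j <- j + 1
    while i < l.Length do res.Add l.[i]; i <- i + 1
    while j < r.Length do res.Add r.[j]; j <- j + 1
    res.ToArray()

// mergeKeepLast compare [| 1, "a"; 3, "b" |] [| 1, "x" |]
// yields [| 1, "x"; 3, "b" |]
```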
Hey, I just updated the benchmarks.
@krauthaufen have you tried to run: https://github.com/buybackoff/fsharp-benchmarks/blob/master/FSharpCoreOptim/Bench.fs I'd also like to test for small key/value (int * int) vs. larger key/value (int * some_reftype), (int * large_valuetype) etc. wrt
Hey, I will run the benchmarks on other key/value types soon, but the ones you mentioned seem to be "unfair" since they only test adding/removing sorted keys. Mine actually look quite similar but with randomized data. Obviously this creates a little jitter in the runtimes, but nonetheless I think it is "fairer".
Fair point. I brought that up because it contains somewhat richer patterns (for all stuff in the map, access by key and then ignore, etc.). Your trees are mostly balanced so it'll probably do a good job in those tests.
Hmm, I don't understand: do you want to mimic the behavior of keeping the last entry of the same key? Otherwise stability doesn't affect dedup. Another idea is to allocate the temp array of actual map nodes so you don't have to create the nodes again after the sort -- not sure if it's a good idea because of another level of indirection, though (memory efficiency ++, speed --).
I'd also like to mention the runtime profiles -- I'm running the benchmark now and it looks like it's only testing the workstation concurrent GC scenario. We should perhaps also test with the config option
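For reference, BenchmarkDotNet can pin the GC flavour per benchmark class via its job mutator attributes; a sketch (the `GcServer`/`GcConcurrent` attributes are BenchmarkDotNet's, the benchmark body is illustrative):

```fsharp
open BenchmarkDotNet.Attributes

// Run this class's jobs under server GC with background GC disabled,
// mirroring the "gcServer" profile discussed in this thread.
[<GcServer(true); GcConcurrent(false)>]
type MapOfArrayBenchmark() =
    [<Benchmark>]
    member _.OfArray() =
        Map.ofArray [| for i in 1 .. 100 -> i, i |]
```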
Here's the workstation GC result for count = 100 on my machine:
Looks like we've got similar specs :)
Nice to see that I was not hallucinating my benchmark results 😋
Precisely. That's why stable sorting is crucial here and my simple
Not sure if this can help; I'd need to allocate leaves for everything and then combine them together in inner nodes, etc. During this process I need to allocate 2 auxiliary
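The leaves-then-inner-nodes construction being discussed can be sketched as follows (hypothetical `Tree` type, not the actual MapNew representation): once the input is sorted and deduplicated, a perfectly balanced tree falls out of a simple divide-and-conquer pass with no further comparisons.

```fsharp
// Hypothetical tree type, for illustration only.
type Tree<'K, 'V> =
    | Empty
    | Node of Tree<'K, 'V> * 'K * 'V * Tree<'K, 'V>

// Build a balanced tree from a sorted, duplicate-free slice in O(n):
// the middle element becomes the root, the halves become the subtrees.
let rec ofSortedSlice (arr: ('K * 'V)[]) lo hi =
    if lo > hi then Empty
    else
        let mid = lo + (hi - lo) / 2
        let k, v = arr.[mid]
        Node(ofSortedSlice arr lo (mid - 1), k, v, ofSortedSlice arr (mid + 1) hi)
```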
The gcServer profile:

```
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.685 (2004/?/20H1)
Intel Core i7-6700K CPU 4.00GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.101
  [Host]     : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT DEBUG
  Job-HPVMQI : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT
Server=True
```
Conclusion: even more improvements with the server profile!
In `Sorting.fs` at line 221 (`while sortedLengthDbl <= src.Length do`), consider changing line 224 to:

```fsharp
if sortedLength < src.Length then
    let cnt = mergeSeqHandleDuplicates cmp 0 sortedLength sortedLength src dst
    struct(dst, cnt)
else
```
This was actually intentional since it saves iterating the array once and saves a few percent.
I'm currently focusing on getting ofList/ofSeq/ofArray faster for very small counts (benchmarks currently running). The code was a bit painful to write but I think it might be worth it. |
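A fast path for tiny inputs might look roughly like this (purely illustrative; the actual MapNew code differs): a stable insertion sort avoids the general merge-sort machinery, and because equal keys never swap past each other it preserves the "later duplicate wins" order for a subsequent dedup pass.

```fsharp
open System.Collections.Generic

// Stable, in-place insertion sort over (key, value) pairs. For very
// small arrays this is cheaper than a full merge sort, and stability
// keeps duplicates in insertion order so the last Add can win.
let insertionSortByKey (cmp: IComparer<'K>) (arr: ('K * 'V)[]) =
    for i in 1 .. arr.Length - 1 do
        let cur = arr.[i]
        let mutable j = i - 1
        while j >= 0 && cmp.Compare(fst arr.[j], fst cur) > 0 do
            arr.[j + 1] <- arr.[j]
            j <- j - 1
        arr.[j + 1] <- cur
```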
Hey, my optimizations for super-small counts seem to work decently. For counts in
Hmm, if so why not force the condition
Like krauthaufen/MapNew#2?
Good idea, thanks.
How do we maintain compatibility with the serialization libraries? |
Hey, I just did some benchmarks with
However the
A possible explanation for this is that, given an expensive (boxing) comparison, the new implementation of
The serialization thing is really a little tricky, since I have no idea how these serializations look internally.
A quick check revealed that Json.Net does:

```fsharp
let m = MapNew.ofArray [| 1, 2; 3, 4; 5, 6 |]
let json = Json.Net.JsonNet.Serialize(m)
printfn "%s" json // => ["1":2,"3":4,"5":6]
```

Maybe due to my
Quite good then! It matches the results of the old map.
Amazing work here! For serialization I believe the contract is a

Is this feasible?

```fsharp
[<System.NonSerialized>]
// This type is logically immutable. This field is only mutated during deserialization.
let mutable comparer = comparer

[<System.NonSerialized>]
// This type is logically immutable. This field is only mutated during deserialization.
let mutable tree = tree

// WARNING: The compiled name of this field may never be changed because it is part of the logical
// WARNING: permanent serialization format for this type.
let mutable serializedData = null

[<System.Runtime.Serialization.OnSerializingAttribute>]
member __.OnSerializing(context: System.Runtime.Serialization.StreamingContext) =
    ignore context
    serializedData <- MapTree.toArray tree |> Array.map (fun (k, v) -> KeyValuePair(k, v))

[<System.Runtime.Serialization.OnDeserializedAttribute>]
member __.OnDeserialized(context: System.Runtime.Serialization.StreamingContext) =
    ignore context
    comparer <- LanguagePrimitives.FastGenericComparer<'Key>
    tree <- serializedData |> Array.map (fun kvp -> kvp.Key, kvp.Value) |> MapTree.ofArray comparer
    serializedData <- null
```
Hey, we could certainly do that. Btw I finally started creating a Fable app for visualizing benchmark results and it's just amazing how simple that is 😁 I'll add the serialization stuff once I'm done with my visualization.
The optimizer is aware of this path and will try to de-virtualize the call. See:
I think we can definitely do better than the most generic comparer in some cases (e.g. structs, non-null stuff) -- we can add them to the optimizer as separate comparison de-virtualization cases. Edit: for well-known C# types (Guid etc.) we can directly add them to the fast generic comparer table, see
Hey, the virtual call is only one part of the problem here, I think. Here is a
So for structs we effectively pay 2 boxings and 1 virtual call (and maybe a tuple allocation if the compiler doesn't optimize it). I have no idea how this is avoided in
Btw the same goes for custom structs and
But I think that's a different issue.
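A concrete illustration of the cost being described (types chosen for illustration only): the polymorphic `compare` goes through `IComparable`, boxing both struct operands on every call, whereas a cached `Comparer<'T>.Default` can be specialized by the JIT for value types.

```fsharp
open System
open System.Collections.Generic

// Generic structural comparison: for a struct like DateTime this goes
// through the IComparable path and boxes both operands per call.
let slowCompare (a: DateTime) (b: DateTime) = compare a b

// Cached BCL comparer: still an interface call, but the JIT specializes
// Comparer<T>.Default for value types, so nothing is boxed.
let private cmp = Comparer<DateTime>.Default
let fastCompare (a: DateTime) (b: DateTime) = cmp.Compare(a, b)
```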
If properly optimized it won't go into this path. The optimizer will rewrite the AST.
Regarding the Map implementation:
The preliminary benchmark results (small iteration counts) can be seen here.
Fantastic visualization 😄
This is absolutely fantastic, thank you for all that work @krauthaufen!
@krauthaufen maybe I can help with the optimizer if this suggestion is approved. We can then set up a branch and work with
@krauthaufen If this pans out, it will be absurdly useful. Thanks for all of your hard work, this is exciting, and educational for all of us here on the thread!
@krauthaufen let's take this forward into dotnet/fsharp as a PR to both map and set (or separate PRs). Note that the compiler also uses, effectively, a copy of these data structures internally where the same optimizations would likely have to be made. But it's not too bad.
Hey, will do tomorrow. Cool that people are interested in this. I hope the set implementation will show similar speedups. Cheers
Just curious - is it considered a breaking change to move away from an AVL tree? I remember some discussion in the past about this; I'm guessing it probably is, but I remember people being receptive to changing that at the time. The reason I ask is that from my experiments and coding a custom map for work purposes you can get significantly more benefit from other data structures (e.g. a HAMT). Unfortunately in my work I was forced to roll my own since the F# map was too slow; it would have been nice to have a fast, friendly implementation out of the box. Performance aside, a fast immutable HashMap implementation that doesn't need the 'comparison' constraint would have helped significantly in some previous projects where I didn't have control of the key type (e.g. C# sourced objects). The benefits can be quite substantial - I was seeing performance improvements often between 3 and 10x for small and large collections respectively, e.g. from https://github.com/mvkara/fsharp-hashcollections with more detailed benchmarks for a simple `tryFind` operation.
Switching the data structure under the covers isn't a breaking change, no. At least not one we really consider here. We're free to change internals at any point, and if someone is able to use reflection to depend on that then that's a risk we're assuming they are willingly taking. That said, the churn factor for something like that is quite high, and changes that aren't incremental are much more difficult to accept.
Hey, after doing yet another map implementation (
When maintaining a count per inner-node (allowing for
However note that the
If you're still interested in the (now smaller) improvements I can create a new PR (or adapt this one). Note that I have something in mind for keeping the overall count (not per inner node but globally per map) that should be quite efficient.
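The trade-off under discussion can be made concrete with a sketch (hypothetical type, not the actual implementation): storing a subtree count in every inner node makes `count` O(1) and enables O(log n) indexed access, at the cost of one extra field per node, which is what slows the other operations down.

```fsharp
// Hypothetical counted node: one extra int per inner node.
type CountedTree<'K, 'V> =
    | E
    | N of CountedTree<'K, 'V> * 'K * 'V * CountedTree<'K, 'V> * int

let count t =
    match t with
    | E -> 0
    | N (_, _, _, _, c) -> c

// Smart constructor keeps the cached count consistent, so asking for
// the map's size becomes O(1) instead of a full traversal.
let node l k v r = N (l, k, v, r, count l + count r + 1)
```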
@mvkara I think changing the
However I think adding a separate immutable
Closing this out as there is significant progress made in dotnet/fsharp and discussions going on in there.
Improved Map Performance
I recently started a new `Map<'Key, 'Value>` implementation for `FSharp.Data.Adaptive` that uses subtyping and virtual calls instead of discriminated unions and pattern matching for representing the tree. After a few hours it became clear that this implementation performs much better, and so I decided to start a discussion here.

So here's my repo containing the implementation, several tests and benchmarks. Note that I will likely add more combinators there and I don't expect all of it to be merged into FSharp.Core, but I think F# shouldn't miss out on the potential speedup.

repo

Also note that the same technique should be applied to `Set` when merging this. I'd also like to mention that the abstract data type is still the AVL-like tree as in FSharp.Core today; the only difference is really how operations are implemented.
I would be very happy to help with integrating this into FSharp.Core and I'm of course open to suggestions.
Pros and Cons
The advantages of making this adjustment to F# are significant performance improvements.
The disadvantage of making this adjustment to F# is slightly less readable code, although the original Map implementation isn't exactly readable either.
Extra information
Estimated cost (XS, S, M, L, XL, XXL): L
Related suggestions: (put links to related suggestions here)
Affidavit (please submit!)
Please tick this by placing a cross in the box:
Please tick all that apply:
For Readers
If you would like to see this issue implemented, please click the 👍 emoji on this issue. These counts are used to generally order the suggestions by engagement.