Improved Map Performance #940

Closed · 5 tasks done
krauthaufen opened this issue Dec 10, 2020 · 51 comments

@krauthaufen

krauthaufen commented Dec 10, 2020

Improved Map Performance

I recently started a new Map<'Key, 'Value> implementation for FSharp.Data.Adaptive that uses subtyping and virtual calls instead of discriminated unions and pattern matching for representing the tree.

After a few hours it became clear that this implementation performs much better, so I decided to start a discussion here.

So here's my repo containing the implementation, several tests and benchmarks. Note that I will likely add more combinators there and I don't expect all of it to be merged into FSharp.Core but I think F# shouldn't miss out on the potential speedup.

repo

Also note that the same technique should be applied to Set when merging this.

I'd also like to mention that the abstract data type is still the AVL-like tree used in FSharp.Core today. The only difference is really how the operations are implemented.

I would be very happy to help with integrating this into FSharp.Core and I'm of course open to suggestions.

Pros and Cons

The advantages of making this adjustment to F# are significant performance improvements.

The disadvantage of making this adjustment to F# is slightly less readable code, although the original Map implementation isn't exactly readable either.

Extra information

Estimated cost (XS, S, M, L, XL, XXL): L

Related suggestions: (put links to related suggestions here)

Affidavit (please submit!)

Please tick this by placing a cross in the box:

  • This is not a question (e.g. like one you might ask on stackoverflow) and I have searched stackoverflow for discussions of this issue
  • I have searched both open and closed suggestions on this site and believe this is not a duplicate
  • This is not something which has obviously "already been decided" in previous versions of F#. If you're questioning a fundamental design decision that has obviously already been taken (e.g. "Make F# untyped") then please don't submit it.

Please tick all that apply:

  • This is not a breaking change to the F# language design
  • I or my company would be willing to help implement and/or test this

For Readers

If you would like to see this issue implemented, please click the 👍 emoji on this issue. These counts are used to generally order the suggestions by engagement.

@dsyme
Collaborator

dsyme commented Dec 11, 2020

Was this measured against FSharp.Core 5.0.0 or 4.7.2?

thanks!

@krauthaufen
Author

Hey,

TBH I'm not entirely sure which version these benchmarks ran on, so I did a quick rerun on my notebook (different CPU, only for count=10) ensuring that FSharp.Core >= 5.0.0 is used, and the general behaviour seems to be similar. (I'm running the benchmark with all counts on the other machine right now.)

```
Method Count Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
Map_ofArray 10 765.045 ns 15.2483 ns 22.3507 ns 0.1974 - - 1240 B
MapNew_ofArray 10 645.144 ns 11.9210 ns 16.7116 ns 0.1574 - - 992 B
Map_toArray 10 194.027 ns 3.8896 ns 4.4792 ns 0.1056 0.0002 - 664 B
MapNew_toArray 10 76.341 ns 0.5347 ns 0.4465 ns 0.0548 0.0001 - 344 B
Map_enumerate 10 298.305 ns 3.7254 ns 3.4848 ns 0.1144 - - 720 B
MapNew_enumerate 10 189.360 ns 2.4233 ns 2.2668 ns 0.0637 - - 400 B
Map_containsKey_all 10 214.089 ns 2.5381 ns 2.3742 ns - - - -
MapNew_containsKey_all 10 106.402 ns 0.9764 ns 0.9134 ns - - - -
Map_containsKey_nonexisting 10 26.962 ns 0.5622 ns 0.5259 ns - - - -
MapNew_containsKey_nonexisting 10 7.740 ns 0.0657 ns 0.0614 ns - - - -
Map_remove_all 10 697.105 ns 10.8650 ns 9.0727 ns 0.2060 - - 1296 B
MapNew_remove_all 10 486.468 ns 8.3427 ns 14.1665 ns 0.1612 - - 1016 B
```

@krauthaufen
Author

Hey, benchmarks are still running, but I managed to optimize ofArray/ofSeq/ofList dramatically (up to 5x) with the following procedure:

  1. copy data to a struct('Key * 'Value)[]
  2. Array.sortInPlace by key
  3. handle duplicates with correct override semantics
  4. build the tree
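A minimal sketch of the four steps above, assuming a simple DU tree and hypothetical helper names (the actual MapNew implementation uses class-based nodes with virtual methods instead):

```fsharp
// Sketch of the sort-then-build ofArray strategy described above.
// The DU tree and all names here are illustrative, not the MapNew API.
type Tree<'K, 'V> =
    | Leaf
    | Node of Tree<'K, 'V> * 'K * 'V * Tree<'K, 'V>

let ofArraySketch (input : ('K * 'V)[]) : Tree<'K, 'V> =
    // 1. copy the data into a struct('Key * 'Value)[] so no heap tuples move while sorting
    // 2. sort by key; Array.sortWith is documented as a stable sort, which is
    //    what makes "the last duplicate wins" correct in step 3
    let sorted =
        input
        |> Array.map (fun (k, v) -> struct (k, v))
        |> Array.sortWith (fun (struct (a, _)) (struct (b, _)) -> compare a b)
    // 3. compact duplicate keys in place, keeping the last value for each key
    let mutable cnt = 0
    for i in 0 .. sorted.Length - 1 do
        let struct (k, _) = sorted.[i]
        if cnt > 0 && (let struct (kp, _) = sorted.[cnt - 1] in compare k kp = 0) then
            sorted.[cnt - 1] <- sorted.[i]
        else
            sorted.[cnt] <- sorted.[i]
            cnt <- cnt + 1
    // 4. build a balanced tree by bisecting the sorted, de-duplicated range
    let rec build lo hi =
        if lo >= hi then Leaf
        else
            let mid = (lo + hi) / 2
            let struct (k, v) = sorted.[mid]
            Node(build lo mid, k, v, build (mid + 1) hi)
    build 0 cnt
```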

Again, these numbers are from my notebook:

```
Method Count Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
Map_ofArray 1 14.59 ns 0.075 ns 0.062 ns 0.0102 - - 64 B
MapNew_ofArray 1 10.95 ns 0.068 ns 0.063 ns 0.0089 - - 56 B
MapNew_ofArray_optimized 1 10.37 ns 0.069 ns 0.065 ns 0.0089 - - 56 B
Map_ofArray 2 26.58 ns 0.154 ns 0.129 ns 0.0178 - - 112 B
MapNew_ofArray 2 32.52 ns 0.259 ns 0.229 ns 0.0166 - - 104 B
MapNew_ofArray_optimized 2 78.19 ns 0.430 ns 0.402 ns 0.0318 - - 200 B
Map_ofArray 10 750.98 ns 3.208 ns 3.000 ns 0.2050 - - 1288 B
MapNew_ofArray 10 578.95 ns 3.925 ns 3.479 ns 0.1421 - - 896 B
MapNew_ofArray_optimized 10 298.51 ns 1.078 ns 0.900 ns 0.0968 - - 608 B
Map_ofArray 100 21,386.32 ns 74.178 ns 69.387 ns 4.4250 0.1221 - 27856 B
MapNew_ofArray 100 11,883.74 ns 78.727 ns 73.641 ns 1.6479 0.0458 - 10424 B
MapNew_ofArray_optimized 100 2,757.47 ns 8.749 ns 8.183 ns 0.8278 0.0229 - 5216 B
Map_ofArray 1000 407,528.56 ns 6,147.472 ns 5,449.573 ns 70.8008 16.1133 - 445456 B
MapNew_ofArray 1000 227,403.18 ns 973.302 ns 812.751 ns 22.4609 4.8828 - 142232 B
MapNew_ofArray_optimized 1000 41,938.29 ns 223.936 ns 209.470 ns 7.6904 1.5259 - 48368 B
```

@NinoFloris

@krauthaufen possibly the BCL Array.Sort over two arrays (keys, values) would help you even more here https://docs.microsoft.com/en-us/dotnet/api/system.array.sort?view=net-5.0#System_Array_Sort_System_Array_System_Array_System_Int32_System_Int32_
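For reference, the overload in question sorts a keys array and applies the same permutation to a parallel items array in one pass. A minimal sketch of how it could be used (note that this BCL overload performs an unstable sort):

```fsharp
// Sketch of the BCL two-array sort suggested above:
// System.Array.Sort(keys, values) sorts `keys` and reorders `values` to match.
let keys = [| 3; 1; 2 |]
let values = [| "c"; "a"; "b" |]
System.Array.Sort(keys, values)
// keys is now [| 1; 2; 3 |] and values is [| "a"; "b"; "c" |]
```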

@krauthaufen
Author

Cool, I'll definitely investigate that. I'm also working on improving things for very small counts, since the array seems to be slower for approx. count < 5. I'll be busy with other things over the weekend but I'll continue there next week.

@kerams

kerams commented Dec 11, 2020

Fingers crossed you persist and this gets in. I recall seeing PRs with significant collection optimizations that have gradually lost momentum and fizzled out.

@krauthaufen
Author

Update: the ofArray benchmarks executed on a real machine (notebooks tend to get hot and therefore slower):
https://github.com/krauthaufen/MapNew/wiki/Optimized-ofArray

@krauthaufen
Author

Hey @NinoFloris, I tried using the BCL sort you posted, but it's actually a tiny bit slower, maybe because I need to maintain two arrays instead of one and lookups might be less cache-local.

It's interesting that the BCL code somehow manages to allocate less garbage but is nonetheless slower.
I will try using the BCL sort on struct('Key * 'Value) next.

```
Method Count Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
Map_ofArray 100 21.772 us 0.1405 us 0.1246 us 4.5776 0.1221 - 28.14 KB
MapNew_ofArray 100 12.866 us 0.0318 us 0.0248 us 2.0294 0.0610 - 12.52 KB
MapNew_ofArray_optimized 100 2.765 us 0.0125 us 0.0117 us 0.8278 0.0229 - 5.09 KB
MapNew_ofArray_bcl 100 3.720 us 0.0174 us 0.0163 us 0.7629 0.0229 - 4.68 KB
Map_ofArray 1000 382.301 us 2.8486 us 2.6645 us 70.8008 16.6016 - 436.05 KB
MapNew_ofArray 1000 232.351 us 0.9637 us 0.8047 us 24.6582 5.8594 - 151.3 KB
MapNew_ofArray_optimized 1000 42.696 us 0.4129 us 0.3862 us 7.6904 1.5259 - 47.23 KB
MapNew_ofArray_bcl 1000 63.367 us 0.5040 us 0.4714 us 6.9580 1.2207 - 43.3 KB
```

@krauthaufen
Author

I just found that all of these sorts are non-stable, which means that my logic for duplicates won't work (something my tests sadly missed).

I'm currently thinking about ways to work around that problem.

A quick test with Aardvark's TimSort shows that a stable sort might be a little slower, but nonetheless the speedup will be huge.

@yatli

yatli commented Dec 12, 2020

```
Method Count Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
Map_ofArray 10000 9,227,999.61 ns 183,278.495 ns 180,003.989 ns 968.7500 265.6250 62.5000 6088782 B
MapNew_ofArray 10000 5,473,464.24 ns 105,595.415 ns 112,985.911 ns 296.8750 148.4375 - 1899776 B
MapNew_ofArray_optimized 10000 888,288.21 ns 7,726.569 ns 6,452.035 ns 79.1016 39.0625 - 501801 B
```

That's huge.

@goswinr

goswinr commented Dec 12, 2020

@krauthaufen
I guess you have seen the recent changes to Map.

I also wonder if a Concestor dictionary could replace the Map? Apparently it was at least 4 times faster (a few years ago). Is that something worth considering?

@krauthaufen
Author

krauthaufen commented Dec 12, 2020

Hey, yep, I've seen those, and afaik they're included in FSharp.Core >= 5.0.0, right?

The Concestor dictionary could be worth considering, but my experience shows that in the end a simple balanced binary tree always outperforms other comparison-based data structures. I'm definitely not against considering it; it's just a different kind of monster with other problems. I think that for such a low-level data structure there is simply no super-elegant, super-fast solution.

Regarding the problem from above (duplicates in ofArray, etc.), I implemented a workaround using a stable mergeSort implementation.

Current State

I'd like to take a little more time to improve this, but the speedup for ofArray, ofSeq and ofList is still huge (see table below), and all the other combinators (see benchmarks) also significantly outperform the current implementation.

Override problem

Maybe someone has a better idea? Any ideas are welcome.

ofArray currently works as follows:

  1. The input is an array of 'Key * 'Value tuples, and it is ensured (via copy, etc.) that the sort may mutate the array contents
  2. sort the entries by their key (in a stable way)
  3. "compact" identical keys by iterating over the array again (maintaining a new count)
  4. visit the array in a binary-search-like way to create the nodes

The stability property is needed for correctly resolving conflicts: [|(0,1); (0,3)|] should produce a map holding (0,3).
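Concretely, a stable sort preserves the input order of equal keys, so keeping the last element of each equal-key run implements the override semantics. A small illustrative sketch (not the mergeSort used in the repo):

```fsharp
// Why the sort must be stable: equal keys stay in input order, so the last
// element of each equal-key run is the entry that should "win".
let input = [| (0, 1); (1, 5); (0, 3) |]

// Array.sortBy is documented as a stable sort in FSharp.Core.
let sorted = Array.sortBy fst input   // (0,1) still precedes (0,3)

// compact: keep only the last entry of each equal-key run
let compacted =
    sorted
    |> Array.fold (fun acc (k, v) ->
        match acc with
        | (k', _) :: rest when k' = k -> (k, v) :: rest  // duplicate key: overwrite
        | _ -> (k, v) :: acc) []
    |> List.rev
// compacted = [ (0, 3); (1, 5) ]
```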

The problem is that stable sorts (such as my simple mergesort) tend to be slower than e.g. quicksort. This can be seen here:

```
Method Count Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
Map_ofArray 10 797.8 ns 15.18 ns 14.20 ns 0.2165 - - 1360 B
MapNew_ofArray_optimized 10 514.7 ns 10.36 ns 13.83 ns 0.0992 - - 624 B
MapNew_ofArray_WRONG 10 429.1 ns 8.00 ns 7.10 ns 0.0968 - - 608 B
Map_ofArray 100 21,382.7 ns 129.49 ns 114.79 ns 4.4250 0.1221 - 27856 B
MapNew_ofArray_optimized 100 6,307.1 ns 105.20 ns 93.25 ns 0.8850 0.0229 - 5592 B
MapNew_ofArray_WRONG 100 3,859.6 ns 73.73 ns 81.95 ns 0.8240 0.0229 - 5216 B
Map_ofArray 1000 399,811.7 ns 5,783.82 ns 5,410.19 ns 70.8008 16.1133 - 445144 B
MapNew_ofArray_optimized 1000 102,484.6 ns 1,988.17 ns 1,762.46 ns 8.3008 1.5869 - 52344 B
MapNew_ofArray_WRONG 1000 59,636.1 ns 824.82 ns 731.18 ns 7.6904 1.4648 - 48368 B
```

Things I tried so far:

  1. using Array.sort with struct('Key * int) (the int being the source index) ended up being quite slow
  2. tweaking a quicksort to remove duplicates inline (didn't manage to get my head around that)

Things that could be done to improve the stable-sorting a little:

  1. inline the duplicate-removal in the last merge step (saves iterating the array once)
  2. optimize the mergeSort
  3. consider using something more sophisticated (e.g. TimSort) -> hard to implement

@krauthaufen
Author

Hey, I just updated the benchmarks.
Everything got a little faster again, so almost all combinators are now 1.2x - 3x faster than the original map.
ofArray for 10k elements is even 5 times faster.

@yatli

yatli commented Dec 14, 2020

@krauthaufen have you tried to run: https://github.com/buybackoff/fsharp-benchmarks/blob/master/FSharpCoreOptim/Bench.fs

I'd also like to test for small key/value (int*int) vs. larger key/value (int*some_reftype, int*large_valuetype) etc.

Wrt ofArray: one idea is to create map nodes during a partitioned sort so you don't have to go over the result again later. The result tree can be balanced fairly easily.

@krauthaufen
Author

Hey, I will run the benchmarks on other key/value types soon, but the ones you mentioned seem to be "unfair" since they only test adding/removing sorted keys. Mine actually look quite similar but with randomized data. Obviously this creates a little jitter in the runtimes, but nonetheless I think it is "fairer"

@yatli

yatli commented Dec 14, 2020

Fair point. I brought that up because it contains somewhat richer patterns (for all the stuff in the map, access by key and then ignore, etc.).

Your trees are mostly balanced so it'll probably do a good job in those tests.

@yatli

yatli commented Dec 14, 2020

I just found that all of these sorts are non-stable which means that my logic for duplicates won't work (which my tests sadly missed).

hmm I don't understand, do you want to mimic the behavior of keeping the last entry of the same key? otherwise stability doesn't affect dedup.

another idea is to allocate the temp array of actual map nodes so you don't have to create the nodes again after the sort -- not sure if it's a good idea because of another level of indirection, though. (memory efficiency ++, speed --)

@yatli

yatli commented Dec 14, 2020

I'd like to also mention the runtime profiles -- I'm running the benchmark now and it looks like it's only testing the workstation concurrent GC scenario. We should perhaps also test with the config option <gcServer>true</gcServer>
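For reference, server GC can be enabled for the benchmark project via the project file (BenchmarkDotNet jobs can also request it programmatically); a minimal csproj fragment might look like:

```xml
<!-- enable server GC for the benchmark host process -->
<PropertyGroup>
  <ServerGarbageCollection>true</ServerGarbageCollection>
</PropertyGroup>
```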

@yatli

yatli commented Dec 14, 2020

Here's the workstation GC result for count = 100 on my machine:

```
// * Summary *
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.685 (2004/?/20H1)
Intel Core i7-6700K CPU 4.00GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.101
  [Host]     : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT DEBUG
  DefaultJob : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT

Method Count Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
Map_ofArray 100 21,540.72 ns 130.719 ns 109.156 ns 6.5918 - - 27616 B
MapNew_ofArray 100 6,030.92 ns 72.043 ns 60.159 ns 1.3351 - - 5592 B
Map_ofList 100 22,660.64 ns 441.946 ns 526.105 ns 6.6223 - - 27712 B
MapNew_ofList 100 6,572.73 ns 71.084 ns 59.358 ns 1.8158 - - 7608 B
Map_ofSeq 100 23,373.67 ns 133.495 ns 111.475 ns 6.8665 - - 28744 B
MapNew_ofSeq 100 7,197.75 ns 95.528 ns 84.683 ns 1.8234 - - 7648 B
Map_toArray 100 1,821.67 ns 21.711 ns 20.308 ns 1.5354 - - 6424 B
MapNew_toArray 100 784.71 ns 6.124 ns 5.429 ns 0.7763 - - 3248 B
Map_toList 100 1,445.31 ns 24.479 ns 25.138 ns 1.3390 - - 5600 B
MapNew_toList 100 1,031.56 ns 5.313 ns 4.710 ns 1.3390 - - 5600 B
Map_enumerate 100 3,570.17 ns 21.298 ns 19.922 ns 1.8616 - - 7800 B
MapNew_enumerate 100 1,764.33 ns 33.628 ns 35.981 ns 0.9556 - - 4000 B
Map_toSeq_enum 100 5,274.36 ns 10.658 ns 8.900 ns 2.3270 - - 9752 B
MapNew_toSeq_enum 100 4,716.17 ns 81.952 ns 100.644 ns 1.5717 - - 6592 B
Map_containsKey_all 100 5,530.57 ns 93.365 ns 99.900 ns - - - -
MapNew_containsKey_all 100 2,424.88 ns 22.708 ns 20.130 ns - - - -
Map_containsKey_nonexisting 100 53.51 ns 0.570 ns 0.445 ns - - - -
MapNew_containsKey_nonexisting 100 24.84 ns 0.525 ns 0.491 ns - - - -
Map_tryFind 100 50.59 ns 0.159 ns 0.132 ns 0.0057 - - 24 B
MapNew_tryFind 100 27.06 ns 0.065 ns 0.060 ns 0.0057 - - 24 B
Map_tryFind_nonexisting 100 41.01 ns 0.745 ns 0.697 ns - - - -
MapNew_tryFind_nonexisting 100 22.09 ns 0.131 ns 0.116 ns - - - -
Map_remove_all 100 18,913.17 ns 72.131 ns 63.942 ns 6.4697 - - 27104 B
MapNew_remove_all 100 15,602.63 ns 219.163 ns 183.011 ns 5.6763 - - 23816 B
Map_exists 100 704.56 ns 9.028 ns 8.003 ns 0.0057 - - 24 B
MapNew_exists 100 353.35 ns 2.403 ns 2.248 ns 0.0057 - - 24 B
Map_fold 100 639.10 ns 2.954 ns 2.467 ns 0.0057 - - 24 B
MapNew_fold 100 393.61 ns 7.322 ns 6.849 ns 0.0057 - - 24 B
Map_foldBack 100 637.56 ns 6.142 ns 5.445 ns 0.0057 - - 24 B
MapNew_foldBack 100 383.18 ns 2.343 ns 1.957 ns 0.0057 - - 24 B
```

Looks like we've got similar specs :)

@krauthaufen
Author

Nice to see that I was not hallucinating my benchmark results 😋

hmm I don't understand, do you want to mimic the behavior of keeping the last entry of the same key?

Precisely. That's why stable sorting is crucial here, and my simple mergeSort was (so far) faster than anything else I tried.

another idea is to allocate the temp array of actual map nodes so you don't have to create the nodes again after the sort -- not sure if it's a good idea because of another level of indirection, though. (memory efficiency ++, speed --)

Not sure if this can help; I'd need to allocate leaves for everything and then combine them into inner nodes, etc.
The current implementation just sorts the tuples (without allocating new ones) and then allocates precisely the output nodes.

During this process I need to allocate 2 auxiliary Key*Value arrays of length n for the mergeSort, but I think that's reasonable, and they become unreachable directly after the tree-build. Of course 0/1 aux arrays would be better, but then we would need an in-place stable sort (like TimSort) that is equally fast.

@yatli

yatli commented Dec 14, 2020

krauthaufen/MapNew#1

@yatli

yatli commented Dec 14, 2020

The gcServer profile:

```
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.685 (2004/?/20H1)
Intel Core i7-6700K CPU 4.00GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.101
  [Host]     : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT DEBUG
  Job-HPVMQI : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT

Server=True

Method Count Mean Error StdDev Ratio RatioSD Gen 0 Gen 1 Gen 2 Allocated
Map_ofArray 100 21,309.23 ns 92.137 ns 86.185 ns 1.00 0.00 0.7324 - - 27400 B
MapNew_ofArray 100 5,874.02 ns 113.713 ns 166.680 ns 0.28 0.01 0.1373 - - 5592 B
Map_ofList 100 23,646.63 ns 396.236 ns 486.613 ns 1.00 0.00 0.7019 - - 28768 B
MapNew_ofList 100 6,447.32 ns 97.939 ns 91.612 ns 0.27 0.01 0.1907 - - 7608 B
Map_ofSeq 100 23,871.15 ns 386.093 ns 396.489 ns 1.00 0.00 0.7019 - - 28552 B
MapNew_ofSeq 100 6,790.94 ns 46.358 ns 36.193 ns 0.28 0.01 0.1907 - - 7648 B
Map_toArray 100 1,862.72 ns 31.103 ns 25.973 ns ? ? 0.1659 - - 6424 B
MapNew_toArray 100 850.00 ns 16.036 ns 15.750 ns ? ? 0.0849 - - 3248 B
Map_toList 100 1,486.52 ns 3.886 ns 3.245 ns ? ? 0.1411 - - 5600 B
MapNew_toList 100 1,138.65 ns 4.410 ns 3.683 ns ? ? 0.1431 - - 5600 B
Map_enumerate 100 3,757.96 ns 74.561 ns 111.599 ns 1.00 0.00 0.1907 - - 7680 B
MapNew_enumerate 100 1,625.09 ns 6.558 ns 6.135 ns 0.42 0.01 0.1030 - - 4000 B
Map_toSeq_enum 100 5,496.65 ns 72.550 ns 67.863 ns 1.00 0.00 0.2594 - - 10112 B
MapNew_toSeq_enum 100 4,813.82 ns 95.631 ns 146.039 ns 0.89 0.02 0.1602 - - 6592 B
Map_containsKey_all 100 5,308.16 ns 28.346 ns 26.515 ns 1.00 0.00 - - - -
MapNew_containsKey_all 100 2,206.53 ns 27.926 ns 21.803 ns 0.42 0.00 - - - -
Map_containsKey_nonexisting 100 53.82 ns 0.841 ns 0.826 ns 1.00 0.00 - - - -
MapNew_containsKey_nonexisting 100 22.35 ns 0.120 ns 0.101 ns 0.41 0.01 - - - -
Map_tryFind 100 54.65 ns 0.321 ns 0.268 ns 1.00 0.00 0.0006 - - 24 B
MapNew_tryFind 100 21.80 ns 0.478 ns 0.424 ns 0.40 0.01 0.0006 - - 24 B
Map_tryFind_nonexisting 100 55.36 ns 0.601 ns 0.562 ns 1.00 0.00 - - - -
MapNew_tryFind_nonexisting 100 21.59 ns 0.407 ns 0.381 ns 0.39 0.01 - - - -
Map_remove_all 100 20,126.44 ns 150.161 ns 133.114 ns 1.00 0.00 0.7019 - - 28136 B
MapNew_remove_all 100 15,826.98 ns 294.046 ns 275.051 ns 0.79 0.01 0.6104 - - 23288 B
Map_exists 100 691.36 ns 3.394 ns 3.175 ns 1.00 0.00 - - - 24 B
MapNew_exists 100 367.96 ns 5.424 ns 4.808 ns 0.53 0.01 0.0005 - - 24 B
Map_fold 100 610.81 ns 3.087 ns 2.736 ns 1.00 0.00 - - - 24 B
MapNew_fold 100 386.97 ns 0.978 ns 0.867 ns 0.63 0.00 0.0005 - - 24 B
Map_foldBack 100 594.96 ns 2.724 ns 2.548 ns 1.00 0.00 - - - 24 B
MapNew_foldBack 100 383.56 ns 7.311 ns 6.839 ns 0.64 0.01 0.0005 - - 24 B
```

Conclusion: even more improvements with server profile!

@yatli

yatli commented Dec 14, 2020

Sorting.fs, L221:

```fsharp
while sortedLengthDbl <= src.Length do
```

Consider changing it to while sortedLength < src.Length? In doing so we can remove the following (Sorting.fs, L224):

```fsharp
if sortedLength < src.Length then
    let cnt = mergeSeqHandleDuplicates cmp 0 sortedLength sortedLength src dst
    struct(dst, cnt)
else
```

@krauthaufen
Author

This was actually intentional since it saves iterating the array once and saves a few percent.

@krauthaufen
Author

I'm currently focusing on getting ofList/ofSeq/ofArray faster for very small counts (benchmarks currently running). The code was a bit painful to write but I think it might be worth it.

@krauthaufen
Author

Hey, my optimizations for super-small counts seem to work decently.

For counts in [1..8] the new implementation is always at least as fast (within error margins) as the original one and gets a whole lot faster for large inputs.

```
// * Summary *

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-9750H CPU 2.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.200-preview.20601.7
  [Host]     : .NET Core 3.1.10 (CoreCLR 4.700.20.51601, CoreFX 4.700.20.51901), X64 RyuJIT DEBUG
  Job-DPNHNG : .NET Core 3.1.10 (CoreCLR 4.700.20.51601, CoreFX 4.700.20.51901), X64 RyuJIT

Server=True

Method Count Mean Error StdDev Median Ratio RatioSD Gen 0 Gen 1 Gen 2 Allocated
Map_ofArray 1 17.07 ns 0.113 ns 0.106 ns 17.03 ns 1.00 0.00 0.0007 - - 64 B
MapNew_ofArray 1 14.48 ns 0.355 ns 0.816 ns 14.79 ns 0.79 0.07 0.0006 - - 56 B
Map_ofList 1 27.63 ns 0.136 ns 0.127 ns 27.66 ns 1.00 0.00 0.0010 - - 88 B
MapNew_ofList 1 11.73 ns 0.121 ns 0.113 ns 11.70 ns 0.42 0.00 0.0006 - - 56 B
Map_ofSeq 1 30.38 ns 0.188 ns 0.176 ns 30.38 ns 1.00 0.00 0.0010 - - 88 B
MapNew_ofSeq 1 15.98 ns 0.051 ns 0.040 ns 15.99 ns 0.53 0.00 0.0006 - - 56 B
Map_ofArray 2 31.17 ns 0.141 ns 0.132 ns 31.19 ns 1.00 0.00 0.0013 - - 112 B
MapNew_ofArray 2 32.12 ns 0.164 ns 0.154 ns 32.18 ns 1.03 0.01 0.0011 - - 104 B
Map_ofList 2 41.19 ns 0.166 ns 0.147 ns 41.18 ns 1.00 0.00 0.0015 - - 136 B
MapNew_ofList 2 31.41 ns 0.199 ns 0.186 ns 31.37 ns 0.76 0.00 0.0011 - - 104 B
Map_ofSeq 2 43.78 ns 0.164 ns 0.137 ns 43.78 ns 1.00 0.00 0.0015 - - 136 B
MapNew_ofSeq 2 37.03 ns 0.448 ns 0.397 ns 36.94 ns 0.85 0.01 0.0011 - - 104 B
Map_ofArray 3 84.56 ns 0.516 ns 0.482 ns 84.45 ns 1.00 0.00 0.0023 - - 208 B
MapNew_ofArray 3 90.89 ns 0.630 ns 0.589 ns 90.69 ns 1.07 0.01 0.0021 - - 200 B
Map_ofList 3 79.10 ns 0.253 ns 0.224 ns 79.11 ns 1.00 0.00 0.0023 - - 208 B
MapNew_ofList 3 54.52 ns 0.347 ns 0.324 ns 54.42 ns 0.69 0.00 0.0014 - - 128 B
Map_ofSeq 3 98.81 ns 0.500 ns 0.443 ns 98.70 ns 1.00 0.00 0.0025 - - 232 B
MapNew_ofSeq 3 90.07 ns 0.668 ns 0.592 ns 90.21 ns 0.91 0.01 0.0023 - - 200 B
Map_ofArray 4 182.91 ns 0.601 ns 0.563 ns 182.89 ns 1.00 0.00 0.0041 - - 376 B
MapNew_ofArray 4 94.31 ns 1.924 ns 2.760 ns 92.69 ns 0.52 0.02 0.0019 - - 176 B
Map_ofList 4 218.61 ns 0.670 ns 0.627 ns 218.55 ns 1.00 0.00 0.0050 - - 448 B
MapNew_ofList 4 105.21 ns 0.439 ns 0.390 ns 105.22 ns 0.48 0.00 0.0025 - - 224 B
Map_ofSeq 4 176.96 ns 0.594 ns 0.526 ns 176.78 ns 1.00 0.00 0.0038 - - 352 B
MapNew_ofSeq 4 140.27 ns 1.661 ns 1.387 ns 140.14 ns 0.79 0.01 0.0029 - - 272 B
Map_ofArray 5 255.16 ns 1.817 ns 1.517 ns 254.98 ns 1.00 0.00 0.0052 - - 496 B
MapNew_ofArray 5 126.41 ns 0.737 ns 0.690 ns 126.46 ns 0.50 0.00 0.0024 - - 224 B
Map_ofList 5 188.18 ns 0.677 ns 0.566 ns 188.12 ns 1.00 0.00 0.0043 - - 400 B
MapNew_ofList 5 122.11 ns 0.233 ns 0.207 ns 122.14 ns 0.65 0.00 0.0021 - - 200 B
Map_ofSeq 5 223.41 ns 1.351 ns 1.264 ns 223.52 ns 1.00 0.00 0.0050 - - 448 B
MapNew_ofSeq 5 276.21 ns 2.683 ns 2.509 ns 275.32 ns 1.24 0.01 0.0057 - - 512 B
Map_ofArray 6 265.37 ns 1.009 ns 0.895 ns 265.47 ns 1.00 0.00 0.0057 - - 520 B
MapNew_ofArray 6 207.73 ns 0.587 ns 0.520 ns 207.74 ns 0.78 0.00 0.0043 - - 392 B
Map_ofList 6 272.02 ns 1.520 ns 1.422 ns 271.67 ns 1.00 0.00 0.0057 - - 544 B
MapNew_ofList 6 221.65 ns 0.863 ns 0.765 ns 221.63 ns 0.81 0.00 0.0052 - - 472 B
Map_ofSeq 6 275.55 ns 0.819 ns 0.726 ns 275.70 ns 1.00 0.00 0.0057 - - 544 B
MapNew_ofSeq 6 232.76 ns 0.823 ns 0.770 ns 232.49 ns 0.84 0.00 0.0052 - - 472 B
Map_ofArray 7 451.17 ns 1.268 ns 1.124 ns 451.04 ns 1.00 0.00 0.0086 - - 784 B
MapNew_ofArray 7 237.82 ns 1.106 ns 0.980 ns 237.71 ns 0.53 0.00 0.0048 - - 432 B
Map_ofList 7 362.27 ns 1.705 ns 1.595 ns 362.23 ns 1.00 0.00 0.0072 - - 664 B
MapNew_ofList 7 261.09 ns 1.908 ns 1.692 ns 261.01 ns 0.72 0.00 0.0052 - - 504 B
Map_ofSeq 7 506.84 ns 2.857 ns 2.533 ns 506.19 ns 1.00 0.00 0.0095 - - 904 B
MapNew_ofSeq 7 259.65 ns 1.002 ns 0.937 ns 259.80 ns 0.51 0.00 0.0052 - - 504 B
Map_ofArray 8 574.78 ns 3.576 ns 3.345 ns 574.83 ns 1.00 0.00 0.0105 - - 1000 B
MapNew_ofArray 8 282.62 ns 0.973 ns 0.863 ns 282.76 ns 0.49 0.00 0.0052 - - 496 B
Map_ofList 8 613.85 ns 2.704 ns 2.397 ns 613.49 ns 1.00 0.00 0.0124 - - 1096 B
MapNew_ofList 8 307.42 ns 1.513 ns 1.415 ns 307.66 ns 0.50 0.00 0.0062 - - 560 B
Map_ofSeq 8 606.14 ns 2.841 ns 2.657 ns 606.27 ns 1.00 0.00 0.0114 - - 1048 B
MapNew_ofSeq 8 311.47 ns 1.205 ns 1.068 ns 311.31 ns 0.51 0.00 0.0062 - - 560 B
```

@yatli

yatli commented Dec 14, 2020

This was actually intentional since it saves iterating the array once and saves a few percent.

Hmm, if so, why not force the condition sortedLength < src.Length? Currently arrays of length 2^n do not benefit from this last-round run.

@yatli

yatli commented Dec 14, 2020

Like, krauthaufen/MapNew#2 ?

@krauthaufen
Author

Good idea, thanks.
It was a little late when I coded that 😅

@yatli

yatli commented Dec 14, 2020

How do we maintain compatibility with the serialization libraries?
Like Json.Net etc.

@krauthaufen
Author

Hey, I just did some benchmarks with decimal as Key and/or Value and a ReferenceType (custom wrapped decimal), which all show more or less identical behaviour to the int*int case.

However, LanguagePrimitives.FastGenericComparer boxes custom structs (also Guid, for example), and when that happens the new implementation performs more or less identically to the old one. I should note that the overall runtime for almost all operations is drastically higher for both implementations in that case. (I think there's an old issue covering that.)

A possible explanation for this is that, given an expensive (boxing) comparison, the new implementation of ofArray may perform slightly more comparisons than the old one.

The serialization thing is really a little tricky, since I have no idea how these serializations look internally.
Another question is whether or not these should even be compatible.
FsPickler, for example, has a special case for Map (using toArray/ofArray afaik).

@krauthaufen
Author

A quick check revealed that Json.Net does:

```fsharp
let m = MapNew.ofArray [|1,2;3,4;5,6|]
let json = Json.Net.JsonNet.Serialize(m)
printfn "%s" json // => ["1":2,"3":4,"5":6]
```

Maybe due to my IDictionary<'Key, 'Value> implementation?

@yatli

yatli commented Dec 14, 2020

["1":2,"3":4,"5":6]

Quite good then! It matches the results of the old map.

@dsyme
Collaborator

dsyme commented Dec 15, 2020

Amazing work here!

For serialization I believe the contract is a serializedData field that gets populated on demand; here's the current implementation.

Is this feasible?

```fsharp
    [<System.NonSerialized>]
    // This type is logically immutable. This field is only mutated during deserialization.
    let mutable comparer = comparer

    [<System.NonSerialized>]
    // This type is logically immutable. This field is only mutated during deserialization.
    let mutable tree = tree

    // WARNING: The compiled name of this field may never be changed because it is part of the logical
    // WARNING: permanent serialization format for this type.
    let mutable serializedData = null

    [<System.Runtime.Serialization.OnSerializingAttribute>]
    member __.OnSerializing(context: System.Runtime.Serialization.StreamingContext) =
        ignore context
        serializedData <- MapTree.toArray tree |> Array.map (fun (k, v) -> KeyValuePair(k, v))

    [<System.Runtime.Serialization.OnDeserializedAttribute>]
    member __.OnDeserialized(context: System.Runtime.Serialization.StreamingContext) =
        ignore context
        comparer <- LanguagePrimitives.FastGenericComparer<'Key>
        tree <- serializedData |> Array.map (fun kvp -> kvp.Key, kvp.Value) |> MapTree.ofArray comparer
        serializedData <- null
```

@krauthaufen
Author

Hey, we could certainly do that.

Btw I finally started creating a Fable app for visualizing benchmark results and it's just amazing how simple that is 😁

I'll add the serialization-stuff once I'm done with my visualization

@krauthaufen
Author

https://aardvarkians.com/demo/MapNew/

@yatli

yatli commented Dec 15, 2020

However the LanguagePrimitives.FastGenericComparer boxes custom structs (also Guid for example) and when that happens the new implementation performs more or less identical to the old one.

The optimizer is aware of this path and will try to de-virtualize the call (see generic_comparison_inner_vref),
but it only works on F#-generated structs (those with CompareTo).

See:

  • prim-type.fs,L1095,GenericComparisonIntrinsic
  • Optimizer.fs,L2602, ... when CanDevirtualizeApplication cenv v cenv.g.generic_comparison_inner_vref ty args ...

I think we can definitely do better than the most generic comparer in some cases (e.g. structs, non-null stuff) -- we can add them to the optimizer as separate comparison de-virtualization cases.

Edit: for well-known C# types (Guid etc.) we can directly add them to the fast generic comparer table, see prim-types.fs,L2084, type FastGenericEqualityComparerTable<'T> ...

@krauthaufen
Author

Hey, the virtual call is only one part of the problem here, I think.

here is a :? IComparable check which works but still boxes the struct value and therefore allocates garbage. Then the virtual CompareTo(o : obj) gets invoked, which again boxes the second argument.

So for structs we effectively pay 2 boxings and 1 virtual call (and maybe a tuple allocation if the compiler doesn't optimize it)

I have no idea how this is avoided in System.Collections.Generic.Comparer<'T>.Default but it somehow seems to be (at least my tests are way faster when using it)

Btw the same goes for custom structs and Equals (when structs override Equals): the struct gets boxed and the virtual Equals is invoked.

But I think that's a different issue.
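A likely explanation (a sketch of the two paths, not verified here against the BCL source): Comparer&lt;'T&gt;.Default prefers the generic IComparable&lt;'T&gt; interface, whose CompareTo takes the struct directly, while the non-generic IComparable path boxes on both sides:

```fsharp
open System
open System.Collections.Generic

// Guid is a struct; the thread notes the F# generic comparer handles it
// via the non-generic path, which boxes.
let g1 = Guid.NewGuid()
let g2 = Guid.NewGuid()

// Boxing path: the cast to the non-generic IComparable boxes g1,
// and CompareTo(obj) boxes g2 as well.
let viaObj = (box g1 :?> IComparable).CompareTo(box g2)

// Non-boxing path: Comparer<Guid>.Default detects IComparable<Guid>
// and calls the strongly typed CompareTo(Guid) with no boxing.
let viaDefault = Comparer<Guid>.Default.Compare(g1, g2)
```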

@yatli

yatli commented Dec 15, 2020

here is a :? IComparable which works but still boxes the struct value and therefore allocates garbage. Then the virtual CompareTo(o : obj) gets invoked which again boxes the second argument.

If properly optimized it won't go into this path. The optimizer will rewrite the AST.

@krauthaufen
Author

Regarding the Map implementation:

  • I added several new things: union/unionWith/GetSlice/etc.
  • All benchmarks for the current state are running atm. with better iteration-counts, etc. (may take a while)
  • I think that I'm done optimizing the code and profiles indicate that there is very little left to optimize. (I'm also out of ideas 🤣)
  • So if you really want this in FSharp.Core, I could create a Set implementation and put together a PR for FSharp.Core

The preliminary benchmark results (small iteration counts) can be seen here

@yatli

yatli commented Dec 15, 2020

Fantastic visualization 😄

@vzarytovskii

This is absolutely fantastic, thank you for all that work @krauthaufen!

@yatli

yatli commented Dec 16, 2020

@krauthaufen maybe I can help with the optimizer if this suggestion is approved. We can then set up a branch and work with FSharp.Core and fsc directly.

@JustinWick
Copy link

JustinWick commented Dec 20, 2020

@krauthaufen If this pans out, it will be absurdly useful. Thanks for all of your hard work, this is exciting, and educational for all of us here on the thread!

@cartermp
Member

@krauthaufen let's take this forward into dotnet/fsharp as a PR to both map and set (or separate PRs). Note that the compiler also uses, effectively, a copy of these data structures internally where the same optimizations would likely have to be made. But it's not too bad.

@krauthaufen
Author

krauthaufen commented Dec 20, 2020

Hey, will do tomorrow. Cool that people are interested in this.

I hope the set implementation will show similar speedups.

Cheers

@mvkara

mvkara commented Jan 11, 2021

Just curious: is it considered a breaking change to move away from an AVL tree? I remember some discussion about this in the past; I'm guessing it probably is, but I remember people being receptive to changing it at the time.

The reason I ask is that, from my experiments coding a custom map for work, you can get significantly more benefit from other data structures (e.g. a HAMT). Unfortunately, at work I was forced to roll my own since the F# map was too slow; it would have been nice to have a fast, friendly implementation out of the box. Performance aside, a fast immutable HashMap implementation that doesn't need the comparison constraint would have helped significantly in some previous projects where I didn't have control over the key type (e.g. C#-sourced objects).

The benefits can be quite substantial: I was seeing performance improvements between 3x and 10x for small and large collections respectively, e.g. from https://github.com/mvkara/fsharp-hashcollections, which has more detailed benchmarks. Below are results for a simple tryFind operation.

|                           Method | CollectionSize |        Mean |    Error |   StdDev |
|--------------------------------- |--------------- |------------:|---------:|---------:|
|                          HAMTMap |             10 |    11.07 ns | 0.071 ns | 0.063 ns |
|                GetImToolsHashMap |             10 |    28.52 ns | 0.203 ns | 0.190 ns |
|                     GetFSharpMap |             10 |    39.55 ns | 0.540 ns | 0.505 ns |
|                GetFSharpXHashMap |             10 |   103.17 ns | 1.195 ns | 1.118 ns |
| GetSystemCollectionsImmutableMap |             10 |    24.38 ns | 0.318 ns | 0.298 ns |
|         GetFSharpDataAdaptiveMap |             10 |    21.88 ns | 0.088 ns | 0.082 ns |
|                          HAMTMap |            100 |    15.80 ns | 0.027 ns | 0.025 ns |
|                GetImToolsHashMap |            100 |    42.50 ns | 0.831 ns | 0.816 ns |
|                     GetFSharpMap |            100 |    70.69 ns | 1.385 ns | 1.595 ns |
|                GetFSharpXHashMap |            100 |   114.97 ns | 1.520 ns | 1.422 ns |
| GetSystemCollectionsImmutableMap |            100 |    36.47 ns | 0.349 ns | 0.327 ns |
|         GetFSharpDataAdaptiveMap |            100 |    43.15 ns | 0.098 ns | 0.082 ns |
|                          HAMTMap |           1000 |    12.83 ns | 0.017 ns | 0.014 ns |
|                GetImToolsHashMap |           1000 |    63.37 ns | 0.473 ns | 0.442 ns |
|                     GetFSharpMap |           1000 |   107.45 ns | 0.750 ns | 0.702 ns |
|                GetFSharpXHashMap |           1000 |   121.31 ns | 2.338 ns | 2.187 ns |
| GetSystemCollectionsImmutableMap |           1000 |    55.16 ns | 0.103 ns | 0.097 ns |
|         GetFSharpDataAdaptiveMap |           1000 |    66.20 ns | 0.463 ns | 0.433 ns |
|                          HAMTMap |        5000000 |   142.23 ns | 0.050 ns | 0.047 ns |
|                GetImToolsHashMap |        5000000 | 1,282.78 ns | 0.596 ns | 0.558 ns |
|                     GetFSharpMap |        5000000 |   851.10 ns | 1.426 ns | 1.334 ns |
|                GetFSharpXHashMap |        5000000 |   375.16 ns | 0.812 ns | 0.719 ns |
| GetSystemCollectionsImmutableMap |        5000000 |   850.68 ns | 0.405 ns | 0.359 ns |
|         GetFSharpDataAdaptiveMap |        5000000 |   608.75 ns | 0.250 ns | 0.233 ns |
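
The constraint difference mentioned above, in types (a sketch using standard F# constraint behavior; the HashMap type itself is hypothetical):

```fsharp
// F# Map requires the comparison constraint on 'Key:
let ordered : Map<int, string> = Map.ofList [ 1, "a" ]

// A key type with equality but no ordering. Such a type would work in a
// hash-based map, but the Map comparison constraint rejects it.
[<CustomEquality; NoComparison>]
type Token =
    { Id : int }
    override this.Equals(o : obj) =
        match o with
        | :? Token as t -> t.Id = this.Id
        | _ -> false
    override this.GetHashCode() = this.Id

// let bad = Map.ofList [ { Id = 1 }, "a" ]
// ^ compile error: Token does not support the 'comparison' constraint
```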

@cartermp
Member

Switching the data structure under the covers isn't a breaking change, no. At least not one we really consider here. We're free to change internals at any point, and if someone uses reflection to depend on them, that's a risk they are willingly taking.

That said, the churn factor for something like that is quite high, and changes that aren't incremental are much more difficult to accept.

@krauthaufen
Author

krauthaufen commented Jan 16, 2021

Hey, after doing yet another map implementation (Yam) with the insights gained here, I managed to make most operations a bit faster (note especially the ofArray performance, which now doesn't copy anything and therefore runs with O(1) scratch memory):

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.746 (2004/?/20H1)
Intel Core i7-4930K CPU 3.40GHz (Haswell), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.100
  [Host]     : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT DEBUG
  Job-WQVKON : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT

Server=False
|                      Method | Count |         Mean |      Error |     StdDev | Ratio | RatioSD |  Gen 0 |  Gen 1 | Gen 2 | Allocated |
|---------------------------- |------ |-------------:|-----------:|-----------:|------:|--------:|-------:|-------:|------:|----------:|
|                     Yam_add |   100 | 15,401.56 ns | 243.887 ns | 216.199 ns |  0.80 |    0.02 | 5.8289 |      - |     - |   36608 B |
|                     Map_add |   100 | 19,254.27 ns | 378.406 ns | 566.381 ns |  1.00 |    0.00 | 5.7373 |      - |     - |   36136 B |
|                  Yam_remove |   100 | 15,461.79 ns | 307.595 ns | 554.657 ns |  0.79 |    0.04 | 4.9744 |      - |     - |   31232 B |
|                  Map_remove |   100 | 19,426.96 ns | 381.377 ns | 546.959 ns |  1.00 |    0.00 | 5.4321 |      - |     - |   34144 B |
|                 Yam_ofArray |   100 |  9,095.73 ns | 181.949 ns | 367.546 ns |  0.60 |    0.02 | 0.7019 | 0.0153 |     - |    4416 B |
|                 Map_ofArray |   100 | 15,010.79 ns | 194.285 ns | 162.237 ns |  1.00 |    0.00 | 4.4556 | 0.1373 |     - |   28000 B |
|                 Yam_toArray |   100 |  1,379.79 ns |  27.016 ns |  42.850 ns |  0.65 |    0.03 | 0.5131 | 0.0114 |     - |    3224 B |
|                 Map_toArray |   100 |  2,122.62 ns |  41.669 ns |  66.091 ns |  1.00 |    0.00 | 1.0223 | 0.0191 |     - |    6424 B |
|         Yam_containsKey_all |   100 |  3,586.66 ns |  55.260 ns |  51.690 ns |  1.00 |    0.08 |      - |      - |     - |         - |
|         Map_containsKey_all |   100 |  3,622.26 ns |  71.927 ns | 176.438 ns |  1.00 |    0.00 |      - |      - |     - |         - |
| Yam_containsKey_nonexisting |   100 |     28.18 ns |   0.570 ns |   0.610 ns |  0.85 |    0.02 |      - |      - |     - |         - |
| Map_containsKey_nonexisting |   100 |     33.33 ns |   0.581 ns |   0.515 ns |  1.00 |    0.00 |      - |      - |     - |         - |
|                  Yam_exists |   100 |    380.36 ns |   5.128 ns |   4.546 ns |  0.92 |    0.01 | 0.0038 |      - |     - |      24 B |
|                  Map_exists |   100 |    413.72 ns |   3.673 ns |   3.436 ns |  1.00 |    0.00 | 0.0038 |      - |     - |      24 B |

When maintaining a count per inner-node (allowing for O(1) count and O(log N) positional queries) the results are still quite acceptable:

// * Summary *

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.746 (2004/?/20H1)
Intel Core i7-4930K CPU 3.40GHz (Haswell), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.100
  [Host]     : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT DEBUG
  Job-NTZQCI : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT

Server=False
|                      Method | Count |         Mean |      Error |     StdDev | Ratio | RatioSD |  Gen 0 |  Gen 1 | Gen 2 | Allocated |
|---------------------------- |------ |-------------:|-----------:|-----------:|------:|--------:|-------:|-------:|------:|----------:|
|                     Yam_add |   100 | 17,137.13 ns | 338.754 ns | 527.398 ns |  0.88 |    0.03 | 5.7983 |      - |     - |   36424 B |
|                     Map_add |   100 | 19,499.56 ns | 309.162 ns | 289.190 ns |  1.00 |    0.00 | 5.8594 |      - |     - |   36832 B |
|                  Yam_remove |   100 | 17,596.50 ns | 344.689 ns | 423.309 ns |  0.98 |    0.03 | 5.8289 |      - |     - |   36672 B |
|                  Map_remove |   100 | 17,981.42 ns | 359.338 ns | 413.814 ns |  1.00 |    0.00 | 5.4321 |      - |     - |   34112 B |
|                 Yam_ofArray |   100 | 11,144.64 ns | 221.712 ns | 217.751 ns |  0.67 |    0.02 | 0.7782 | 0.0153 |     - |    4912 B |
|                 Map_ofArray |   100 | 16,669.08 ns | 331.241 ns | 622.151 ns |  1.00 |    0.00 | 4.5776 | 0.1221 |     - |   28744 B |
|                 Yam_toArray |   100 |  1,025.46 ns |  18.328 ns |  30.114 ns |  0.48 |    0.03 | 0.5131 | 0.0114 |     - |    3224 B |
|                 Map_toArray |   100 |  2,137.10 ns |  40.995 ns |  66.199 ns |  1.00 |    0.00 | 1.0223 | 0.0191 |     - |    6424 B |
|         Yam_containsKey_all |   100 |  3,575.39 ns |  70.659 ns |  96.719 ns |  1.05 |    0.03 |      - |      - |     - |         - |
|         Map_containsKey_all |   100 |  3,428.17 ns |  67.072 ns |  74.550 ns |  1.00 |    0.00 |      - |      - |     - |         - |
| Yam_containsKey_nonexisting |   100 |     33.52 ns |   0.708 ns |   1.038 ns |  1.08 |    0.03 |      - |      - |     - |         - |
| Map_containsKey_nonexisting |   100 |     31.19 ns |   0.516 ns |   0.403 ns |  1.00 |    0.00 |      - |      - |     - |         - |
|                  Yam_exists |   100 |    366.18 ns |   7.328 ns |   9.268 ns |  0.95 |    0.03 | 0.0038 |      - |     - |      24 B |
|                  Map_exists |   100 |    385.18 ns |   5.914 ns |   6.328 ns |  1.00 |    0.00 | 0.0038 |      - |     - |      24 B |

However, note that the baseline Map numbers differ between the two runs, so I'll investigate that.

If you're still interested in the (now smaller) improvements, I can create a new PR (or adapt this one). Note that I have something in mind for keeping the overall count (not per inner node, but globally per map) that should be quite efficient.
Please let me know what you think. The implementation is here
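
The count-per-inner-node variant can be sketched roughly as follows. The names are hypothetical and this is not the actual Yam code, but it shows both the subtype/virtual-call node representation this suggestion is about and the cached subtree size that makes Count O(1) and positional lookup O(log N):

```fsharp
// hypothetical sketch of a counted tree; empty maps would use a
// separate sentinel or a null node in a real implementation
[<AbstractClass>]
type Node<'K, 'V>() =
    abstract Count : int

type Leaf<'K, 'V>(k : 'K, v : 'V) =
    inherit Node<'K, 'V>()
    member _.Key = k
    member _.Value = v
    override _.Count = 1

type Inner<'K, 'V>(l : Node<'K, 'V>, k : 'K, v : 'V, r : Node<'K, 'V>) =
    inherit Node<'K, 'V>()
    let cnt = l.Count + r.Count + 1   // cached once at construction
    member _.Left = l
    member _.Right = r
    member _.Key = k
    member _.Value = v
    override _.Count = cnt

/// entry at sorted position i, guided by the cached counts: O(log N)
let rec item (i : int) (n : Node<'K, 'V>) : 'K * 'V =
    match n with
    | :? Inner<'K, 'V> as inner ->
        let lc = inner.Left.Count
        if i < lc then item i inner.Left
        elif i = lc then (inner.Key, inner.Value)
        else item (i - lc - 1) inner.Right
    | :? Leaf<'K, 'V> as leaf -> (leaf.Key, leaf.Value)
    | _ -> invalidArg "i" "index out of range"
```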

@krauthaufen
Author

@mvkara I think changing the Map implementation to a HAMT actually would be a breaking change, since

  1. it requires equality instead of comparison
  2. it's no longer sorted (it can be sorted by hash code, but not by key)
  3. things like tryMin/tryMax would have O(N) runtime instead of O(log N)

However, I think adding a separate immutable HashMap/HashSet to FSharp.Core would be really awesome.
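
To illustrate point 3: on a comparison-based tree the minimum key sits at the end of the left spine, so it is reachable in O(log N) steps, while a hash-ordered trie would have to visit every entry (a sketch using a plain tree shape, not the actual implementation):

```fsharp
// generic sorted binary tree for illustration
type Tree<'K, 'V> =
    | Empty
    | Node of Tree<'K, 'V> * 'K * 'V * Tree<'K, 'V>

// walk the left spine: O(log N) for a balanced tree
let rec tryMin (t : Tree<'K, 'V>) =
    match t with
    | Empty -> None
    | Node (Empty, k, v, _) -> Some (k, v)
    | Node (l, _, _, _) -> tryMin l

// A HAMT stores entries in hash order, so the smallest *key* can sit
// anywhere in the trie and tryMin degrades to a full O(N) scan.
```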

@cartermp
Member

Closing this out, as significant progress has been made in dotnet/fsharp and discussions are continuing there.
