Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Very slow in those type of calcs #92184

Open
johnnygiter opened this issue Sep 16, 2023 · 15 comments
Open

Very slow in those type of calcs #92184

johnnygiter opened this issue Sep 16, 2023 · 15 comments
Labels
Milestone

Comments

@johnnygiter
Copy link

Net (all versions, 6, 7 and coming 8 are all very slow compared to other languages)
I do not expect to be as fast as e.g. rust, but being 10x slower than slow python is a bit of a shame:

https://programming-language-benchmarks.vercel.app/problem/edigits

I'm not sure if this will work there, but this is a suggestion for possible optimisations.
Hope that version 9 will do something with that.

@johnnygiter johnnygiter added the tenet-performance Performance related issue label Sep 16, 2023
@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Sep 16, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Sep 16, 2023
@neon-sunset
Copy link
Contributor

neon-sunset commented Sep 16, 2023

Seems to be an issue with BigInteger performance.

Given n = 250001, VS profiler shows most time is spent at these two lines: https://github.com/hanabi1224/Programming-Language-Benchmarks/blob/main/bench/algorithm/edigits/1.cs#L18-L19

@Suchiman
Copy link
Contributor

To be precise, most time is spent in BigInteger.op_division

@tannergooding
Copy link
Member

tannergooding commented Sep 16, 2023

These benchmark games are not necessarily representative of real world workloads. Nor do they always implement code the idiomatic way nor do they always do them the best way.

There are improvements that could be made to BigInteger to further improve its performance, but likewise one needs to account for whether that scenario is likely to be encountered in the real world and if there is a viable and obvious alternative that would better handle the scenario natively for the language in question.

Python largely only has 3 numeric types: int, float, complex and as such it is explicitly optimized to best handle the fact that these are the only numeric types it has. There are scenarios that you simply cannot handle well in Python because of this.

On the other hand, C#/.NET has many numeric types with different precisions and signedness: byte/sbyte, short/ushort, int/uint, long/ulong, nint/nuint, Int128/UInt128, float, double, Half, BigInteger, Complex, etc.

C#/.NET are correspondingly optimized around this fact and it is often expected that you select the right data type for your scenario, rather than solely relying on BigInteger for everything.

Much in the same way since Java doesn't have value types, their JIT has a ton of optimizations to work around this limitation. C#/.NET explicitly have value types and so we have less need to provide the same since it is expected you pick the right type for your scenario.

This allows C#/.NET users more control and often the ability to achieve overall better perf when you select the right tool for the job. Simple naive transpilation is often not sufficient nor a good representation of language performance.

@davidfowl
Copy link
Member

While that's true @tannergooding most of the benchmarks are using a big integer implementation so it looks pretty fair from that POV. I agree that it's unclear if it's real-world benchmark but if there's low hanging fruit in big integer this might be the motivation to improve it.

@tannergooding
Copy link
Member

Yes. BigInteger needs a full rewrite eventually, but it's much lower priority compared to other work.

The two key considerations are:

  1. BigInteger is readonly so every operation is a new allocation and copy
  2. BigInteger currently uses uint[] as its backing storage when it could benefit from using nuint[] so it can process twice as many bits per cycle on 64-bit and simplify it's multiplication/division algorithms

Accounting for the readonly factor would likely make the large factor in the perf here

@tannergooding
Copy link
Member

We have a higher perf mutable and inline buffer based BigInteger we use internally for things like floating-point parsing and formatting

@davidfowl
Copy link
Member

If there's a work item on BigInteger performance, we can link to these benchmarks.

@johnnygiter
Copy link
Author

johnnygiter commented Sep 17, 2023

Unfortunately explanations like "i's not real world scenario" and similar are not fully true
Actually this is is real world scenario someone want to do some calculations and other languages outperforming not 10 or 30% but 1000%.
IMHO definetely there is lot of room for improvements even in those benchmark games to be on top results rather than beeing on the bottom or "out of time" or at least not behind slow languages like python.

Good example is this simple code I did
On my hardware version 8 RC runs ~3x faster than version 7, and 6 with code below
Unfortunately 8.0 in mentioned benchmark in my first post is still horribly slow.

So potential hope for new version 9.0 about BigInteger improvements.

public class MyBenchmark
{
    private const int Iterations = 50000; 

    [Benchmark]
    public void BenchmarkSubtraction()
    {
        BigInteger total = BigInteger.Pow(10, 10000);
        for (int i = 0; i < Iterations; i++)
        {
            total -= i;
        }
    }

}

@davidfowl
Copy link
Member

Actually this is is real world scenario someone want to do some calculations and other languages outperforming not 10 or 30% but 1000%.

Maybe, but it's also possible they would use a different number type when using C#. That's not an excuse, but this isn't the only to do calculations in C# as @tannergooding mentioned.

@johnnygiter
Copy link
Author

Actually this is is real world scenario someone want to do some calculations and other languages outperforming not 10 or 30% but 1000%.

Maybe, but it's also possible they would use a different number type when using C#. That's not an excuse, but this isn't the only to do calculations in C# as @tannergooding mentioned.

That's why I called it "fully" because partially I can agree with tannergooding

@tannergooding
Copy link
Member

tannergooding commented Sep 17, 2023

The intent wasn't to say that using BigInteger to perform calculations wasn't "real world". But rather that many micro-benchmarks aren't representative of how such calculations might be done for scenarios that require it.

Microbenchmarks are, as the name implies, "micro". They are often a very small piece of the overall picture and may often have various constraints around the inputs/outputs that do not align with how you might use such code in a fully integrated production environment.

As I mentioned, Python primarily performs better in this particular case because they only have BigInteger and so it is always their hot path, for every application/library/framework. As such, they have done significant work to ensure it performs well across all scenarios.

However, in .NET, BigInteger is not typically part of the hot path. It is restricted to more specific domains and edge cases instead. As such, it does not get as much focus. Effort put into BigInteger helps a small percentage of apps, while effort put into things like Span<T> and LINQ, help almost all apps.

It is, of course, still important we optimize it further. But its prioritization may differ from other languages accordingly. And that when we optimize, we take a look at what the microbenchmark is doing and if it looks represenative of the work a non static scenario would need to compute. -- For example, using BigInteger to do floating-point parsing/formatting is a much more representative example than e-digits. This is because the former is a core algorithm used by all langauges/runtimes, in machine learning, graphics, etc. It is a very core operation. The latter is very domain specific and doesn't have many real world usages. The scenarios where e digits is useful is largely just for knowledge purposes and is better handled by a highly tuned scenario specific algorithm that might get distributed to run on a supercomputer (much as algorithms such as pi digits are done).


We are aware of some efforts that are needed to improve BigInteger and why it performs slower in this particular case.

For this scenario the biggest issue is that BigInteger is immutable and so a new allocation is required as part of every operation. This allocation isn't actually the expensive part, but rather having to work with and touch the additional memory as part of every operation.

The other factor is that our BigInteger implementation is still based on uint[] and so for larger values, it has to do more work than if it were backed by nuint[] instead. This is particularly impactful to multiplication and division operations.

The primary solution to the first factor would be to provide a class BigIntegerBuilder like type which is itself mutable.

The primary solution to the second factor would be to rewrite the implementation of BigInteger to better take advantage of 64-bit processors.

Neither of these are trivial tasks and would require significant work. The first would require API design and review, while the latter would simply require someone to do the implementation and review it.

There are notably some other more minor improvements that could be done as well. But they would be overall less impactful outside targeted scenarios like this. As an examples, we could provide a specialization for Pow(10, ...) or an explicit Exp2 and Exp10 function, to match what single/double have. We could likewise provide specialized functions for scenarios such as (a * b) / c (much as many places provide ModPow).

@ghost
Copy link

ghost commented Sep 18, 2023

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

Issue Details

Net (all versions, 6, 7 and coming 8 are all very slow compared to other languages)
I do not expect to be as fast as e.g. rust, but being 10x slower than slow python is a bit of a shame:

https://programming-language-benchmarks.vercel.app/problem/edigits

I'm not sure if this will work there, but this is a suggestion for possible optimisations.
Hope that version 9 will do something with that.

Author: johnnygiter
Assignees: -
Labels:

area-System.Runtime, tenet-performance, untriaged, needs-area-label

Milestone: -

@tannergooding tannergooding added area-System.Numerics and removed area-System.Runtime untriaged New issue has not been triaged by the area owner needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Sep 18, 2023
@tannergooding tannergooding added this to the Future milestone Sep 18, 2023
@ghost
Copy link

ghost commented Sep 18, 2023

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

Issue Details

Net (all versions, 6, 7 and coming 8 are all very slow compared to other languages)
I do not expect to be as fast as e.g. rust, but being 10x slower than slow python is a bit of a shame:

https://programming-language-benchmarks.vercel.app/problem/edigits

I'm not sure if this will work there, but this is a suggestion for possible optimisations.
Hope that version 9 will do something with that.

Author: johnnygiter
Assignees: -
Labels:

area-System.Numerics, tenet-performance

Milestone: -

@mjsabby
Copy link
Contributor

mjsabby commented Apr 28, 2024

BCryptEncrypt which admittedly is super optimized for the case it cares about ... but still, it's about 7 times faster than equivalent math using BigInteger. I've pasted this gist that compares two implementations

https://gist.github.com/mjsabby/b87b629fa902cd641b955892b1ab9604

@kzrnm
Copy link
Contributor

kzrnm commented Apr 28, 2024

This issue may be fixed in #96895.


BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3527/23H2/2023Update/SunValley3)
13th Gen Intel Core i5-13500, 1 CPU, 20 logical and 14 physical cores
.NET SDK 9.0.100-preview.3.24204.13
  [Host]     : .NET 9.0.0 (9.0.24.17209), X64 RyuJIT AVX2
  Job-HUDNJV : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-ZLKUHU : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2


Method Job Toolchain n Mean Error StdDev Ratio Code Size Gen0 Allocated Alloc Ratio
Run Job-HUDNJV \main\corerun.exe 100000 562.1 ms 5.64 ms 5.00 ms 1.00 12,619 B - 4.28 MB 1.00
Run Job-ZLKUHU \pr\corerun.exe 100000 160.0 ms 0.90 ms 0.80 ms 0.28 14,862 B 250.0000 4.28 MB 1.00
Run Job-HUDNJV \main\corerun.exe 250001 3,503.8 ms 25.70 ms 24.04 ms 1.00 12,680 B - 11.27 MB 1.00
Run Job-ZLKUHU \pr\corerun.exe 250001 936.8 ms 9.63 ms 9.01 ms 0.27 12,624 B - 11.27 MB 1.00
benchmark code
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using BenchmarkDotNet.Configs;
using System.Numerics;
using System.Runtime.InteropServices;

[DisassemblyDiagnoser]
[GroupBenchmarksBy(BenchmarkLogicalGroupRule.ByMethod)]
public class PerformanceTest
{
    [Benchmark]
    [Arguments(100000)]
    [Arguments(250001)]
    public string Run(int n)
    {
        var result = new System.Text.StringBuilder();
        var k = BinarySearch(n);
        var (p, q) = SumTerms(0, k - 1);
        p += q;
        var a = BigInteger.Pow(new BigInteger(10), n - 1);
        var answer = p * a / q;
        var answerStr = answer.ToString();
        Span<char> sb = stackalloc char[10];
        for (var i = 0; i < n; i += 10)
        {
            var count = i + 10;
            if (count > n)
            {
                count = n;
            }
            for (var j = i; j < i + 10; j++)
            {
                if (j < n)
                {
                    sb[j - i] = answerStr[j];
                }
                else
                {
                    sb[j - i] = ' ';
                }
            }

            result.AppendLine($"{new String(sb)}\t:{count}");
        }
        return result.ToString();
    }

    static (BigInteger, BigInteger) SumTerms(int a, int b)
    {
        if (b == a + 1)
        {
            return (new BigInteger(1), new BigInteger(b));
        }
        var mid = (a + b) / 2;
        var (pLeft, qLeft) = SumTerms(a, mid);
        var (pRight, qRight) = SumTerms(mid, b);
        return (pLeft * qRight + pRight, qLeft * qRight);
    }

    static int BinarySearch(int n)
    {
        var a = 0;
        var b = 1;
        while (!TestK(n, b))
        {
            a = b;
            b *= 2;
        }
        while (b - a > 1)
        {
            var m = (a + b) / 2;
            if (TestK(n, m))
            {
                b = m;
            }
            else
            {
                a = m;
            }
        }
        return b;
    }

    static bool TestK(int n, int k)
    {
        if (k < 0)
        {
            return false;
        }
        var lnKFactorial = k * (Math.Log((double)k) - 1) + 0.5 * Math.Log(Math.PI * 2);
        var log10KFactorial = lnKFactorial / Math.Log(10);
        return log10KFactorial >= (double)(n + 50);
    }
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

8 participants