Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handwritten implementation of IEnumerator to avoid allocations #1255

Merged
merged 1 commit into from Apr 26, 2019

Conversation

Pliner
Copy link
Contributor

@Pliner Pliner commented Dec 10, 2018

Hi!

What issue does this PR address?
There are some allocations during an enumeration of ImmutableStack:

immutablestack enumerator

The PR fixes them by introducing handwritten enumerator which is struct and most of the time will live on the stack and will not be boxed.

Does this PR introduce a breaking change?
Nope

Please check if the PR fulfills these requirements

  • The commit follows our guidelines
  • Unit Tests for the changes have been added (for bug fixes / features)

Other information:

@tsimbalar
Copy link
Member

Thanks Yury for diving into this one!

Did you by any chance run the Performance tests to see which overall perf impact it has on the codebase ?

@Pliner
Copy link
Contributor Author

Pliner commented Dec 11, 2018

I created very simple benchmark:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

namespace ImmutableStackBenchmark
{
    public static class EntryPoint
    {
        public static void Main()
        {
            BenchmarkRunner.Run<ImmutableStack>();
        }
    }

    [ClrJob]
    public class ImmutableStack
    {
        [Params(8, 16, 32, 64, 128, 256, 512, 1024)]
        public int N;

        private OptimizedImmutableStack<int> optimizedImmutableStack;
        private ImmutableStack<int> immutableStack;

        [GlobalSetup]
        public void Setup()
        {
            optimizedImmutableStack = Enumerable.Range(0, N)
                                                .Aggregate(OptimizedImmutableStack<int>.Empty, (x, y) => x.Push(y));

            immutableStack = Enumerable.Range(0, N)
                                       .Aggregate(ImmutableStack<int>.Empty, (x, y) => x.Push(y));
        }

        [Benchmark]
        public int OptimizedForeach()
        {
            var sum = 0;
            foreach (var element in optimizedImmutableStack)
            {
                sum += element;
            }

            return sum;
        }

        [Benchmark]
        public int OptimizedLinq() => optimizedImmutableStack.Sum();

        [Benchmark]
        public int NonOptimizedForeach()
        {
            var sum = 0;
            foreach (var element in immutableStack)
            {
                sum += element;
            }

            return sum;
        }

        [Benchmark]
        public int NonOptimizedLinq() => immutableStack.Sum();
    }

    internal class OptimizedImmutableStack<T> : IEnumerable<T>
    {
        private readonly OptimizedImmutableStack<T> _under;

        private OptimizedImmutableStack()
        {
        }

        private OptimizedImmutableStack(OptimizedImmutableStack<T> under, T top)
        {
            if (under == null)
            {
                throw new ArgumentNullException(nameof(under));
            }

            _under = under;
            Count = under.Count + 1;
            Top = top;
        }

        public static OptimizedImmutableStack<T> Empty { get; } = new OptimizedImmutableStack<T>();

        public int Count { get; }

        public bool IsEmpty => _under == null;

        public T Top { get; }

        public Enumerator GetEnumerator() => new Enumerator(this);

        public OptimizedImmutableStack<T> Push(T t) => new OptimizedImmutableStack<T>(this, t);

        IEnumerator IEnumerable.GetEnumerator() => new Enumerator(this);

        IEnumerator<T> IEnumerable<T>.GetEnumerator() => new Enumerator(this);

        internal struct Enumerator : IEnumerator<T>
        {
            private readonly OptimizedImmutableStack<T> _stack;
            private OptimizedImmutableStack<T> _top;

            public Enumerator(OptimizedImmutableStack<T> stack)
            {
                _stack = stack;
                _top = stack;
                Current = default(T);
            }

            public bool MoveNext()
            {
                if (_top.IsEmpty)
                {
                    return false;
                }

                Current = _top.Top;
                _top = _top._under;
                return true;
            }

            public void Reset()
            {
                _top = _stack;
                Current = default(T);
            }

            public T Current { get; private set; }

            object IEnumerator.Current => Current;

            public void Dispose()
            {
            }
        }
    }

    internal class ImmutableStack<T> : IEnumerable<T>
    {
        private readonly ImmutableStack<T> _under;

        private ImmutableStack()
        {
        }

        private ImmutableStack(ImmutableStack<T> under, T top)
        {
            if (under == null)
            {
                throw new ArgumentNullException(nameof(under));
            }

            _under = under;
            Count = under.Count + 1;
            Top = top;
        }

        public static ImmutableStack<T> Empty { get; } = new ImmutableStack<T>();

        public int Count { get; }

        public bool IsEmpty => _under == null;

        public T Top { get; }

        public ImmutableStack<T> Push(T t) => new ImmutableStack<T>(this, t);

        IEnumerator IEnumerable.GetEnumerator()
        {
            var next = this;
            while (!next.IsEmpty)
            {
                yield return next.Top;
                next = next._under;
            }
        }

        IEnumerator<T> IEnumerable<T>.GetEnumerator()
        {
            var next = this;
            while (!next.IsEmpty)
            {
                yield return next.Top;
                next = next._under;
            }
        }
    }
}

There are four cases:

  1. OptimizedForeach. It tests the case of enumeration by foreach when public Enumerator GetEnumerator() is selected.
  2. OptimizedLinq. It tests the case of enumeration by foreach when public IEnumerator<T> GetEnumerator() is selected and the enumerator is boxed.
  3. NonOptimizedForeach. It tests the case of enumeration by foreach when public IEnumerator<T> GetEnumerator() is selected.
  4. NonOptimizedLinq. It tests the case of enumeration by foreach when public IEnumerator<T> GetEnumerator() is selected.

The result is the following:

BenchmarkDotNet=v0.11.3, OS=Windows 10.0.15063.1387 (1703/CreatorsUpdate/Redstone2)
Intel Core i7-6700 CPU 3.40GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
Frequency=3328127 Hz, Resolution=300.4693 ns, Timer=TSC
  [Host] : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.7.3190.0
  Clr    : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.7.3190.0

Job=Clr  Runtime=Clr  
Method N Mean Error StdDev Median
OptimizedForeach 8 10.26 ns 0.1201 ns 0.1065 ns 10.25 ns
OptimizedLinq 8 99.16 ns 1.1621 ns 0.9073 ns 99.21 ns
NonOptimizedForeach 8 69.38 ns 1.3792 ns 1.2901 ns 68.93 ns
NonOptimizedLinq 8 75.46 ns 0.9207 ns 0.8612 ns 75.15 ns
OptimizedForeach 16 28.39 ns 0.5184 ns 0.4849 ns 28.18 ns
OptimizedLinq 16 169.75 ns 2.3815 ns 2.2277 ns 169.43 ns
NonOptimizedForeach 16 125.62 ns 3.4737 ns 4.0003 ns 124.31 ns
NonOptimizedLinq 16 135.82 ns 2.1474 ns 1.9036 ns 135.12 ns
OptimizedForeach 32 43.21 ns 0.5348 ns 0.4741 ns 43.09 ns
OptimizedLinq 32 304.63 ns 4.1617 ns 3.8928 ns 304.10 ns
NonOptimizedForeach 32 222.28 ns 1.7098 ns 1.5157 ns 221.88 ns
NonOptimizedLinq 32 233.84 ns 2.2037 ns 1.8402 ns 233.75 ns
OptimizedForeach 64 112.54 ns 0.6663 ns 0.6232 ns 112.48 ns
OptimizedLinq 64 558.39 ns 5.2943 ns 4.9523 ns 558.42 ns
NonOptimizedForeach 64 397.67 ns 3.4442 ns 3.2217 ns 397.03 ns
NonOptimizedLinq 64 446.37 ns 10.5691 ns 10.3803 ns 441.85 ns
OptimizedForeach 128 211.53 ns 2.5940 ns 2.4265 ns 210.67 ns
OptimizedLinq 128 1,112.32 ns 22.0308 ns 20.6076 ns 1,106.46 ns
NonOptimizedForeach 128 770.94 ns 7.3161 ns 5.7119 ns 769.48 ns
NonOptimizedLinq 128 874.76 ns 19.0325 ns 50.1393 ns 849.44 ns
OptimizedForeach 256 409.73 ns 5.6636 ns 5.2977 ns 408.73 ns
OptimizedLinq 256 2,237.75 ns 27.8624 ns 26.0625 ns 2,225.43 ns
NonOptimizedForeach 256 1,540.60 ns 12.3919 ns 11.5914 ns 1,544.21 ns
NonOptimizedLinq 256 1,704.67 ns 11.8290 ns 11.0649 ns 1,702.69 ns
OptimizedForeach 512 821.68 ns 2.4748 ns 2.1938 ns 821.49 ns
OptimizedLinq 512 4,441.98 ns 89.9906 ns 84.1773 ns 4,410.76 ns
NonOptimizedForeach 512 3,051.79 ns 38.0981 ns 35.6370 ns 3,047.93 ns
NonOptimizedLinq 512 3,345.61 ns 21.9264 ns 20.5100 ns 3,340.25 ns
OptimizedForeach 1024 1,671.96 ns 8.3684 ns 6.9880 ns 1,669.13 ns
OptimizedLinq 1024 8,724.86 ns 174.5482 ns 163.2724 ns 8,677.19 ns
NonOptimizedForeach 1024 6,064.96 ns 74.8370 ns 70.0026 ns 6,064.57 ns
NonOptimizedLinq 1024 6,688.24 ns 74.9410 ns 70.0998 ns 6,694.24 ns

The change fixes the allocation of enumerator and also speed up a bit the main case.

PS Will run perfomance tests a bit later because it is time-consuming 😉

@nblumhardt
Copy link
Member

Nice 👍 - I don't recall us using any LINQ on immutable stacks, but with a regression there it's good to have an eye on it. Benchmark results would be great, just in case we're tripped up anywhere :-)

@Pliner
Copy link
Contributor Author

Pliner commented Jan 13, 2019

Here are two benchmark results: the baseline(commit hash 5c3a782) and the PR.

As far as I understand cases of the benchmark, the most relevant is LogContextEnrichmentBenchmark.

net46

BenchmarkDotNet=v0.10.6, OS=Windows 10 Redstone 2 (10.0.15063)
Processor=Intel Core i7-6700 CPU 3.40GHz (Skylake), ProcessorCount=8
Frequency=3328125 Hz, Resolution=300.4695 ns, Timer=TSC
  [Host]     : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.7.3190.0
  DefaultJob : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.7.3190.0

Method Mean Error StdDev Scaled ScaledSD
Bare 12.61 ns 0.0576 ns 0.0481 ns 1.00 0.00
PushProperty 469.22 ns 4.4193 ns 4.1338 ns 37.21 0.34
PushPropertyNested 911.76 ns 6.6149 ns 6.1876 ns 72.30 0.54
PushPropertyEnriched 631.50 ns 3.7761 ns 3.5322 ns 50.07 0.33
Method Mean Error StdDev Scaled ScaledSD
Bare 12.54 ns 0.0336 ns 0.0298 ns 1.00 0.00
PushProperty 462.29 ns 2.2330 ns 1.8646 ns 36.86 0.17
PushPropertyNested 907.33 ns 2.6765 ns 2.2350 ns 72.34 0.24
PushPropertyEnriched 618.29 ns 2.7690 ns 2.5901 ns 49.29 0.23

netcoreapp2.0

BenchmarkDotNet=v0.10.6, OS=Windows 10 Redstone 2 (10.0.15063)
Processor=Intel Core i7-6700 CPU 3.40GHz (Skylake), ProcessorCount=8
Frequency=3328125 Hz, Resolution=300.4695 ns, Timer=TSC
dotnet cli version=2.1.4
  [Host]     : .NET Core 4.6.26614.01, 64bit RyuJIT
  DefaultJob : .NET Core 4.6.26614.01, 64bit RyuJIT

Method Mean Error StdDev Scaled ScaledSD
Bare 12.41 ns 0.2173 ns 0.2033 ns 1.00 0.00
PushProperty 120.57 ns 2.3199 ns 2.1700 ns 9.72 0.23
PushPropertyNested 239.82 ns 3.6876 ns 3.4494 ns 19.33 0.40
PushPropertyEnriched 271.48 ns 3.9595 ns 3.7038 ns 21.89 0.45
Method Mean Error StdDev Scaled ScaledSD
Bare 12.06 ns 0.1320 ns 0.1235 ns 1.00 0.00
PushProperty 116.90 ns 0.7450 ns 0.6221 ns 9.69 0.11
PushPropertyNested 234.42 ns 1.8830 ns 1.7614 ns 19.43 0.24
PushPropertyEnriched 243.07 ns 1.3864 ns 1.2290 ns 20.15 0.22

It seems there is no perfomance degradation 😅

PS Also here are full results.

baseline.zip
pr.zip

@Pliner
Copy link
Contributor Author

Pliner commented Jan 16, 2019

@nblumhardt Is it possible to include the PR to 2.8 release?

@nblumhardt
Copy link
Member

LGTM! Let's give it a blast on dev :shipit:

Thanks @Pliner! I'd love to spend some more time looking at allocations like those LogEventProperty classes in the near future. It will take some very, very careful planning to do without breaking changes, but I've got some ideas for removing them... 🙂

@nblumhardt nblumhardt merged commit ec87800 into serilog:dev Apr 26, 2019
@Pliner Pliner deleted the immutable-stack-enumerator branch April 26, 2019 14:24
@nblumhardt nblumhardt mentioned this pull request Oct 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants