[Performance][Optimization] Method inline question #1807

nmi-relewise · 2024-04-22T12:17:37Z

Dear gents, I'm trying to squeeze every nanosecond out of the de/serialization process; we are operating at the 100-GB scale.

My attention was caught by how primitives are read/inflated, for example in MessagePackReader.Integers.cs.

The operation (internally) is to read bytes from a span (aka SequenceReader) and increment the position.
Yet in MessagePackReader it is beefed up with multiple method calls:

ReadInt32
ThrowInsufficientBufferUnless
TryReadBigEndian
TryRead

Given that the absolute cost of a method call is cheap (though non-zero), one could argue the flow is fine.
However, compared to the logical operation the reader has to perform (read from an array in memory), ((push a few args into registers + call method) * 3) is costly.
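To make the comparison concrete, here is a minimal sketch of the two shapes being contrasted. It is not the library's actual code: `ReadSketch`, its method bodies, and the exception type are hypothetical stand-ins that only borrow the helper names listed above, and it reads from a plain `ReadOnlySpan<byte>` rather than a `SequenceReader`.

```csharp
using System;
using System.Buffers.Binary;

public static class ReadSketch
{
    // Layered shape: each helper is a separate call unless the JIT inlines it.
    public static int ReadInt32Layered(ReadOnlySpan<byte> buffer, ref int position)
    {
        ThrowInsufficientBufferUnless(TryReadBigEndian(buffer, position, out int value));
        position += sizeof(int);
        return value;
    }

    // Hypothetical helper: returns false when fewer than 4 bytes remain.
    private static bool TryReadBigEndian(ReadOnlySpan<byte> buffer, int position, out int value)
    {
        return BinaryPrimitives.TryReadInt32BigEndian(buffer.Slice(position), out value);
    }

    // Hypothetical helper: the exception type here is an assumption.
    private static void ThrowInsufficientBufferUnless(bool condition)
    {
        if (!condition)
        {
            throw new InvalidOperationException("Insufficient buffer.");
        }
    }

    // Direct shape: one bounds-checked slice, one read, one increment.
    public static int ReadInt32Direct(ReadOnlySpan<byte> buffer, ref int position)
    {
        int value = BinaryPrimitives.ReadInt32BigEndian(buffer.Slice(position, sizeof(int)));
        position += sizeof(int);
        return value;
    }
}
```

Both methods return the same value for the same input; whether the layered one is actually slower depends on what the JIT decides to inline, which is the open question here.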

A few methods are marked for inlining with [MethodImpl(MethodImplOptions.AggressiveInlining)].

On the surface, it seems like a good idea to decorate more methods with the inline option (considering they operate on the same args).

The question is: were there any specific reasons why only a few are inlined, or is it just how the codebase evolved over time (as-is)?
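For reference, a small sketch of how the attribute is typically applied. `InlineHints` and its members are hypothetical names, not part of the library. The attribute is a request to the JIT, not a guarantee, and a common companion pattern is to keep the throwing path in a separate non-inlined method so the hot path stays small.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class InlineHints
{
    // Hot path: tiny check that we ask the JIT to inline at call sites.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static void EnsureAvailable(int remaining, int needed)
    {
        if (remaining < needed)
        {
            ThrowInsufficientBuffer();
        }
    }

    // Cold path: kept out of line so the inlined check stays compact.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void ThrowInsufficientBuffer()
    {
        throw new InvalidOperationException("Insufficient buffer.");
    }
}
```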

Cheers,
Nik

AArnott · 2024-04-24T04:16:20Z

I don't know why the methods that are attributed were attributed. Inlining already happens by the JIT automatically in cases where it is obvious to the JIT that it would provide a useful improvement. We should only add attributes where the JIT isn't already inlining it yet we've measured perf and found it to make it faster to add the attribute.
If you have that data, I'd take a PR that adds the attributes.
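One rough way to gather that kind of data is to time the same read with inlining forced off and forced on. The sketch below is an assumption-laden illustration, not a rigorous benchmark: `InlineMeasurement` and its members are hypothetical, it uses only `Stopwatch`, and the input must hold at least four bytes. A dedicated benchmarking harness would give more trustworthy numbers.

```csharp
using System;
using System.Buffers.Binary;
using System.Diagnostics;
using System.Runtime.CompilerServices;

public static class InlineMeasurement
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static int ReadNoInline(ReadOnlySpan<byte> bytes) =>
        BinaryPrimitives.ReadInt32BigEndian(bytes);

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private static int ReadInline(ReadOnlySpan<byte> bytes) =>
        BinaryPrimitives.ReadInt32BigEndian(bytes);

    // Reads rotate over the 4-byte slots in data so the work is not loop-invariant.
    public static long SumNoInline(byte[] data, int iterations)
    {
        long sum = 0;
        int slots = data.Length / sizeof(int);
        for (int i = 0; i < iterations; i++)
        {
            sum += ReadNoInline(data.AsSpan((i % slots) * sizeof(int), sizeof(int)));
        }
        return sum;
    }

    public static long SumInline(byte[] data, int iterations)
    {
        long sum = 0;
        int slots = data.Length / sizeof(int);
        for (int i = 0; i < iterations; i++)
        {
            sum += ReadInline(data.AsSpan((i % slots) * sizeof(int), sizeof(int)));
        }
        return sum;
    }

    // Runs the action once as a warm-up, then times a second run.
    public static TimeSpan Time(Func<long> action)
    {
        action();
        Stopwatch stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

A call such as `InlineMeasurement.Time(() => InlineMeasurement.SumInline(data, 300_000_000))` compared against the `SumNoInline` variant, in a Release build without a profiler attached, gives a first indication of whether the attribute matters for a given runtime.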

nmi-relewise · 2024-05-06T13:34:45Z

The methods are getting called ~300M times, in green:

The timings differ 49 vs 34 sec:

It seems to be a possible way forward; however, I'd ask the collective brain to look for pitfalls in the approach.

Thanks.

nmi-relewise · 2024-05-06T15:13:04Z

The first challenge is how to measure accurately.
There is tiered compilation, so it makes sense to measure at a point where JIT optimizations have been applied:
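One way to take tiering out of the picture for a single measured method is `MethodImplOptions.AggressiveOptimization` (available since .NET Core 3.0), which asks the JIT to compile the method fully optimized on first use instead of starting at tier 0. The sketch below uses a hypothetical `TieringSketch` class; note that a method compiled this way does not go through the tiers, so it may also miss optimizations that rely on tier-0 instrumentation. Setting the `DOTNET_TieredCompilation=0` environment variable is a process-wide alternative.

```csharp
using System;
using System.Buffers.Binary;
using System.Runtime.CompilerServices;

public static class TieringSketch
{
    // Compiled with full optimizations on first call, bypassing tiered compilation.
    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    public static long SumInt32BigEndian(ReadOnlySpan<byte> data)
    {
        long sum = 0;
        // Trailing bytes that do not form a full 4-byte value are ignored.
        for (int offset = 0; offset + sizeof(int) <= data.Length; offset += sizeof(int))
        {
            sum += BinaryPrimitives.ReadInt32BigEndian(data.Slice(offset, sizeof(int)));
        }
        return sum;
    }
}
```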

However, as you've mentioned, the runtime picks how/which calls are inlined:

In the case of .NET 7, the ReadInt32 method still gets a jmp (which is weird: with multiple different addresses).

Since we are effectively trying to measure the impact of 'extra' work on top of reading an int from a span, which is super fast, any observation tool will introduce noise and influence the result. That would explain why the numbers differ so much under dotTrace but are not that far apart in real life.

@AArnott

nmi-relewise changed the title ~~[Performance][Inline] Question~~ [Performance][Optimization] Method inline question Apr 22, 2024
