Parsing performance of large YAML files #519

PJB3005 · 2020-08-16T16:37:14Z

We're using YAML as a data format in our game, from data definitions to map files. We have a modestly sized map file now (at the time of writing, ~700 KiB uncompressed / 38k lines).

I should note that we're not using the deserialization features, JUST the AST (YamlMappingNode etc. directly).

The parsing performance is really starting to hurt (about 10% of debug server startup time is currently spent inside YamlDotNet parsing the main map file) and, the kicker, 90% of that time is spent inside various collection ops like Array.Copy, List.Insert and List.RemoveAt.

Some profiling results

Could this be optimized? I tried saving the map files in flow style instead (since, from my reading of the code, the reason there's so much list butchering is parsing issues with block style?) but it did not seem to help perf. In fact it seems slightly worse with flow style.

aaubry · 2020-09-03T10:03:07Z

There's probably room for improvement. I would be interested in exploring this but I can't dedicate much time to performance right now because I already have another development in progress. I can probably provide some help if you guide this process, though.

I've looked at InsertionQueue, which appears in your screenshot, and I can see that there's a big TODO:

// TODO: Use a more efficient data structure

Looking at the Dequeue method, it is clear that it is really inefficient since it always removes the first item from a List, meaning that every other element needs to be shifted. We could start by changing this implementation to use a more efficient algorithm.
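As a rough illustration of the kind of structure that avoids the shifting, here is a minimal circular-buffer queue sketch. The name `RingQueue<T>` and its members are hypothetical and this is not YamlDotNet's actual InsertionQueue code; it only assumes the queue needs Enqueue, Dequeue, an indexed Insert and a Count.

```csharp
using System;

// Hypothetical sketch; not YamlDotNet's actual InsertionQueue.
public sealed class RingQueue<T>
{
    private T[] items = new T[4]; // length is always a power of two
    private int head;             // physical index of the oldest element
    private int count;

    public int Count => count;

    public void Enqueue(T item)
    {
        Grow();
        items[(head + count) & (items.Length - 1)] = item;
        count++;
    }

    // O(1): advances the head index instead of shifting every remaining
    // element, which is what List<T>.RemoveAt(0) has to do.
    public T Dequeue()
    {
        if (count == 0)
            throw new InvalidOperationException("The queue is empty.");

        T item = items[head];
        Array.Clear(items, head, 1); // drop the reference so it can be collected
        head = (head + 1) & (items.Length - 1);
        count--;
        return item;
    }

    // Inserts at a logical position (0 = next item to be dequeued).
    // Still O(n) in the worst case, but only elements at or after 'index' move.
    public void Insert(int index, T item)
    {
        if (index < 0 || index > count)
            throw new ArgumentOutOfRangeException(nameof(index));

        Grow();
        int mask = items.Length - 1;
        for (int i = count; i > index; i--)
            items[(head + i) & mask] = items[(head + i - 1) & mask];
        items[(head + index) & mask] = item;
        count++;
    }

    // Doubles the backing array when it is full, unwrapping the contents
    // so that the oldest element ends up at physical index 0.
    private void Grow()
    {
        if (count < items.Length)
            return;

        var larger = new T[items.Length * 2];
        for (int i = 0; i < count; i++)
            larger[i] = items[(head + i) & (items.Length - 1)];
        items = larger;
        head = 0;
    }
}
```

With this shape, Dequeue no longer touches the other elements, and Insert only moves the items behind the insertion point rather than the whole list.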

aaubry · 2020-09-03T10:15:25Z

Would you be able to provide an example of such a large YAML file?

PJB3005 · 2020-09-07T20:00:05Z

> Would you be able to provide an example of such a large YAML file?

From our repo

aaubry · 2020-09-26T14:17:20Z

Hello!
I've changed the implementation of InsertionQueue and LookAheadBuffer in an experimental branch. Would you be able to test your code against this build to see if there is any improvement?

PJB3005 · 2020-09-30T16:41:28Z

Pretty good improvement!

Before:

After:

aaubry · 2020-10-01T13:55:12Z

Nice! I'll merge this change then.
Of course, this was just low-hanging fruit. I'm sure there are many more optimizations that can be performed. If you can identify more bottlenecks, I can see what we can do.

EdwardCooke · 2022-07-28T06:02:56Z

Since there was a fix for this merged in, I'm going to close this issue. If you need more assistance, let me know.

PJB3005 mentioned this issue Aug 17, 2020

long list vs single file per prototype (AKA: YAML vs TOML) space-wizards/RobustToolbox#1236

Closed

aaubry added the enhancement and feedback-needed (More feedback is required) labels on Sep 3, 2020

aaubry removed the feedback-needed (More feedback is required) label on Oct 1, 2020

aaubry added a commit that referenced this issue Oct 1, 2020

Merge pull request #533 from aaubry/insertion-queue-performance

cd123f7

Small performance improvements. Addresses #519

This was referenced Nov 13, 2021

Introduce StringBuilderPool #646

Merged

Change Mark to be readonly struct #647

Merged

Avoid extra allocations in YamlScalarNode GetHashCode #648

Merged

aaubry mentioned this issue Apr 29, 2022

This project is on hold #690

Closed

EdwardCooke closed this as completed Jul 28, 2022
