Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parsing performance of large YAML files #519

Closed
PJB3005 opened this issue Aug 16, 2020 · 7 comments
Closed

Parsing performance of large YAML files #519

PJB3005 opened this issue Aug 16, 2020 · 7 comments

Comments

@PJB3005
Copy link

PJB3005 commented Aug 16, 2020

We're using YAML as a data format in our game, from data definitions to map files. We have a modestly sized map files now (that is at the time of writing, ~700 KiB uncompressed/38k lines).

I should note that we're not using deserialization features, JUST the AST (YamlMappingNode etc directly).

The parsing performance is really starting the hurt (currently taking about 10% of debug server startup time just spent inside YamlDotNet parsing the main map file) and, the kicker, 90% of that time is spent inside various collection ops like Array.Copy, List.Insert and List.RemoveAt.

Some profiling results

image

image

Could this be optimized? I tried saving the map files in flow style instead (since from my looking at the code the reason there's so many list butchering is because of parsing issues with block style?) but it did not seem to help perf. In fact it seems slightly worse with flow style.

@aaubry
Copy link
Owner

aaubry commented Sep 3, 2020

There's probably room for improvement. I would be interested in exploring this but I can't dedicate much time to performance right now because I already have another development in progress. I can probably provide some help if you guide this process, though.

I've looked at InsertionQueue, which appears in your screenshot, and I can see that there's a big TODO:

// TODO: Use a more efficient data structure

Looking at the Dequeue method, it is clear that it is really inefficient since it always removes the first item from a List, meaning that every other element needs to be shifted. We could start by changing this implementation to use a more efficient algorithm.

@aaubry
Copy link
Owner

aaubry commented Sep 3, 2020

Would you be able to provide an example of such a large YAML file ?

@aaubry aaubry added enhancement feedback-needed More feedback is required labels Sep 3, 2020
@PJB3005
Copy link
Author

PJB3005 commented Sep 7, 2020

Would you be able to provide an example of such a large YAML file ?

From our repo

@aaubry
Copy link
Owner

aaubry commented Sep 26, 2020

Hello!
I've changed the implementation of InsertionQueue and LookAheadBuffer in an experimental branch. Would you be able to test your code against this build to see if there is any improvement ?

@PJB3005
Copy link
Author

PJB3005 commented Sep 30, 2020

Pretty good improvement!

Before:

image

After:

image

@aaubry aaubry removed the feedback-needed More feedback is required label Oct 1, 2020
@aaubry
Copy link
Owner

aaubry commented Oct 1, 2020

Nice! I'll merge this change then.
Of course this was just a low-hanging fruit. I'm sure there are many more optimizations that can be performed. If you can identify more bottlenecks, I can see what we can do.

@EdwardCooke
Copy link
Collaborator

Since there was a fix for this merged in, I'm going to close this issue. If you need more assistance, let me know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants