This repository has been archived by the owner on Sep 24, 2022. It is now read-only.

Decrease deserialization complexity from quadratic to linear #349

Merged
merged 4 commits into from Oct 28, 2019

Conversation

est31
Contributor

@est31 est31 commented Oct 25, 2019

Fixes #342.

In particular, see my comment #342 (comment)

Values recorded on my machine for running `measure_time(n, |i| format!("[header_no_{}]\n", i))` for varying `n`:

| benchmark | optimizations | before this PR | after this PR |
|-----------|---------------|----------------|---------------|
| 1k        | no            | 170ms          | 48ms          |
| 10k       | no            | 14 191ms       | 483ms         |
| 100k      | no            | n/a            | 5 077ms       |
| 1k        | yes           | 5ms            | 4ms           |
| 10k       | yes           | 311ms          | 39ms          |
| 100k      | yes           | 63 850ms       | 364ms         |

You can nicely see that before this PR a 10x increase in data meant a roughly 100x increase in time spent, while after this PR it only means a roughly 10x increase.
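For reference, a minimal sketch of what such a benchmark driver could look like (the actual `measure_time` helper from #342 may differ; the function body below is an assumption):

```rust
use std::time::Instant;

// Sketch: build a TOML document with `n` entries produced by `gen`,
// then time how long parsing it takes.
fn measure_time(n: u32, gen: impl Fn(u32) -> String) {
    let mut input = String::new();
    for i in 0..n {
        input.push_str(&gen(i));
    }
    let start = Instant::now();
    let _parsed: toml::Value = input.parse().expect("valid TOML");
    println!("n = {}: {:?}", n, start.elapsed());
}

fn main() {
    for &n in &[1_000, 10_000, 100_000] {
        measure_time(n, |i| format!("[header_no_{}]\n", i));
    }
}
```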

Also add regression test
@est31 est31 changed the title Increase deserialization speed from quadratic to linear Increase deserialization speed from quadratic time to linear Oct 25, 2019
@est31 est31 changed the title Increase deserialization speed from quadratic time to linear Decrease deserialization complexity from quadratic to linear Oct 25, 2019
Collaborator

@alexcrichton alexcrichton left a comment

Thanks for this @est31!

Can this also include documentation as to what these two intermediate tables are? Either on the struct fields or on the functions that construct them.

src/de.rs Outdated
.and_then(|entries| {
    let start = entries
        .binary_search(&self.cur)
        .unwrap_or_else(std::convert::identity);
Collaborator

I'd personally prefer if this were `unwrap_or_else(|i| i)`
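For context, `binary_search` returns `Result<usize, usize>`: `Ok` with the index of a match, or `Err` with the insertion point. Either closure collapses both cases into a usable start index; a small standalone example (the values here are made up):

```rust
fn main() {
    let entries = [1, 3, 5, 7];

    // Ok(i): the key was found at index i.
    let found = entries.binary_search(&5).unwrap_or_else(|i| i);
    // Err(i): the key is absent; i is where it would be inserted.
    let missing = entries.binary_search(&4).unwrap_or_else(|i| i);

    assert_eq!(found, 2);
    assert_eq!(missing, 2);
}
```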

@alexcrichton
Collaborator

👍

@alexcrichton
Collaborator

It looks like the tests added here may be causing a spurious failure on CI?

@est31
Contributor Author

est31 commented Oct 28, 2019

Huh, that's weird. It works on my machine, but it had already failed on CI before, so I increased the tolerance value; it seems further increases are needed, though, since the difference is quite large. I could increase the sample size, which should lead to less noise, and maybe increase the multiplier as well to allow for even larger tolerances. Does that sound reasonable?
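For illustration, a hedged sketch of the kind of tolerance-based linearity check being discussed (the actual test in the PR may differ; the sizes and the multiplier value here are assumptions):

```rust
use std::time::Instant;

// Parse a synthetic document with `n` table headers and return the elapsed time.
fn time_parse(n: u32) -> std::time::Duration {
    let input: String = (0..n).map(|i| format!("[header_no_{}]\n", i)).collect();
    let start = Instant::now();
    let _: toml::Value = input.parse().expect("valid TOML");
    start.elapsed()
}

fn main() {
    let small = time_parse(1_000);
    let large = time_parse(10_000);

    // With linear complexity the ratio should stay near 10; a quadratic
    // parser would push it toward 100. The multiplier leaves headroom
    // for noisy machines.
    let multiplier = 30.0;
    assert!(large.as_secs_f64() < small.as_secs_f64() * multiplier);
}
```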

@alexcrichton
Collaborator

CI machines (VMs) tend to be extremely noisy in terms of measurements, so I think it's fine to perhaps remove the test and move it to a benchmark which can be manually tracked over time. This is pretty unlikely to regress.

est31 added a commit to est31/toml-rs that referenced this pull request Oct 28, 2019
CI environments can be noisy and while the test worked great
locally on my machine, it didn't on the CI environment.
This replaces the test with a (manually tracked) benchmark.
As per toml-rs#349 (comment)
alexcrichton pushed a commit that referenced this pull request Oct 29, 2019
CI environments can be noisy and while the test worked great
locally on my machine, it didn't on the CI environment.
This replaces the test with a (manually tracked) benchmark.
As per #349 (comment)
Development

Successfully merging this pull request may close these issues.

Large TOML document performance