Decrease deserialization complexity from quadratic to linear #349
Also add regression test
Thanks for this @est31!
Can this also include documentation as to what these two intermediate tables are? Either on the struct fields or on the functions that construct them.
src/de.rs
.and_then(|entries| {
    let start = entries
        .binary_search(&self.cur)
        .unwrap_or_else(std::convert::identity);
I'd personally prefer if this were unwrap_or_else(|i| i)
👍
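For readers following the review: `binary_search` on a sorted slice returns `Result<usize, usize>`, where `Ok(i)` is the index of a match and `Err(i)` is the index at which the value could be inserted. Both `unwrap_or_else(std::convert::identity)` and `unwrap_or_else(|i| i)` collapse that result into "the first index at or after the searched value". The sketch below shows how such a lookup can replace a linear rescan. It is only an illustration: the index layout and the names `build_index` and `next_occurrence` are hypothetical and are not taken from the PR's actual tables, which are not shown in this conversation.

```rust
use std::collections::HashMap;

/// Hypothetical index: header name -> positions of that header, in ascending order.
/// Built in one pass, so construction is linear in the number of headers.
fn build_index<'a>(headers: &[&'a str]) -> HashMap<&'a str, Vec<usize>> {
    let mut index: HashMap<&'a str, Vec<usize>> = HashMap::new();
    for (pos, name) in headers.iter().enumerate() {
        // Positions are pushed in increasing order, so each Vec stays sorted.
        index.entry(*name).or_default().push(pos);
    }
    index
}

/// First position of `name` at or after `cur`, if there is one.
fn next_occurrence(index: &HashMap<&str, Vec<usize>>, name: &str, cur: usize) -> Option<usize> {
    index.get(name).and_then(|entries| {
        // Ok(i): `cur` itself is a position; Err(i): i is the first position > cur.
        let start = entries.binary_search(&cur).unwrap_or_else(|i| i);
        entries.get(start).copied()
    })
}

fn main() {
    let headers = ["a", "b", "a", "c", "a"];
    let index = build_index(&headers);
    assert_eq!(next_occurrence(&index, "a", 1), Some(2));
    assert_eq!(next_occurrence(&index, "a", 5), None);
}
```

With an index like this, each lookup costs a hash probe plus a binary search instead of a scan over every header, which is the kind of change that turns a quadratic pass into a roughly linear one.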
It looks like the tests added here may be causing a spurious failure on CI?
Huh, that's weird. It works on my machine but it failed on CI previously as well. As a result I increased the tolerance value, but it seems further increases are needed. The difference is quite large. I can maybe increase the sample size, which should lead to less noise. And maybe increase the multiplier as well to allow for even larger tolerances. Does that sound reasonable?
CI machines (VMs) tend to be extremely noisy in terms of measurements, so I think it's fine to perhaps remove the test and move it to a benchmark which can be manually tracked over time. This is pretty unlikely to regress.
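To make the trade-off discussed above concrete, here is a minimal sketch of what a timing-based scaling check could look like: take the median of several samples to dampen outliers, then compare the time ratio against the input growth factor times a tolerance. The helper names `median` and `scales_roughly_linearly` and the numbers used are assumptions for illustration, not the test that was in this PR. Even with a median, a check like this can still fail spuriously on a noisy VM, which is the argument for a manually tracked benchmark.

```rust
use std::time::Duration;

/// Median of a set of timing samples; less sensitive to outliers than the mean.
fn median(samples: &mut [Duration]) -> Duration {
    assert!(!samples.is_empty(), "need at least one sample");
    samples.sort();
    samples[samples.len() / 2]
}

/// True if growing the input by `factor` grew the time by at most `factor * tolerance`.
fn scales_roughly_linearly(small: Duration, large: Duration, factor: f64, tolerance: f64) -> bool {
    let ratio = large.as_secs_f64() / small.as_secs_f64();
    ratio <= factor * tolerance
}

fn main() {
    // Made-up samples; the 90 ms outlier does not affect the median.
    let mut small = vec![
        Duration::from_millis(10),
        Duration::from_millis(12),
        Duration::from_millis(90),
    ];
    let mut large = vec![
        Duration::from_millis(100),
        Duration::from_millis(110),
        Duration::from_millis(130),
    ];
    let s = median(&mut small);
    let l = median(&mut large);
    // 10x more data, tolerance 2: anything up to a 20x slowdown passes.
    assert!(scales_roughly_linearly(s, l, 10.0, 2.0));
    // A 100x slowdown for 10x data (quadratic behaviour) is rejected.
    assert!(!scales_roughly_linearly(s, Duration::from_millis(1200), 10.0, 2.0));
}
```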
CI environments can be noisy and while the test worked great locally on my machine, it didn't on the CI environment. This replaces the test with a (manually tracked) benchmark. As per toml-rs#349 (comment)
Fixes #342.
In particular, see my comment #342 (comment)
I recorded timings on my machine for running the measure_time(n, |i| format!("[header_no_{}]\n", i)) function for varying n.
You can nicely see how, before this PR, a 10x increase in data meant a 100x increase in time spent, while after the PR it only means a 10x increase.
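For anyone who wants to reproduce this kind of measurement, a harness along these lines could be used. This is a sketch under assumptions: the real `measure_time` from the PR is not shown here and takes two arguments, whereas this version takes the parser as a third argument so the example stays self-contained. `build_input` and the stand-in parser `count_headers` are hypothetical; in practice the parser would be the TOML deserializer.

```rust
use std::time::{Duration, Instant};

/// Concatenates `n` generated entries into one input document.
fn build_input(n: usize, entry: impl Fn(usize) -> String) -> String {
    (0..n).map(entry).collect()
}

/// Times a single run of `parse` over an input made of `n` entries.
/// Input construction is excluded from the measured time.
fn measure_time(
    n: usize,
    entry: impl Fn(usize) -> String,
    parse: impl Fn(&str) -> usize,
) -> Duration {
    let input = build_input(n, entry);
    let start = Instant::now();
    let parsed = parse(&input);
    let elapsed = start.elapsed();
    // Sanity check that the parser saw every entry.
    assert_eq!(parsed, n);
    elapsed
}

/// Stand-in for a real parser: counts lines that look like table headers.
fn count_headers(input: &str) -> usize {
    input.lines().filter(|line| line.starts_with('[')).count()
}

fn main() {
    for &n in &[1_000, 10_000, 100_000] {
        let t = measure_time(n, |i| format!("[header_no_{}]\n", i), count_headers);
        println!("n = {:>7}: {:?}", n, t);
    }
}
```

Comparing the printed times for successive values of n gives the growth ratio directly: a ratio near 10 per 10x step suggests linear behaviour, a ratio near 100 suggests quadratic.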