
Increase performance and reduce allocations in hclsyntax.LexConfig #490

Merged
merged 3 commits into hashicorp:main from inkel/reduce-mem-usage on Feb 16, 2022

Conversation

inkel
Contributor

inkel commented Oct 20, 2021

Hello, team. First of all, thank you for sharing this library.

While working on an internal project that uses this library to parse .hcl files and retrieve a list of the modules they use, I added some benchmarks and found that one of my bottlenecks was in hclsyntax.LexConfig. Further inspection highlighted checkInvalidTokens as one of the culprits (the other being emitToken, which lives in a generated file and which I therefore didn't want to mess with).

I took the liberty of adding a benchmark for LexConfig with a rather small HCL file, but one that serves the purpose of showing whether I was able to improve the results. I then ran the benchmarks against the original code and got these results (showing only one line for brevity):

$ go test -benchmem -bench=. -benchtime=200000x -count=10 -memprofile=mem.pprof
goos: darwin
goarch: amd64
pkg: github.com/hashicorp/hcl/v2/hclsyntax
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkLexConfig-12             200000              9298 ns/op            8936 B/op         37 allocs/op
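For context, a minimal benchmark along these lines can be written as follows. This is an illustrative sketch with a placeholder fixture path (testdata/bench.hcl), not necessarily the exact benchmark added in this PR:

    package hclsyntax

    import (
        "os"
        "testing"

        "github.com/hashicorp/hcl/v2"
    )

    // BenchmarkLexConfig lexes a small fixture file on every iteration so that
    // -benchmem can report the allocations made by LexConfig itself.
    func BenchmarkLexConfig(b *testing.B) {
        src, err := os.ReadFile("testdata/bench.hcl")
        if err != nil {
            b.Fatal(err)
        }
        b.ReportAllocs()
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            _, diags := LexConfig(src, "bench.hcl", hcl.Pos{Line: 1, Column: 1, Byte: 0})
            if diags.HasErrors() {
                b.Fatal(diags)
            }
        }
    }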

After that I started tracking down where most of the allocations were being made and found a spot responsible for the bulk of them. I then worked on reducing both the allocation count and the bytes allocated, and the last commit produced these benchmark results (again showing only one line for brevity):

$ go test -benchmem -bench=. -benchtime=200000x -count=10 -memprofile=mem.pprof
goos: darwin
goarch: amd64
pkg: github.com/hashicorp/hcl/v2/hclsyntax
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkLexConfig-12             200000              7799 ns/op            6056 B/op          7 allocs/op

According to benchstat, the difference between the two runs is:

name          old time/op    new time/op    delta
LexConfig-12    9.27µs ± 0%    7.83µs ± 1%  -15.55%  (p=0.000 n=9+10)

name          old alloc/op   new alloc/op   delta
LexConfig-12    8.94kB ± 0%    6.06kB ± 0%  -32.23%  (p=0.000 n=10+10)

name          old allocs/op  new allocs/op  delta
LexConfig-12      37.0 ± 0%       7.0 ± 0%  -81.08%  (p=0.000 n=10+10)

As you can see, this reduces the number of allocations by roughly 80% while using about 32% fewer bytes. As a side effect, performance also improved by about 15%.
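For anyone wanting to reproduce the comparison, output like the above is what benchstat prints when fed two benchmark runs; the file names below are just placeholders:

    $ go test -benchmem -bench=. -benchtime=200000x -count=10 > old.txt
    $ # apply the changes, then run the same command again
    $ go test -benchmem -bench=. -benchtime=200000x -count=10 > new.txt
    $ benchstat old.txt new.txt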

I hope you find this useful and worth merging. Thanks!

@hashicorp-cla

hashicorp-cla commented Oct 20, 2021

CLA assistant check
All committers have signed the CLA.

@inkel
Contributor Author

inkel commented Feb 8, 2022

Hi! Any updates on this?

@apparentlymart apparentlymart force-pushed the inkel/reduce-mem-usage branch 2 times, most recently from c31e897 to 818a4cf on February 15, 2022 at 18:38
@apparentlymart
Member

apparentlymart commented Feb 15, 2022

Hi @inkel! Sorry for not responding sooner.

I had to do some tooling maintenance on this repository to get the per-commit checks working again before review; that's now done, so we have the tests running here again.

At the time I'm writing this there seems to be a test failure, but my initial investigation suggests that it's a pre-existing issue unrelated to what you proposed here: there is a test that currently relies on a particular iteration order for maps in Go, but Go does not guarantee that order, so there's a 50% chance of the test failing.

I'd like to try to get that fixed first because the code you modified here was introduced specifically to fix an earlier bug around accidental reuse of values, and so although what you implemented does look correct to me as a human reviewer (the closure you introduced seems to allocate a new copy of the range for each use) I'd like to confirm that confidence by seeing the tests run cleanly before we move forward.

Thanks for working on this!
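For readers unfamiliar with the underlying issue: Go deliberately randomizes map iteration order, so a test whose expected output depends on "map order" is inherently flaky. The usual fix is to sort the keys before iterating. This is a generic sketch with made-up data, not the actual test in this repository:

    package main

    import (
        "fmt"
        "sort"
    )

    func main() {
        // Hypothetical example data standing in for diagnostics keyed by file.
        diagsByFile := map[string]string{
            "b.hcl": "Invalid character",
            "a.hcl": "Unterminated template string",
        }

        // Collect and sort the keys so the iteration order is deterministic,
        // regardless of Go's randomized map ordering.
        keys := make([]string, 0, len(diagsByFile))
        for k := range diagsByFile {
            keys = append(keys, k)
        }
        sort.Strings(keys)

        for _, k := range keys {
            fmt.Printf("%s: %s\n", k, diagsByFile[k])
        }
    }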

@inkel
Contributor Author

inkel commented Feb 15, 2022

Hey @apparentlymart thanks for taking the time to review this!

What you say makes total sense to me. I'll be happy to help with anything you need.

@apparentlymart apparentlymart added the syntax/native and v2 (Relates to the v2 line of releases) labels on Feb 16, 2022
Doing this reduces the memory used by ~11%, as the following benchstat
comparison shows:

    name          old time/op    new time/op    delta
    LexConfig-12    9.27µs ± 0%    9.03µs ± 1%   -2.55%  (p=0.000 n=9+10)

    name          old alloc/op   new alloc/op   delta
    LexConfig-12    8.94kB ± 0%    7.98kB ± 0%  -10.74%  (p=0.000 n=10+10)

    name          old allocs/op  new allocs/op  delta
    LexConfig-12      37.0 ± 0%      37.0 ± 0%     ~     (all equal)

Benchmarks were created using:

    go test -benchmem -benchtime=200000x -count=10 -bench=.
In a similar fashion to the parent commit, instead of always copying
tok.Range for later use, we define a function that returns the copied
value, so the copy is only allocated when it is actually needed.

For the benchmark introduced earlier, the reduction in allocations and
memory usage is outstanding:

    name          old time/op    new time/op    delta
    LexConfig-12    9.05µs ± 1%    7.83µs ± 1%  -13.54%  (p=0.000 n=10+10)

    name          old alloc/op   new alloc/op   delta
    LexConfig-12    7.98kB ± 0%    6.06kB ± 0%  -24.07%  (p=0.000 n=10+10)

    name          old allocs/op  new allocs/op  delta
    LexConfig-12      37.0 ± 0%       7.0 ± 0%  -81.08%  (p=0.000 n=10+10)

Benchmarks were created using:

    go test -benchmem -benchtime=200000x -count=10 -bench=.
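To make the idea in this commit message concrete, the change has roughly the following shape. This is an illustrative sketch with hypothetical names (checkTokens and the sample input), not the actual checkInvalidTokens code:

    package main

    import (
        "fmt"

        "github.com/hashicorp/hcl/v2"
        "github.com/hashicorp/hcl/v2/hclsyntax"
    )

    // checkTokens sketches the optimization: rather than copying tok.Range
    // unconditionally on every iteration, a per-iteration closure produces a
    // fresh *hcl.Range only when a diagnostic actually needs one, so the
    // common error-free path avoids those allocations entirely.
    func checkTokens(tokens hclsyntax.Tokens) hcl.Diagnostics {
        var diags hcl.Diagnostics
        for _, tok := range tokens {
            // Each call allocates a new copy, so separate diagnostics never
            // share the same *hcl.Range value.
            tokRange := func() *hcl.Range {
                r := tok.Range
                return &r
            }
            if tok.Type == hclsyntax.TokenInvalid {
                diags = append(diags, &hcl.Diagnostic{
                    Severity: hcl.DiagError,
                    Summary:  "Invalid character",
                    Subject:  tokRange(),
                })
            }
        }
        return diags
    }

    func main() {
        tokens, _ := hclsyntax.LexConfig([]byte("a = 1\n"), "example.hcl", hcl.Pos{Line: 1, Column: 1, Byte: 0})
        fmt.Println(len(checkTokens(tokens)), "diagnostics")
    }

The key property is that tokRange() returns a fresh copy on every call, so diagnostics never share a range value, while clean input never pays for the copies at all.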
@apparentlymart
Member

Hi again, @inkel!

Thanks for your patience. With the automated test runs now working we can see the existing tests passing, and for good measure I also tried some manual tests to convince myself that indeed this approach does result in each diagnostic pointing to a separate hcl.Range object, without sharing allocations between the loops.

This does mean that now if there are multiple diagnostics in a single iteration we will allocate a new hcl.Range for each, but that seems like a good tradeoff since we typically don't optimize the error return paths on the assumption that reasonable usage will produce only a few diagnostics at most.

Thanks again for working on this! I'm going to merge it now.

@apparentlymart apparentlymart merged commit 895479c into hashicorp:main Feb 16, 2022
@inkel inkel deleted the inkel/reduce-mem-usage branch February 16, 2022 19:11
@inkel
Contributor Author

inkel commented Feb 16, 2022

@apparentlymart thank you for merging this and for the detailed explanation!
