Optimize cases with long potential simple_keys #555

cjcullen · 2019-11-26T05:16:45Z

When we build up the simple_keys stack, we count on the (formerly named) staleness check to catch errors where a simple key is required but would be longer than 1024 characters or would span lines. The previous simplification, which searches the stack from the top, can go 1024 keys deep before finding a "stale" key and stopping. I added a test showing that this consumes ~3s per 1MB of document size.

I split that staleness check back out into a separate loop so that we don't unnecessarily re-process a bunch of keys just to get down to the ones that might have gone stale.
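For readers unfamiliar with the scanner, the rule being discussed can be sketched roughly as follows. This is only an illustration under assumptions: the type and function names (`simpleKey`, `position`, `isStale`, `expireStale`) are hypothetical and are not taken from scannerc.go. It shows the staleness rule from the description (earlier line, or more than 1024 characters back) applied in a pass of its own, so that later lookups only need to test a flag.

```go
package main

import (
	"errors"
	"fmt"
)

// maxSimpleKeyLen is the 1024-character limit mentioned in the description.
const maxSimpleKeyLen = 1024

// simpleKey is a hypothetical, trimmed-down stand-in for a simple_keys entry.
type simpleKey struct {
	possible bool // still a candidate for being a simple key
	required bool // a key is mandatory at this position
	line     int  // line where the candidate started
	index    int  // character offset where the candidate started
}

// position is the scanner's current location in the input (hypothetical type).
type position struct {
	line, index int
}

var errRequiredKeyStale = errors.New("required simple key went stale")

// isStale reports whether a candidate can no longer be a simple key:
// it started on an earlier line or more than 1024 characters back.
func isStale(k simpleKey, cur position) bool {
	return k.line < cur.line || k.index+maxSimpleKeyLen < cur.index
}

// expireStale is the separate pass: it clears the possible flag on stale
// candidates and reports an error if a required candidate went stale.
func expireStale(keys []simpleKey, cur position) error {
	for i := range keys {
		k := &keys[i]
		if !k.possible || !isStale(*k, cur) {
			continue
		}
		if k.required {
			return errRequiredKeyStale
		}
		k.possible = false
	}
	return nil
}

func main() {
	keys := []simpleKey{
		{possible: true, line: 1, index: 0},  // started two lines ago
		{possible: true, line: 3, index: 40}, // started on the current line
	}
	err := expireStale(keys, position{line: 3, index: 50})
	fmt.Println(keys[0].possible, keys[1].possible, err) // prints: false true <nil>
}
```

The sketch does not reproduce the high-water-mark bookkeeping from the commits in this PR; it only separates "mark stale keys" from "find the key I need".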

$ benchcmp old.txt new.txt
benchmark                                old ns/op      new ns/op     delta
Benchmark1000KB100Aliases-6              881167484      872120600     -1.03%
Benchmark1000KBDeeplyNestedSlices-6      48761251       5274819       -89.18%
Benchmark1000KBDeeplyNestedMaps-6        50438114       5292240       -89.51%
Benchmark1000KBDeeplyNestedIndents-6     4385726        4280545       -2.40%
Benchmark1000KB1000IndentLines-6         435702849      432047937     -0.84%
Benchmark1KBMaps-6                       574312         588420        +2.46%
Benchmark10KBMaps-6                      5727272        5895964       +2.95%
Benchmark100KBMaps-6                     52126341       52920733      +1.52%
Benchmark1000KBMaps-6                    484546095      474623653     -2.05%
BenchmarkDeepSlice-6                     455902294      397041567     -12.91%
BenchmarkDeepFlow-6                      399613786      407783513     +2.04%
Benchmark1000KBMaxDepthNested-6          2700904018     447527143     -83.43%

benchmark                                old allocs     new allocs     delta
Benchmark1000KB100Aliases-6              5832425        5832415        -0.00%
Benchmark1000KBDeeplyNestedSlices-6      9066           10081          +11.20%
Benchmark1000KBDeeplyNestedMaps-6        9071           10087          +11.20%
Benchmark1000KBDeeplyNestedIndents-6     10082          10082          +0.00%
Benchmark1000KB1000IndentLines-6         4093516        4093516        +0.00%
Benchmark1KBMaps-6                       3126           3126           +0.00%
Benchmark10KBMaps-6                      30780          30780          +0.00%
Benchmark100KBMaps-6                     307269         307269         +0.00%
Benchmark1000KBMaps-6                    3072079        3072079        +0.00%
BenchmarkDeepSlice-6                     2048121        2048112        -0.00%
BenchmarkDeepFlow-6                      1978104        1978103        -0.00%
Benchmark1000KBMaxDepthNested-6          4079998        4079989        -0.00%

benchmark                                old bytes     new bytes     delta
Benchmark1000KB100Aliases-6              393118344     392792852     -0.08%
Benchmark1000KBDeeplyNestedSlices-6      4689465       4576336       -2.41%
Benchmark1000KBDeeplyNestedMaps-6        4689831       4577450       -2.40%
Benchmark1000KBDeeplyNestedIndents-6     2969924       2969913       -0.00%
Benchmark1000KB1000IndentLines-6         143718512     143718512     +0.00%
Benchmark1KBMaps-6                       218984        218984        +0.00%
Benchmark10KBMaps-6                      2178155       2178154       -0.00%
Benchmark100KBMaps-6                     22002796      22002807      +0.00%
Benchmark1000KBMaps-6                    220560496     220560496     +0.00%
BenchmarkDeepSlice-6                     120114888     119789000     -0.27%
BenchmarkDeepFlow-6                      115058032     115056832     -0.00%
Benchmark1000KBMaxDepthNested-6          146582888     146257064     -0.22%


niemeyer

Thanks for keeping it up.

Here are some initial comments on the issue:


cjcullen · 2019-12-17T18:42:28Z

Thanks again for the first pass. Are you up for considering the addition of the simple_keys index? I'd like to have something to include in the next set of Kubernetes patch releases around the first week of January.

Message from original commit (53403b5): This change introduces an index to look up token numbers referenced by simple_keys in O(1), thus significantly reducing the performance impact of certain abusively constructed snippets. When we build up the simple_keys stack, we count on the (formerly named) staleness check to catch errors where a simple key is required but would be longer than 1024 characters or would span lines. The previous simplification, which searches the stack from the top, can go 1024 keys deep before finding a "stale" key and stopping. I added a test showing that this consumes ~3s per 1MB of document size.
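The idea of an index keyed by token number could look something like the sketch below. It is an assumption-laden illustration, not the code merged in this PR: `keyStack`, `byToken`, `push`, `pop`, and `lookup` are hypothetical names. It shows a map kept in step with the stack so that finding the candidate for a given token number is a single map access rather than a walk over up to 1024 entries.

```go
package main

import "fmt"

// simpleKey is a hypothetical stand-in for a simple_keys entry.
type simpleKey struct {
	possible    bool
	tokenNumber int
}

// keyStack pairs the stack with an index from token number to stack position.
type keyStack struct {
	keys    []simpleKey
	byToken map[int]int
}

func newKeyStack() *keyStack {
	return &keyStack{byToken: make(map[int]int)}
}

// push adds a candidate and, if it is still possible, records where it lives.
func (s *keyStack) push(k simpleKey) {
	s.keys = append(s.keys, k)
	if k.possible {
		s.byToken[k.tokenNumber] = len(s.keys) - 1
	}
}

// pop removes the top candidate and drops its index entry, if any.
func (s *keyStack) pop() {
	top := len(s.keys) - 1
	if top < 0 {
		return
	}
	k := s.keys[top]
	if i, ok := s.byToken[k.tokenNumber]; ok && i == top {
		delete(s.byToken, k.tokenNumber)
	}
	s.keys = s.keys[:top]
}

// lookup finds the candidate for a token number without walking the stack.
func (s *keyStack) lookup(tokenNumber int) (*simpleKey, bool) {
	i, ok := s.byToken[tokenNumber]
	if !ok {
		return nil, false
	}
	return &s.keys[i], true
}

func main() {
	s := newKeyStack()
	for tok := 0; tok < 1024; tok++ {
		s.push(simpleKey{possible: true, tokenNumber: tok})
	}
	k, ok := s.lookup(7)
	fmt.Println(ok, k.tokenNumber) // prints: true 7

	s.pop()
	_, ok = s.lookup(1023)
	fmt.Println(ok) // prints: false
}
```

The cost of this design is that every push and pop must keep the map consistent with the stack; in exchange, the per-token lookup no longer depends on stack depth.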

cjcullen added 2 commits November 25, 2019 20:32

Add test for max-depth indents.

98e75d6

Track high-water mark for checking staleness up the simple_keys stack.

f7bfbcf

liggitt reviewed Nov 26, 2019


niemeyer requested changes Nov 29, 2019


Track possible simple_keys in a map indexed by token_number for efficient lookup in yaml_parser_fetch_value().

84d2f86

cjcullen requested a review from niemeyer January 7, 2020 22:23

niemeyer merged commit 53403b5 into go-yaml:v2 Jan 21, 2020

Blesmol mentioned this pull request Jun 7, 2021

Upgrade gopkg.in/yaml.v2 dependency due to CVE-2019-11254 dnaeon/go-vcr#60

Closed

elado mentioned this pull request Mar 25, 2022

yarn.lock with keys longer than 1024 characters throw parse errors vercel/turbo#948

Closed
