scanner: Find quoted token from context #327

braydonk · 2022-11-24T19:18:29Z

Closes #301

To resolve an issue with quoted map keys not correctly parsing, this PR changes how it progresses after scanning a quote token.

The old behaviour was:

scanner reaches a :
pull the characters out of the current scan context buffer and turn it into the map key token
set the previous indent column to the beginning of the buffer So next time it gets to another key in the map, it'll check whether the indent was the same.

When scanning a quote:

Scan from the beginning to the end of the quote sequence
Add a token, send back to the lexer, and reset the scan context In this scenario, it reaches the :, but nothing is in the scan context buffer because it just went ahead and added the quote token. The previous indent column doesn't get reset, so when it gets to the next map key it thinks the indent increased, which is invalid yaml.

The new behaviour is essentially to not reset the context after scanning a quote, and when reaching a : check the context tokens, and if it's a quote set the previous indent column to that

To resolve an issue with quoted map keys not correctly parsing, this PR changes how it progresses after scanning a quote token. The old behaviour was: 1. scanner reaches a : 2. pull the characters out of the current scan context buffer and turn it into the map key token 3. set the previous indent column to the beginning of the buffer So next time it gets to another key in the map, it'll check whether the indent was the same. When scanning a quote: 1. Scan from the beginning to the end of the quote sequence 2. Add a token, send back to the lexer, and reset the scan context In this scenario, it reaches the :, but nothing is in the scan context buffer because it just went ahead and added the quote token. The previous indent column doesn't get reset, so when it gets to the next map key it thinks the indent increased, which is invalid yaml. The new behaviour is essentially to not reset the context after scanning a quote, and when reaching a : check the context tokens, and if it's a quote set the previous indent column to that

braydonk · 2022-11-24T19:27:47Z

Currently, test cases like TestDecoder/"1":_"\"" and TestDecoder/'1':_'''' are failing. Is '' inside single quotes a way of escaping the single quote? I don't think I've ever heard of that. On my PR they hang though, the scan never ends up returning EOF.

EDIT: Yes, that is the way to escape single quotes in yaml. That's a new one to me!

When scanning an escaped single quote `''` there was an off by one error because while the local `idx` variable is increased, the context's `idx` is not. This leads to the progress returned being off by one, so the scanner gets stuck in the `ctx.next()` loop forever, unable to progress.

braydonk · 2022-11-24T20:18:34Z

Resolved the '' issue, but now need to solve an issue with double quote scanning. When there are certain numbers of \ escaped characters that cause a similar hangup. Probably a very similar off-by-one happening.

When we scan escape characters in quoted scalars, the local idx is shifted but not the underlying context. This adds a column skip to ensure the ctx progresses in sync with the local quote scanning.

`scanQuote` will add to the origin buffer for the purpose of creating the quote token, but would never reset it. This caused a problem when nothing would reset the buffer, i.e. when a quoted value was followed by a newline and the next map key.

codecov-commenter · 2022-11-25T13:40:23Z

Codecov Report

Merging #327 (d56f06e) into master (e4f32a2) will decrease coverage by 0.29%.
The diff coverage is n/a.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #327      +/-   ##
==========================================
- Coverage   75.93%   75.63%   -0.30%     
==========================================
  Files          13       13              
  Lines        4305     4359      +54     
==========================================
+ Hits         3269     3297      +28     
- Misses        799      819      +20     
- Partials      237      243       +6

braydonk · 2022-11-25T13:44:30Z

Okay, all existing tests appear to be passing. Going to add tests to properly cover all the new stuff and catch this potential regression in the future.

I had adjusted the `:` scanning logic in a way that didn't cover all cases.

braydonk · 2022-11-27T14:52:17Z

If you can think of any other tests cases I'll be happy to add them.

goccy · 2022-12-01T19:05:58Z

Thank you for your contribution !!

I read the implementation, but it seemed a bit complicated, so I implemented it too
#328

braydonk · 2022-12-05T13:15:20Z

Made obsolete by a better solution in #328!

braydonk marked this pull request as draft November 24, 2022 19:18

braydonk added 2 commits November 24, 2022 22:03

scanner: apply column skip in quote scanning

9de8125

When we scan escape characters in quoted scalars, the local idx is shifted but not the underlying context. This adds a column skip to ensure the ctx progresses in sync with the local quote scanning.

scanner: fix buf and obuf sync issue

8834a3c

`scanQuote` will add to the origin buffer for the purpose of creating the quote token, but would never reset it. This caused a problem when nothing would reset the buffer, i.e. when a quoted value was followed by a newline and the next map key.

add tests, fixed bad scanner logic

d56f06e

I had adjusted the `:` scanning logic in a way that didn't cover all cases.

braydonk marked this pull request as ready for review November 27, 2022 14:46

braydonk mentioned this pull request Nov 29, 2022

2.22.0 -> 2.23.0 upgrade broke 'map_values' configuration GoogleCloudPlatform/ops-agent#994

Closed

goccy mentioned this pull request Dec 1, 2022

Fix quoted map key #328

Merged

braydonk closed this Dec 5, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

scanner: Find quoted token from context #327

scanner: Find quoted token from context #327

braydonk commented Nov 24, 2022 •

edited

braydonk commented Nov 24, 2022 •

edited

braydonk commented Nov 24, 2022 •

edited

codecov-commenter commented Nov 25, 2022 •

edited

braydonk commented Nov 25, 2022 •

edited

braydonk commented Nov 27, 2022

goccy commented Dec 1, 2022

braydonk commented Dec 5, 2022

scanner: Find quoted token from context #327

scanner: Find quoted token from context #327

Conversation

braydonk commented Nov 24, 2022 • edited

braydonk commented Nov 24, 2022 • edited

braydonk commented Nov 24, 2022 • edited

codecov-commenter commented Nov 25, 2022 • edited

Codecov Report

braydonk commented Nov 25, 2022 • edited

braydonk commented Nov 27, 2022

goccy commented Dec 1, 2022

braydonk commented Dec 5, 2022

braydonk commented Nov 24, 2022 •

edited

braydonk commented Nov 24, 2022 •

edited

braydonk commented Nov 24, 2022 •

edited

codecov-commenter commented Nov 25, 2022 •

edited

braydonk commented Nov 25, 2022 •

edited