Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scanner: Find quoted token from context #327

Closed
wants to merge 5 commits into from

Conversation

braydonk
Copy link
Contributor

@braydonk braydonk commented Nov 24, 2022

Closes #301

To resolve an issue with quoted map keys not correctly parsing, this PR changes how it progresses after scanning a quote token.

The old behaviour was:

  1. scanner reaches a :
  2. pull the characters out of the current scan context buffer and turn it into the map key token
  3. set the previous indent column to the beginning of the buffer So next time it gets to another key in the map, it'll check whether the indent was the same.

When scanning a quote:

  1. Scan from the beginning to the end of the quote sequence
  2. Add a token, send back to the lexer, and reset the scan context In this scenario, it reaches the :, but nothing is in the scan context buffer because it just went ahead and added the quote token. The previous indent column doesn't get reset, so when it gets to the next map key it thinks the indent increased, which is invalid yaml.

The new behaviour is essentially to not reset the context after scanning a quote, and when reaching a : check the context tokens, and if it's a quote set the previous indent column to that

To resolve an issue with quoted map keys not correctly parsing, this PR
changes how it progresses after scanning a quote token.

The old behaviour was:
1. scanner reaches a :
2. pull the characters out of the current scan context buffer and turn it into the map key token
3. set the previous indent column to the beginning of the buffer
So next time it gets to another key in the map, it'll check whether the indent was the same.

When scanning a quote:
1. Scan from the beginning to the end of the quote sequence
2. Add a token, send back to the lexer, and reset the scan context
In this scenario, it reaches the :, but nothing is in the scan context buffer because it just went ahead and added the quote token. The previous indent column doesn't get reset, so when it gets to the next map key it thinks the indent increased, which is invalid yaml.

The new behaviour is essentially to not reset the context after scanning a quote, and when reaching a : check the context tokens, and if it's a quote set the previous indent column to that
@braydonk braydonk marked this pull request as draft November 24, 2022 19:18
@braydonk
Copy link
Contributor Author

braydonk commented Nov 24, 2022

Currently, test cases like TestDecoder/"1":_"\"" and TestDecoder/'1':_'''' are failing. Is '' inside single quotes a way of escaping the single quote? I don't think I've ever heard of that. On my PR they hang though, the scan never ends up returning EOF.

EDIT: Yes, that is the way to escape single quotes in yaml. That's a new one to me!

When scanning an escaped single quote `''` there was an off by one error
because while the local `idx` variable is increased, the context's `idx`
is not. This leads to the progress returned being off by one, so the
scanner gets stuck in the `ctx.next()` loop forever, unable to progress.
@braydonk
Copy link
Contributor Author

braydonk commented Nov 24, 2022

Resolved the '' issue, but now need to solve an issue with double quote scanning. When there are certain numbers of \ escaped characters that cause a similar hangup. Probably a very similar off-by-one happening.

When we scan escape characters in quoted scalars, the local idx is
shifted but not the underlying context. This adds a column skip to
ensure the ctx progresses in sync with the local quote scanning.
`scanQuote` will add to the origin buffer for the purpose of creating
the quote token, but would never reset it. This caused a problem when
nothing would reset the buffer, i.e. when a quoted value was followed by
a newline and the next map key.
@codecov-commenter
Copy link

codecov-commenter commented Nov 25, 2022

Codecov Report

Merging #327 (d56f06e) into master (e4f32a2) will decrease coverage by 0.29%.
The diff coverage is n/a.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #327      +/-   ##
==========================================
- Coverage   75.93%   75.63%   -0.30%     
==========================================
  Files          13       13              
  Lines        4305     4359      +54     
==========================================
+ Hits         3269     3297      +28     
- Misses        799      819      +20     
- Partials      237      243       +6     

@braydonk
Copy link
Contributor Author

braydonk commented Nov 25, 2022

Okay, all existing tests appear to be passing. Going to add tests to properly cover all the new stuff and catch this potential regression in the future.

I had adjusted the `:` scanning logic in a way that didn't cover all
cases.
@braydonk braydonk marked this pull request as ready for review November 27, 2022 14:46
@braydonk
Copy link
Contributor Author

If you can think of any other tests cases I'll be happy to add them.

@goccy
Copy link
Owner

goccy commented Dec 1, 2022

Thank you for your contribution !!

I read the implementation, but it seemed a bit complicated, so I implemented it too
#328

@goccy goccy mentioned this pull request Dec 1, 2022
@braydonk
Copy link
Contributor Author

braydonk commented Dec 5, 2022

Made obsolete by a better solution in #328!

@braydonk braydonk closed this Dec 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unmarshalling a list of maps fails with "unexpected key name"
3 participants