Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce location tracking in the tokenizer and parser #710

Merged
merged 7 commits into from Dec 5, 2022

Conversation

ankrgyl
Copy link
Contributor

@ankrgyl ankrgyl commented Nov 15, 2022

This revives PR #288 and #514, rebased against the latest and done with fewer variable renames (limiting merge conflicts).

Closes #524

@ankrgyl
Copy link
Contributor Author

ankrgyl commented Nov 15, 2022

@alamb @AugustoFKL this PR addresses #524.

This change is reasonably contained, but one thing that could make it even smaller would be maintaining the interface (i.e.next_token() returns a Token not a TokenWithLocation) and maybe adding something like last_location() to access the location just when you need it (e.g. for error messages).

@alamb
Copy link
Collaborator

alamb commented Nov 16, 2022

Thanks @ankrgyl -- I'll try to look at this PR more carefully later this week

@coveralls
Copy link

coveralls commented Nov 16, 2022

Pull Request Test Coverage Report for Build 3623431070

  • 203 of 269 (75.46%) changed or added relevant lines in 3 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage decreased (-0.08%) to 86.309%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/tokenizer.rs 69 77 89.61%
src/parser.rs 133 191 69.63%
Files with Coverage Reduction New Missed Lines %
src/parser.rs 1 83.17%
Totals Coverage Status
Change from base Build 3623318513: -0.08%
Covered Lines: 12532
Relevant Lines: 14520

💛 - Coveralls

@AugustoFKL
Copy link
Contributor

@ankrgyl @alamb sorry for the delay. This has been a hectic week for me. I'm on vacation next week, and will be able to review this properly

Copy link
Collaborator

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR looks great @ankrgyl -- thank you for proposing it; I apologize for the delay in finding time to review it

I think we can avoid many of the changes with an impl PartialEq as I mentioned

Otherwise, all this PR needs is some tests and I think it would be ready to merge.

Thanks again!

pub column: u64,
}

/// A [Token] with [Location] attached to it
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably avoid many of the changes in this PR by implementing PartialEq between Token and TokenWithLocation:

impl PartialEq<Token>` for TokenWithLocation

Which would allow things like

        if self.peek_nth_token(1).token == Token::RArrow {

to keep working

@@ -321,58 +356,88 @@ impl fmt::Display for TokenizerError {
#[cfg(feature = "std")]
impl std::error::Error for TokenizerError {}

struct State<'a> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@ankrgyl
Copy link
Contributor Author

ankrgyl commented Dec 1, 2022

@alamb updated with the PartialEq suggestion (great idea!). For tests -- do you have a suggestion on what would be best? I can tweak a few of the tests to tokenize w/ location and hardcode them in, or do something else?

@alamb
Copy link
Collaborator

alamb commented Dec 1, 2022

For tests -- do you have a suggestion on what would be best? I can tweak a few of the tests to tokenize w/ location and hardcode them in, or do something else?

I think the goal should be that if we accidentally break the location parsing code some tests will fail. I think ensuring the locations are set on parse errors would be particularly helpful.

So perhaps you could add a few tests that assert locations of parsed tokens, and then a test that has a parsing error that retrieves the location of the error (if this last test needs more code changes, we could do it as a follow on PR)

Thank you so much for pushing this along @ankrgyl -- this is going to be a great addition (and is a long asked for feature)

@ankrgyl
Copy link
Contributor Author

ankrgyl commented Dec 1, 2022

Added a few tests. We'll need to thread parser locations through as a follow up PR, but I added a test that fails in the parser due to a tokenizer error (and asserts the location) + a test that we can fill in with locations once the parser carries along that info.

Copy link
Collaborator

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks great -- thank you @ankrgyl

Would you like a chance to review @AugustoFKL ?

assert_eq!(
ast,
Err(ParserError::TokenizerError(
"Unterminated string literal at Line: 1, Column 5".to_string()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

/// The index of the first unprocessed token in `self.tokens`
index: usize,
dialect: &'a dyn Dialect,
}

impl<'a> Parser<'a> {
/// Parse the specified tokens
/// To avoid breaking backwards compatibility, this function accepts
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@alamb
Copy link
Collaborator

alamb commented Dec 2, 2022

This appears to have a logical conflict with master.

I am planning on cutting a release soon -- maybe given the nature of this change I will make release right before this change, and then merge this one. That way people who use the tip of this repository can effectively beta test the changes before we make a release for everyone 🤔

@ankrgyl
Copy link
Contributor Author

ankrgyl commented Dec 2, 2022

That seems wise to me. We may even want to flesh out the (potentially breaking) parser changes so that users who upgrade only need to adjust to breakages once. Happy to contribute those and collaborate with you on that.

@ankrgyl
Copy link
Contributor Author

ankrgyl commented Dec 2, 2022

Additionally, I just rebased and updated the merge conflicts.

@alamb
Copy link
Collaborator

alamb commented Dec 2, 2022

That seems wise to me. We may even want to flesh out the (potentially breaking) parser changes so that users who upgrade only need to adjust to breakages once. Happy to contribute those and collaborate with you on that.

That would be awesome. I plan to make a release soon (TM) -- either later today or over the weekend sometime. Then let's merge this one in and iterate.

@alamb
Copy link
Collaborator

alamb commented Dec 5, 2022

@ankrgyl I took the liberty of fixing the clippy CI error and merging up from main to this branch.

@alamb alamb merged commit 813f4a2 into sqlparser-rs:main Dec 5, 2022
@alamb
Copy link
Collaborator

alamb commented Dec 5, 2022

That seems wise to me. We may even want to flesh out the (potentially breaking) parser changes so that users who upgrade only need to adjust to breakages once. Happy to contribute those and collaborate with you on that.

This is merged and should be included in the 0.29.0 release (eta in approximately 1 months time) I would love to start collaborating to fixup the parser. The first step might be to list / sketch out the changes that are needed.

Thanks again -- this is super exciting!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support of locations in AST
4 participants