Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check regular expressions for compile errors #165

Merged
merged 5 commits into from May 26, 2018
Merged

Check regular expressions for compile errors #165

merged 5 commits into from May 26, 2018

Conversation

sharkdp
Copy link
Contributor

@sharkdp sharkdp commented May 25, 2018

This PR implements a basic version of regex syntax checking as proposed in #98 by @robinst. This would hopefully allow us to catch a lot of possible errors at SyntaxSet-generation time (and prevent panic!s at syntax-highlighting time).

I have also implemented Error for ParseSyntaxError (which includes RegexCompileError). This allows user code to display nice error messages when loading syntax sets. For example, I get the following error when I try to load CMake.sublime-syntax:

Error while compiling regex '(?=\b@?[A-Z_][A-Z0-9_]*(?=[^\w-<>=\$]))':
unmatched range specifier in char-class

(this has already been addressed upstream: zyxar/Sublime-CMakeLists#32)

Note that the tests currently do not pass due to #164 (I believe this is a good thing!), so #164 should be fixed before merging this. All tests pass if Rd (R Documentation).sublime-syntax is removed.


The backreference "rewriting" is very immature (and maybe even "unsafe"?). I ended up using this simple search-and-replace after trying two other approaches first:

  • Simply ignore regex compile errors with code -208 (onig_sys::ONIGERR_INVALID_BACKREF).
    However, using this approach, we wouldn't catch other syntax errors in these regexes.

  • Use onigs SyntaxBehavior to disable backreference checking (because I thought that
    SYNTAX_BEHAVIOR_STRICT_CHECK_BACKREF might help - it did not).

Copy link
Owner

@trishume trishume left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good thanks, just some nits I'd prefer you fix but aren't actually important.

This is technically a breaking API change, but it is to a previously unused error, and searching all of Github brings up no uses of it, so I think it's fine to change without a major version bump.

Maybe just pinging @robinst since he's the only major non-open-source user I know.

assert!(def.is_err());
match def.unwrap_err() {
ParseSyntaxError::RegexCompileError(ref regex, _) => assert_eq!("[a", regex),
_ => assert!(false, "Got unexpedted ParseSyntaxError"),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/unexpedted/unexpected/

regex_str = regex_str.replace(&format!("\\{}", i), "placeholder");
}

match Regex::new(&regex_str) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this may need to be a with_options with ONIG_OPTION_CAPTURE_GROUP like the other compiles in the library. From some Googling it appears this is the standard for textmate-like things and some syntaxes may rely on it.

Copy link
Contributor Author

@sharkdp sharkdp May 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, good point. I saw Regex::new somewhere and didn't check at the relevant places.

@sharkdp
Copy link
Contributor Author

sharkdp commented May 25, 2018

This is technically a breaking API change

Oh, right - due to the RegexCompileError(onig::Error) => RegexCompileError(String, onig::Error) change.

but it is to a previously unused error, and searching all of Github brings up no uses of it, so I think it's fine to change without a major version bump.

👍 Sounds good to me.

Copy link
Collaborator

@robinst robinst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks good, just one suggestion.

Awesome that this caught another problem in the existing syntax :).

Regarding the API change, it's not a problem for us.

let mut regex_str = String::from(regex_str);
for i in 0..10 {
regex_str = regex_str.replace(&format!("\\{}", i), "placeholder");
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make the regex_with_substitutes function a bit more abstract (or extract another function), so that we can use the same logic here?:

pub fn regex_with_substitutes(&self, region: &Region, s: &str) -> String {

This is necessary to handle double backslashes like \\1 properly.

While I was noticing things I also made the placeholders nicer for error
messages and used a capacity when building the substitute result.
@trishume
Copy link
Owner

@sharkdp I realized that @robinst's suggestion was important not just for cleanliness but also to handle the case of \\1 correctly. I thought there might be hairy lifetime problems so I wanted to take a crack at it myself, there ended up not being any problems, but I'd already figured it out.

I pushed my changes to the branch, they look okay? @sharkdp @robinst

Copy link
Collaborator

@robinst robinst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, looks good!

@sharkdp
Copy link
Contributor Author

sharkdp commented May 26, 2018

Looks great!

@trishume trishume merged commit ab5f77d into trishume:master May 26, 2018
@sharkdp
Copy link
Contributor Author

sharkdp commented May 27, 2018

Thank you!

Very much looking forward to the next release 😄.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants