Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Panic due to erroneous regex-rewriting #164

Closed
sharkdp opened this issue May 25, 2018 · 4 comments
Closed

Panic due to erroneous regex-rewriting #164

sharkdp opened this issue May 25, 2018 · 4 comments

Comments

@sharkdp
Copy link
Contributor

sharkdp commented May 25, 2018

Running syncat in no-newlines mode fails for the following file:

test.rd (R documentation)

\usage{
add(x, y)
}
> cargo run --example syncat -- -n test.rd
[..]
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value:
Error(-114, target of repeat operator is invalid)', libcore/result.rs:945:5

This is apparently caused by rewriting the following regex:

((\\)(?:usage))(\{)(?:\n)?

from testdata/Packages/R/Rd (R Documentation).sublime-syntax line 92 in a way that it includes (?:$)? at the end (causing the "target of repeat operator is invalid" error).

I found this while working on a fix for #98 (See #165).

@trishume
Copy link
Owner

Darn. The nonewlines rewriting is always very fraught and causes us a lot of trouble. Not sure how to fix this. One option is to just add a special case for this instead of solving it generally, which I've done before.

@sharkdp
Copy link
Contributor Author

sharkdp commented May 25, 2018

The nonewlines rewriting is always very fraught and causes us a lot of trouble. Not sure how to fix this.

I quickly tried to patch this by keeping track of whether we are in a group (..) or not, but this attempt immediately failed on the regex above due to the (\\) part which is hard to understand without a real parser (the closing parens is not escaped). Also, just ignoring the $-rewrite in groups is probably not a good idea anyways.

It would be great if we could use the regex engine to parse the regex first and perform the transformation on the AST. However, onig doesn't seem to support this(?).

One option is to just add a special case for this instead of solving it generally, which I've done before.

That sounds like a valid option for now. How would this work? Would the special case be the full regex ((\\)(?:usage))(\{)(?:\n)? or just the (?:\n)? part? There are two regexes in the Rd sublime-syntax using this pattern.

@trishume
Copy link
Owner

Probably just the (?:\n)? or even just the (?:\n) part as a string replace before the rewriter in rewrite_regex. With a comment saying it's a hack to fix the Rd syntax.

@robinst
Copy link
Collaborator

robinst commented May 26, 2018

I quickly tried to patch this by keeping track of whether we are in a group (..) or not

The RegexRewriter class already handles backslashes, because it needs it to detect character classes as well. Should not be difficult to keep track of being in a group or not.

But the problem here is not the group, it's the ? after the group, which makes this much harder.

So I also think doing a replace is better in this case. We could rewrite (?:\n)? to (?:$|), which Onig doesn't have a problem with.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants