Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

error message caret position is slightly off in at least one case #792

Closed
BurntSushi opened this issue Jul 2, 2021 · 2 comments
Closed

Comments

@BurntSushi
Copy link
Member

From: https://stackoverflow.com/questions/68228762/regex-wont-compile-because-of-unclosed-character-class-in-named-capture-group

use lazy_static::lazy_static;
use regex::Regex;

fn main() {
    lazy_static! {
        static ref FANCY_OPCODE_RE: Regex = Regex::new(r"(?x)
            ^                              # Match start of string
            (?P<opname>[-a-zA-Z#+]+)       # Match abbreviated name of OpCode as 'opname'
            \(                             # Open parentheses
            (?P<arg1>[0-9]+)               # Match first number as 'arg1'
            (/                             # Delimiter
            (?P<arg2>[0-9]+)               # Optionally match second number as 'arg2'
            /                              # Delimiter
            (?P<arg3>[0-9]+))?             # Optionally match third number as 'arg3'
            \)                             # Closing parenthesis
            $                              # Match end of string
        ").unwrap();
    }
    let s = "+loop(3)";
    let opname: String; 
    let arg1: String;
    let arg2: String;
    let arg3: String;
    match FANCY_OPCODE_RE.captures(s) {
        Some(cap) => { 
            opname = format!("{:?}", cap.name("opname")); 
            arg1 = format!("{:?}", cap.name("arg1"));
            arg2 = format!("{:?}", cap.name("arg2"));
            arg3 = format!("{:?}", cap.name("arg3"));
        },
        None => { 
            opname = "No match".to_string(); 
            arg1 = String::new();
            arg2 = String::new();
            arg3 = String::new();
        }
    }

    println!("opname = {}, arg1 = {}, arg2 = {}, arg3 = {}", opname, arg1, arg2, arg3);
}

Here's what the output looks like:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Syntax(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
regex parse error:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1: (?x)
 2:             ^                              # Match start of string
 3:             (?P<opname>[-a-zA-Z#+]+)       # Match abbreviated name of OpCode as 'opname'
                           ^^
 4:             \(                             # Open parentheses
 5:             (?P<arg1>[0-9]+)               # Match first number as 'arg1'
 6:             (/                             # Delimiter
 7:             (?P<arg2>[0-9]+)               # Optionally match second number as 'arg2'
 8:             /                              # Delimiter
 9:             (?P<arg3>[0-9]+))?             # Optionally match third number as 'arg3'
10:             \)                             # Closing parenthesis
11:             $                              # Match end of string
12:         
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
error: unclosed character class
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
)', src/main.rs:17:12

The positioning of the ^^ in the error message looks off. I think there should be only one ^ and it should be under the opening [.

@BurntSushi
Copy link
Member Author

This could be a "fun" one for someone to dig into. It's reasonably self-contained inside of regex-syntax, and shouldn't require any knowledge of regex engine internals.

@BurntSushi BurntSushi added the bug label Jul 2, 2021
@alexfertel
Copy link
Contributor

Hello! I started taking a look at this, and I have to say that it's a pleasure to read the code. I am learning so much just by reading 👌🏻 .

I have a question. When reading the following doc comments I don't understand what is meant by "Similarly for -".

    /// Parses the opening of a character class set. This includes the opening
    /// bracket along with `^` if present to indicate negation. This also
    /// starts parsing the opening set of unioned items if applicable, since
    /// there are special rules applied to certain characters in the opening
    /// of a character class. For example, `[^]]` is the class of all
    /// characters not equal to `]`. (`]` would need to be escaped in any other
    /// position.) Similarly for `-`.
    ///
    /// In all cases, the op inside the returned `ast::ClassBracketed` is an
    /// empty union. This empty union should be replaced with the actual item
    /// when it is popped from the parser's stack.
    ///
    /// This assumes the parser is positioned at the opening `[` and advances
    /// the parser to the first non-special byte of the character class.
    ///
    /// An error is returned if EOF is found.
    #[inline(never)]
    fn parse_set_class_open(

This section seems relevant to the issue, so I wanted it to be clear to me.

BurntSushi pushed a commit that referenced this issue Jul 5, 2022
This fixes a bug where the caret in some types of error messages was not
quite correct.

Fixes #792, Closes #794
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants