Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

I want to implement separated_tuple for 8.0 #1718

Open
BGR360 opened this issue Dec 15, 2023 · 7 comments
Open

I want to implement separated_tuple for 8.0 #1718

BGR360 opened this issue Dec 15, 2023 · 7 comments

Comments

@BGR360
Copy link

BGR360 commented Dec 15, 2023

I'm using nom for Advent of Code 2023 and a pattern I keep using over and over again is "whitespace-separated tuple".

let parser = tuple((
    thing1,
    char(' '),
    thing2,
    char(' '),
    thing3,
)).map(|(thing1, _, thing2, _, thing3)| (thing1, thing2, thing3));

I ended up implementing a separated_tuple combinator so that code can be much cleaner:

let parser = separated_tuple(space1, (thing1, thing2, thing3));

Here is my ugly, undocumented implementation that I'm using for AoC.

I think the implementation is complex enough and useful enough to warrant inclusion in nom rather than in a third party utility library like nom-supreme.

I would like to contribute a cleaner, more user-ready implementation and submit a pull request.

However, after cloning the repo, I noticed that an 8.0.0 is in the works with lots of changes. So my question to you is:

how would a separated_tuple combinator best fit into 8.0.0?

If you give me an idea for how to implement it in the 8.0 paradigm, I will happily send out a PR.

@epage
Copy link
Contributor

epage commented Dec 15, 2023

For myself, I find there are a lot of unpredicatble ways I want to skip fields in a tuple. Having something like https://docs.rs/combine/latest/combine/macro.struct_parser.html that also applies to tuples could help and be a more general form of this proposed combinator.

@coalooball
Copy link

Hello, @BGR360!

I don't think there's much need to come up with a separated_tuple function specifically, we can use separated_list instead of tuple.

fn separated_tuple(s: &str) -> IResult<&str, Vec<&str>> {
    map(
        separated_list1(
            multispace1,
            alt((tag("thing1"), tag("thing2"), tag("thing3"))),
        ),
        |x| x,
    )(s)
}

It's easy to def a function with same functions while no extra _.

FULL CODES BELOW:

use nom::{
    branch::alt, bytes::complete::tag, character::complete::multispace1, combinator::map,
    multi::separated_list1, IResult,
};

fn separated_tuple(s: &str) -> IResult<&str, Vec<&str>> {
    map(
        separated_list1(
            multispace1,
            alt((tag("thing1"), tag("thing2"), tag("thing3"))),
        ),
        |x| x,
    )(s)
}

fn main() {
    assert_eq!(
        separated_tuple("thing1 thing2   thing3").unwrap().1,
        vec!["thing1", "thing2", "thing3"]
    );
}

@BGR360
Copy link
Author

BGR360 commented Dec 21, 2023

@coalooball That will not work when the tuple elements are not of the same type. Plus, even if they were the same type, this doesn't provide any type-safe guarantee that I'm parsing exactly N things.

@epage
Copy link
Contributor

epage commented Dec 21, 2023

What are your thoughts on the more general idea of

let (i, (value1, value2, value3)) = seq!(
    thing1,
    _: space1,
    thing2,
    _: space1,
    thing3,
).parse(i)?;

@coalooball
Copy link

@coalooball That will not work when the tuple elements are not of the same type. Plus, even if they were the same type, this doesn't provide any type-safe guarantee that I'm parsing exactly N things.

Regarding the first point, the method I provided allows heterogeneous things:

fn separated_tuple(s: &str) -> IResult<&str, Vec<&str>> {
    map(
        separated_list1(
            multispace1,
            alt((
                tag("thing1"),
                delimited(tag("\""), alphanumeric1, tag("\"")),
                tag_no_case("thing3"),
            )),
        ),
        |x| x,
    )(s)
}

assert_eq!(
    separated_tuple("THING3 thing2   \"thing1\"").unwrap().1,
    vec!["THING3", "thing2", "thing1"]
);

Concerning the second point, I don t understand type-safe guarantee refers to.

@coalooball
Copy link

What are your thoughts on the more general idea of

let (i, (value1, value2, value3)) = seq!(
    thing1,
    _: space1,
    thing2,
    _: space1,
    thing3,
).parse(i)?;

I prefer this:

let (i, parsers:iter) = seq!(seperated_parser, permutational_parsers:iter).parse(i)?;

@BGR360
Copy link
Author

BGR360 commented Dec 21, 2023

Regarding the first point, the method I provided allows heterogeneous things

@coalooball all of those parsers share the same return type, &str. IMO, the much more common use case is parsers that have different return types:

fn thing1(input: &str) -> IResult<&str, Thing1> {...}
fn thing2(input: &str) -> IResult<&str, Thing2> {...}
fn thing3(input: &str) -> IResult<&str, Thing3> {...}

It's not possible to use separated_list to parse a sequence of parsers with heterogeneous outputs.

Concerning the second point, I don t understand type-safe guarantee refers to.

The output of tuple((thing1, thing2, thing3)) is a tuple, (Thing1, Thing2, Thing3). Rust's type system guarantees that that output has exactly three elements, which is what I want. And nom will automatically produce parse errors if the input has more or fewer than 3 elements.

The output of separated_list(alt((a, b, c))) is Vec<T>, which could be any length. I would have to perform additional error checking on that vector to make sure it's exactly three elements long.

What are your thoughts on the more general idea of

@epage A macro that supports discarding arbitrary elements in a sequence is definitely more flexible than my idea of separated_tuple, which assumes parsing exactly one instance of the same parser between each element in the sequence. I generally prefer a non-macro solution though.

But IMO my idea aligns with the nom ideals. separated_list also assumes exactly one instance of the same parser between each sequence element.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants