Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

separated{0,1} could accept non-consuming sep if parser is consuming. #325

Closed
2 tasks done
nicopap opened this issue Sep 2, 2023 · 3 comments
Closed
2 tasks done
Labels
C-enhancement Category: Raise on the bar on expectations

Comments

@nicopap
Copy link
Contributor

nicopap commented Sep 2, 2023

Please complete the following tasks

winnow version

0.5.15

Describe your use case

I have:

  • a parser parser that is guarenteed to consume 1 to N token.
  • a parser sep that may consume 0 to N token.

I want to use separated{0,1} such that sep is the separator and parser the element.

Currently, such a setup may panic in winnow if sep matches 0 tokens.

Click for minimal reproducible example
use winnow::ascii::{alphanumeric1, multispace0};
use winnow::combinator::{alt, delimited, separated0};
use winnow::{PResult, Parser};

fn token_tree(input: &mut &str) -> PResult<()> {
    let token_trees = separated0::<_, _, (), _, _, _, _>(token_tree, multispace0);
    let mut parser = alt((
        alphanumeric1.void(),
        delimited('(', delimited(multispace0, token_trees, multispace0), ')').void(),
    ));
    parser.parse_next(input)
}
fn main() {
    let separated = separated0::<_, _, (), _, _, _, _>;

    let text = std::hint::black_box("   hello (world(hello) world) ()(hello) world      ");
    let parser = separated(token_tree, multispace0).recognize();
    let parsed = delimited(multispace0, parser, multispace0).parse(text);
    assert_eq!(parsed, Ok("hello (world(hello) world) ()(hello) world"));
}

Describe the solution you'd like

The implementation is trivial.

In the implementation of separated1 (and separated0) functions, move the if i.eof_offset() == len block to within the parser.parse_next(i) match, just before the res.accumulate(o), so that the infinite loop check accounts for the option of the parser being consuming, rather than just the sep.

This successfully prevent infinite loops, as we still guarantee the parser advances and never reaches a fixed point.

A side effect is that it makes rustfmt happy and reduces 1 level of indentation.

Alternatives, if applicable

For my specific use-case, the sep in question are whitespaces, and all I need to do is trim the parser output. I also am considering implementing a streaming tokenizer through the Stream trait to not have to care about whitespaces (and comments) in my parser.

Additional Context

When seeing separated, people often think of comma-separated list of elements as you'd see them in ALGOL-like languages. But I think whitespace delimited languages (say LISP) it makes sense to have a whitespace separator.

This use-case came in as I was designing a simple version of the rust TokenTree syntax element.

@nicopap nicopap added the C-enhancement Category: Raise on the bar on expectations label Sep 2, 2023
@epage
Copy link
Collaborator

epage commented Sep 3, 2023

This also recently came up in nom's issue tracker (rust-bakery/nom#1691) which was also discussed in the chat (https://matrix.to/#/!siAWDgffrOjdrhkczN:gitter.im/$dwtC3OMAmaJi7R5gspxJrPdujXnnb60J9gDk3v1bF6k?via=gitter.im&via=matrix.org&via=xiretza.xyz)

For easier reference for others, the relevant code is here: https://github.com/winnow-rs/winnow/blob/main/src/combinator/multi.rs#L303

@epage
Copy link
Collaborator

epage commented Sep 3, 2023

If I'm understanding your proposal, it is taking it from "sep must not be empty" to "(sep or parser) must not be empty".

On the surface that seems reasonable. There is a less direct, human aspect that I'm concerned about which is by having the assert fire less often, it'll make it less obvious and be a bug lurking in programs.

@epage
Copy link
Collaborator

epage commented Feb 12, 2024

This was also discussed in #365.

I'm going to close this. Higher-level built-in parsers are provided for convenience and don't have to be all things to all people. Rather than having logic that might be more difficult to predict, I think we should be consistent.

If there is a reason we should re-evaluate this, let us know!

@epage epage closed this as not planned Won't fix, can't repro, duplicate, stale Feb 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category: Raise on the bar on expectations
Projects
None yet
Development

No branches or pull requests

2 participants