Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

separated_list1 by multipspace0 sep failed #1691

Open
palpink opened this issue Aug 27, 2023 · 14 comments
Open

separated_list1 by multipspace0 sep failed #1691

palpink opened this issue Aug 27, 2023 · 14 comments

Comments

@palpink
Copy link

palpink commented Aug 27, 2023

Hello, and thank you for submitting an issue to nom!

First, please note that, for family reasons, I have limited time to work on
nom, so following the advice here will make sure I will quickly understand
your problem and answer as soon as possible.
Second, if I don't get to work on your issue quickly, that does not mean I
don't consider it important or useful. Major version releases happen once
a year, and a lot of fixes are done for the occasion, once I have had time
to think of the right solution. So I will get back to you :)

Prerequisites

Here are a few things you should provide to help me understand the issue:

  • Rust version : rustc 1.71.1 (eb26296b5 2023-08-03)
  • nom version :7.1.3
  • nom compilation features used: default

Test case

Please provide a short, complete (with crate import, etc) test case for
the issue, showing clearly the expected and obtained results.

Example test case:

    #[test]
    fn test_separated_list1() {
        fn parser(input: &str) -> IResult<&str, Vec<&str>> {
            separated_list1(multispace0, tag("abc"))(input)
        }
        assert_eq!(parser("abcabc"), Ok(("", vec!["abc", "abc"])));
       // thread 'parser::tests::test_cypher' panicked at 'assertion failed: `(left == right)`
       // left: `Err(Error(Error { input: "abc", code: SeparatedList }))`,
       // right: `Ok(("", ["abc", "abc"]))`
    }
@epage
Copy link
Contributor

epage commented Aug 28, 2023

While its not in the docs, in the code is this comment

infinite loop check: the parser must always consume

Meaning, the separator parser. I assume this is protecting against writing code where both the separator and the core type are allowed to be empty, you'd get an infinite loop, so nom seems to say the core type may be empty but the separator must not be.

@SaltyKitkat
Copy link

Maybe related with #1573

@gdennie
Copy link
Contributor

gdennie commented Jan 27, 2024

The sep parser in seperated_list must not be optional because if it is not present then consecutive items become just one item.

Perhaps the required sep parser is a alternate separators,

alt((sep1, sep2, ...))

@Taywee
Copy link

Taywee commented Mar 25, 2024

I ran into the same thing. Is there a more accepted pattern currently for a sequence of items that are optionally separated by whitespace? I've switched something like a many1(preceded(multispace0, my_parser)), but that doesn't work exactly the same because it allows leading separators (which isn't actually a problem in my context, at least), and separated_list1 would read more clearly.

@gdennie
Copy link
Contributor

gdennie commented Mar 25, 2024

If the separator is optional then what is to separate two consecutive elements when the separator is absent?

without testing, this seems to be what you might want...

many1(tuple((element, many0(sep))))

Rust saves us from ourselves ;)

@Taywee
Copy link

Taywee commented Mar 25, 2024

The return type on that isn't great, though. You'd have to map the tuple combinator to get the elements on their own. preceded or terminated are probably better options.

@Geal
Copy link
Collaborator

Geal commented Apr 21, 2024

that check for non consuming input in repeating parser dates from the very beginning of nom. People kept running into issues with parsers going into infinite loop, in particular due to optional whitespace. Even now in some user tests without the check people trip on it 😅

@Taywee
Copy link

Taywee commented Apr 22, 2024

Sure, that makes sense, but I think it would be more robustly handled by returning an error in the case that both sep and f both parse empty in a row, rather than one or the other. As long as one of the two isn't parsing empty, there's no risk of an infinite loop.

@gdennie
Copy link
Contributor

gdennie commented Apr 23, 2024

An optional sep in a separated list is conceptually invalid because if the separator does not exist the the elements are not separated and you then do not have the presence of two elements but one. Perhaps what is regarded as an optional is really an alternate combinator such as comma or space.

Another situation is when the elements themselves are bracketed and the separator is purely cosmetic such as a sequence of quoted bracketed strings. However, this later case isn't a separated list but a sequence of elements with optional suffix (or prefix).

I wonder if we cand define traits and auto implement them against parser results so as to use bounds to better document this functionality.

...pseudo

trait Consuming {}
trait NonConsuming {}
impl<..> Consuming for fn(I)->IResult<..>
impl<..> NonConsuming for fn(I)->IResult<I,Option<O>,E>
impl<..> NonConsuming for fn(I)->IResult<I,Result<O,E2>,E>

@Xiretza
Copy link
Contributor

Xiretza commented Apr 23, 2024

An optional sep in a separated list is conceptually invalid

You present this as fact when it's really just your opinion. The difference between separated_list and a prefix/suffix per element is that the former only looks for the separator between pairs of elements, while the latter also looks for a separator at the very beginning (prefix) or end (suffix). They are not equivalent, regardless of whether the separator is optional.

@Taywee
Copy link

Taywee commented Apr 23, 2024

if the separator does not exist the the elements are not separated and you then do not have the presence of two elements but one

That depends on what's being parsed. In my case, I have a list of {...} blocks, which I want to be separated by any amount of whitespace (including zero whitespace). I don't want to allow leading or trailing whitespace, and I always want at least one block. separated_list1(multispace0, block) would have been perfect for my case, but it gave an unhelpful error instead, which is how I found this issue.

Perhaps what is regarded as an optional is really an alternate combinator such as comma or space.

That's not a problem, because if it's not optional, it's not optional. If somebody uses alt((tag(','), tag(' '))) or something, it can't be skipped anyway.

separated_list1(foo, bar) is a parser that has a mandatory list of bar items separated by mandatory foo items.

separated_list1(opt(foo), bar) is a parser with bar items possibly separated by foo items.

In no situation should it be unclear whether the separator is optional or not, because you can always use a mandatory combinator if you want it to be mandatory. This is one of the major strengths of combinators, that you can build exactly the behavior you want from the building blocks. The suggestion is not making the separator optional, but allowing it to be a potentially non-consuming parser.

@gdennie
Copy link
Contributor

gdennie commented Apr 24, 2024

Actually, the reason I am on this thread is because I had a similar presumption about this functionality for the optionality of sep. Specifically, that sep should be capable of being optional due to the fact that it, sep, and the element parser are self delimiting maximally consuming independent parsers. Perhaps a new version of separated_list, separatedb_list, can be added to the library that implements this wildly expected behaviour.

Incidentally, my apologizes if I sound emphatic or definitive. I am an infinite novice in this and many things. :)

// test cases: 
(sep, element) => [element]
(sep, opt_element) => [opt_element]
(opt_sep, element) => [element]
(opt_sep, opt_element) => [opt_element]
(element, element) => [element;1]
(opt_element, opt_element) => Err

@Geal
Copy link
Collaborator

Geal commented May 5, 2024

I have moved the loop check in separated_list0 and separated_list1 to cover the application of both parsers: #1756
As @Taywee said

Sure, that makes sense, but I think it would be more robustly handled by returning an error in the case that both sep and f both parse empty in a row, rather than one or the other. As long as one of the two isn't parsing empty, there's no risk of an infinite loop.

This will offer enough protection against infinite loops

@gdennie
Copy link
Contributor

gdennie commented May 5, 2024

We should probably remove the comment that sep must be consuming...

/// # Arguments
/// * `sep` Parses the separator between list elements. Must be consuming.
/// * `f` Parses the elements of the list.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants