separated_list1 by multipspace0 sep failed #1691

palpink · 2023-08-27T02:49:13Z

Prerequisites

Rust version: rustc 1.71.1 (eb26296b5 2023-08-03)
nom version: 7.1.3
nom compilation features used: default

Test case

    use nom::{bytes::complete::tag, character::complete::multispace0, multi::separated_list1, IResult};

    #[test]
    fn test_separated_list1() {
        fn parser(input: &str) -> IResult<&str, Vec<&str>> {
            separated_list1(multispace0, tag("abc"))(input)
        }
        // Expected: both "abc" items are parsed, even with no whitespace between them.
        assert_eq!(parser("abcabc"), Ok(("", vec!["abc", "abc"])));
        // Obtained:
        // thread 'parser::tests::test_cypher' panicked at 'assertion failed: `(left == right)`
        //   left: `Err(Error(Error { input: "abc", code: SeparatedList }))`,
        //  right: `Ok(("", ["abc", "abc"]))`
    }

epage · 2023-08-28T14:13:21Z

While it's not in the docs, the code contains this comment:

infinite loop check: the parser must always consume

Meaning, the separator parser. I assume this protects against code where both the separator and the core type are allowed to be empty, which would cause an infinite loop. So nom seems to say the core type may be empty but the separator must not be.
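
For illustration, here is a minimal, standard-library-only sketch of a separator-must-consume check. It is not nom's source: parsers are simplified to plain functions returning `Option<(rest, matched)>`, and the names `Parser`, `tag_abc`, `spaces0`, `spaces1`, and `separated_list1_strict` are all hypothetical.

```rust
// Simplified parser shape: on success, return (remaining input, matched text).
type Parser = fn(&str) -> Option<(&str, &str)>;

/// Element parser (hypothetical): matches the literal "abc".
fn tag_abc(input: &str) -> Option<(&str, &str)> {
    input.strip_prefix("abc").map(|rest| (rest, "abc"))
}

/// Separator parser (hypothetical): zero or more whitespace characters,
/// so it can succeed without consuming anything.
fn spaces0(input: &str) -> Option<(&str, &str)> {
    let rest = input.trim_start();
    Some((rest, &input[..input.len() - rest.len()]))
}

/// Separator parser (hypothetical): one or more whitespace characters.
fn spaces1(input: &str) -> Option<(&str, &str)> {
    spaces0(input).filter(|(_, matched)| !matched.is_empty())
}

/// Separated list that treats any empty separator match as an error.
/// The error value is the input at the point where the check fired.
fn separated_list1_strict<'a>(
    sep: Parser,
    elem: Parser,
    input: &'a str,
) -> Result<(&'a str, Vec<&'a str>), &'a str> {
    // The first element is mandatory.
    let (mut rest, first) = elem(input).ok_or(input)?;
    let mut items = vec![first];
    loop {
        // No separator matched: the list ends here.
        let after_sep = match sep(rest) {
            Some((after_sep, _)) => after_sep,
            None => return Ok((rest, items)),
        };
        // Infinite-loop guard: the separator must have consumed something.
        if after_sep.len() == rest.len() {
            return Err(rest);
        }
        match elem(after_sep) {
            Some((after_elem, item)) => {
                items.push(item);
                rest = after_elem;
            }
            // A separator with no element after it is left unconsumed.
            None => return Ok((rest, items)),
        }
    }
}
```

In this sketch, `separated_list1_strict(spaces0, tag_abc, "abcabc")` fails with the remaining input `"abc"`, mirroring the error reported above, while `separated_list1_strict(spaces1, tag_abc, "abc abc")` succeeds. Note that with `spaces0` even `"abc abc"` is rejected here, because the separator matches empty at the end of the input.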

SaltyKitkat · 2023-11-14T12:16:31Z

Maybe related with #1573

gdennie · 2024-01-27T12:26:05Z

The sep parser in separated_list must not be optional, because if it is absent, consecutive items become just one item.

Perhaps the required sep parser is a set of alternate separators:

alt((sep1, sep2, ...))

Taywee · 2024-03-25T15:42:20Z

I ran into the same thing. Is there a more accepted pattern currently for a sequence of items that are optionally separated by whitespace? I've switched to something like many1(preceded(multispace0, my_parser)), but that doesn't work exactly the same because it allows leading separators (which isn't actually a problem in my context, at least), and separated_list1 would read more clearly.
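
To make the difference concrete, here is a rough standard-library-only sketch of the "optional prefix before every element" shape. It is not nom code: parsers are simplified to functions returning `Option<(rest, matched)>`, and `Parser`, `tag_abc`, `spaces0`, and `many1_preceded` are hypothetical names.

```rust
// Simplified parser shape: on success, return (remaining input, matched text).
type Parser = fn(&str) -> Option<(&str, &str)>;

/// Element parser (hypothetical): matches the literal "abc".
fn tag_abc(input: &str) -> Option<(&str, &str)> {
    input.strip_prefix("abc").map(|rest| (rest, "abc"))
}

/// Prefix parser (hypothetical): zero or more whitespace characters.
fn spaces0(input: &str) -> Option<(&str, &str)> {
    let rest = input.trim_start();
    Some((rest, &input[..input.len() - rest.len()]))
}

/// One or more repetitions of "optional prefix, then element".
/// Returns None when not even one element could be parsed.
fn many1_preceded<'a>(
    prefix: Parser,
    elem: Parser,
    input: &'a str,
) -> Option<(&'a str, Vec<&'a str>)> {
    let mut rest = input;
    let mut items = Vec::new();
    loop {
        // Run the prefix, discard what it matched, then run the element.
        let step = prefix(rest).and_then(|(after_prefix, _)| elem(after_prefix));
        match step {
            Some((after_elem, item)) => {
                // Guard against a repetition that consumed nothing.
                if after_elem.len() == rest.len() {
                    return None;
                }
                items.push(item);
                rest = after_elem;
            }
            // On failure, keep the input as it was before this attempt.
            None => break,
        }
    }
    if items.is_empty() { None } else { Some((rest, items)) }
}
```

In this sketch, `"abc abcabc"` yields three items, and `"  abc"` is also accepted, which is the leading-separator difference mentioned above. Trailing whitespace, as in `"abc  "`, is left unconsumed.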

gdennie · 2024-03-25T16:05:29Z

If the separator is optional then what is to separate two consecutive elements when the separator is absent?

Without testing, this seems to be what you might want:

many1(tuple((element, many0(sep))))

Rust saves us from ourselves ;)

Taywee · 2024-03-25T17:08:17Z

The return type on that isn't great, though. You'd have to map the tuple combinator to get the elements on their own. preceded or terminated are probably better options.

Geal · 2024-04-21T17:04:34Z

That check for non-consuming input in repeating parsers dates from the very beginning of nom. People kept running into issues with parsers going into infinite loops, in particular due to optional whitespace. Even now, in some user tests without the check, people trip on it 😅

Taywee · 2024-04-22T20:29:01Z

Sure, that makes sense, but I think it would be more robustly handled by returning an error in the case that sep and f both parse empty in a row, rather than one or the other. As long as one of the two isn't parsing empty, there's no risk of an infinite loop.

gdennie · 2024-04-23T01:53:19Z

An optional sep in a separated list is conceptually invalid because if the separator does not exist the elements are not separated and you then do not have the presence of two elements but one. Perhaps what is regarded as optional is really an alternate combinator such as comma or space.

Another situation is when the elements themselves are bracketed and the separator is purely cosmetic, such as a sequence of quoted bracketed strings. However, this latter case isn't a separated list but a sequence of elements with an optional suffix (or prefix).

I wonder if we can define traits and auto-implement them against parser results so as to use bounds to better document this functionality.

...pseudo

trait Consuming {}
trait NonConsuming {}
impl<..> Consuming for fn(I)->IResult<..>
impl<..> NonConsuming for fn(I)->IResult<I,Option<O>,E>
impl<..> NonConsuming for fn(I)->IResult<I,Result<O,E2>,E>

Xiretza · 2024-04-23T06:13:12Z

An optional sep in a separated list is conceptually invalid

You present this as fact when it's really just your opinion. The difference between separated_list and a prefix/suffix per element is that the former only looks for the separator between pairs of elements, while the latter also looks for a separator at the very beginning (prefix) or end (suffix). They are not equivalent, regardless of whether the separator is optional.

Taywee · 2024-04-23T17:42:09Z

if the separator does not exist the elements are not separated and you then do not have the presence of two elements but one

That depends on what's being parsed. In my case, I have a list of {...} blocks, which I want to be separated by any amount of whitespace (including zero whitespace). I don't want to allow leading or trailing whitespace, and I always want at least one block. separated_list1(multispace0, block) would have been perfect for my case, but it gave an unhelpful error instead, which is how I found this issue.

Perhaps what is regarded as optional is really an alternate combinator such as comma or space.

That's not a problem, because if it's not optional, it's not optional. If somebody uses alt((tag(","), tag(" "))) or something, it can't be skipped anyway.

separated_list1(foo, bar) is a parser that has a mandatory list of bar items separated by mandatory foo items.

separated_list1(opt(foo), bar) is a parser with bar items possibly separated by foo items.

In no situation should it be unclear whether the separator is optional or not, because you can always use a mandatory combinator if you want it to be mandatory. This is one of the major strengths of combinators, that you can build exactly the behavior you want from the building blocks. The suggestion is not making the separator optional, but allowing it to be a potentially non-consuming parser.

gdennie · 2024-04-24T04:02:16Z

Actually, the reason I am on this thread is because I had a similar presumption about this functionality for the optionality of sep. Specifically, that sep should be capable of being optional because it and the element parser are self-delimiting, maximally consuming, independent parsers. Perhaps a new version of separated_list, separatedb_list, can be added to the library that implements this widely expected behaviour.

Incidentally, my apologies if I sound emphatic or definitive. I am an infinite novice in this and many things. :)

// test cases: 
(sep, element) => [element]
(sep, opt_element) => [opt_element]
(opt_sep, element) => [element]
(opt_sep, opt_element) => [opt_element]
(element, element) => [element;1]
(opt_element, opt_element) => Err

Geal · 2024-05-05T16:46:04Z

I have moved the loop check in separated_list0 and separated_list1 to cover the application of both parsers: #1756
As @Taywee said:

Sure, that makes sense, but I think it would be more robustly handled by returning an error in the case that sep and f both parse empty in a row, rather than one or the other. As long as one of the two isn't parsing empty, there's no risk of an infinite loop.

This will offer enough protection against infinite loops.
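
As a rough illustration of a check that covers the application of both parsers, here is a standard-library-only sketch. It is not the code from #1756: parsers are simplified to functions returning `Option<(rest, matched)>`, and `Parser`, `tag_abc`, `opt_abc`, `spaces0`, and `separated_list1_relaxed` are hypothetical names.

```rust
// Simplified parser shape: on success, return (remaining input, matched text).
type Parser = fn(&str) -> Option<(&str, &str)>;

/// Element parser (hypothetical): matches the literal "abc".
fn tag_abc(input: &str) -> Option<(&str, &str)> {
    input.strip_prefix("abc").map(|rest| (rest, "abc"))
}

/// Element parser (hypothetical): "abc" or nothing, so it can match empty.
fn opt_abc(input: &str) -> Option<(&str, &str)> {
    tag_abc(input).or(Some((input, "")))
}

/// Separator parser (hypothetical): zero or more whitespace characters.
fn spaces0(input: &str) -> Option<(&str, &str)> {
    let rest = input.trim_start();
    Some((rest, &input[..input.len() - rest.len()]))
}

/// Separated list that only errors when separator and element TOGETHER
/// consumed nothing; either one alone may match empty.
fn separated_list1_relaxed<'a>(
    sep: Parser,
    elem: Parser,
    input: &'a str,
) -> Result<(&'a str, Vec<&'a str>), &'a str> {
    // The first element is mandatory.
    let (mut rest, first) = elem(input).ok_or(input)?;
    let mut items = vec![first];
    loop {
        // Apply the separator, then the element, as one step.
        let step = sep(rest).and_then(|(after_sep, _)| elem(after_sep));
        let (after_elem, item) = match step {
            Some(pair) => pair,
            // Either parser failed: the list ends before this step.
            None => return Ok((rest, items)),
        };
        // Infinite-loop guard over the combined step.
        if after_elem.len() == rest.len() {
            return Err(rest);
        }
        items.push(item);
        rest = after_elem;
    }
}
```

With this shape, `separated_list1_relaxed(spaces0, tag_abc, "abcabc")` returns both items, while a pairing where both parsers can match empty, such as `spaces0` with `opt_abc` on `"abcx"`, still stops with an error instead of looping.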

gdennie · 2024-05-05T17:48:57Z

We should probably remove the comment that sep must be consuming...

/// # Arguments
/// * `sep` Parses the separator between list elements. Must be consuming.
/// * `f` Parses the elements of the list.

epage mentioned this issue Sep 3, 2023

separated{0,1} could accept non-consuming sep if parser is consuming. winnow-rs/winnow#325

Closed

SaltyKitkat mentioned this issue Nov 14, 2023

fix: support empty sep for separated_list0 & separated_list1 #1491

Closed

gdennie mentioned this issue Jan 27, 2024

stipulate sep parser must consume #1723

Merged
