Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document (or fix?) the fact that escaped() succeeds on empty inputs #1735

Open
LunarLambda opened this issue Mar 26, 2024 · 7 comments
Open

Comments

@LunarLambda
Copy link

LunarLambda commented Mar 26, 2024

Combinators such as many1 explicitly state they fail if the input parser accepts empty inputs. For this reason, most parsers like is_not fail on empty inputs.

However, escaped immediately returns Ok when given an empty input without even attempting to run the inner parser, which otherwise would correctly return an error, which escaped would then correctly pass up when it detects that the error was due to empty input. I'm unsure whether this is intentional or an oversight, as a code path doing the right thing does exist, but is not triggered.

This leads to an issue where wrapping an existing parser in escaped can change its behaviour:

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=e37a7669c47bed92d6853e788ba0725b

Workaround in my case seems to be wrapping the escaped in verify(..., |s: &str| !s.is_empty())

  • Rust version : 1.76.0
  • nom version : 7.13
  • nom compilation features used: default
@LunarLambda LunarLambda changed the title Document that escaped() succeeds on empty inputs Document (or fix?) the fact that escaped() succeeds on empty inputs Mar 26, 2024
@epage
Copy link
Contributor

epage commented Mar 26, 2024

If empty inputs were to be handled by the "normal" parser and propagate the error up, what would you expect escaped to do with a content that is all escaped values (e.g. (\t\t\t)? In that example, there are 4 empty strings which would be an error.

From an implementation perspective, its as-if you were running with opt(normal), making it so any error gets discarded.

However, that wouldn't preclude a escaped1 from existing like there is take_till and take_til1, etc if the maintainer was up for it. Personally, I am curious what the use case is. Most of the time I see escaped characters, they are put in delimiters ("") and 0 or more allowed.

@LunarLambda
Copy link
Author

LunarLambda commented Mar 26, 2024

I don't follow. escaped handles \t\t\t just fine.

main parser gets \t\t\t as input, hits \, rejects it, escaped consumes \, runs second parser, which consumes t. Rinse and repeat. The while loop in escaped terminates when there's no input left.

The issue is about how escaped does not work with the multi combinators because it returns Ok for an empty input string, which is both unexpected and arguably incorrect with how basically the entire library is designed (almost all parsers refuse empty inputs unless they have a 0 variant.)

There's also other issues with escaped, making implementing e.g. quoted strings difficult. Even nom's own example on escaped string literals doesn't use escaped. But that's not relevant to this specific issue. Playground link

@epage
Copy link
Contributor

epage commented Mar 26, 2024

The issue is about how escaped does not work with the multi combinators because it returns Ok for an empty input string, which is both unexpected and arguably incorrect with how basically the entire library is designed (almost all parsers refuse empty inputs unless they have a 0 variant.)

That was in response specifically to:

Ok when given an empty input without even attempting to run the inner parser, which otherwise would correctly return an error,

arguably incorrect with how basically the entire library is designed (almost all parsers refuse empty inputs unless they have a 0 variant.)

This doesn't match my experience with nom

@LunarLambda
Copy link
Author

  • Parsers with neither a 0 or 1 variant basically never accept empty inputs (tag, char, etc.)
  • Parsers with only a 1 variant accept empty inputs (take_*, seems like nothing else in nom has this)
  • Parsers with both variants are obvious and behave accordingly.

many_till seems to be special by virtue of how it works, which is "run f until g succeeds", and looking at the docs f has to succeed at least once.

Being in the bytes module, escaped should behave like the other bytes parsers, i.e. fail on empty inputs unless a 1 variant exists. I'm sorry but I don't see your argument.

@epage
Copy link
Contributor

epage commented Mar 26, 2024

many_till seems to be special by virtue of how it works, which is "run f until g succeeds", and looking at the docs f has to succeed at least once.

I'm not seeing what in the docs is leading to that. However, when I look at the code, g is tried first and immediately returns if it succeeds, independent of f.

Being in the bytes module, escaped should behave like the other bytes parsers, i.e. fail on empty inputs unless a 1 variant exists. I'm sorry but I don't see your argument.

I had misunderstood and thought you were keying off of whether a 0 variant exists. Instead, you are reading into the API that if a 1 variant exists, then the non-1 variant is implicitly 0 and if there isn't a 1 variant, then its implicitly a 1 variant.

not_line_ending breaks this "rule". Without there being something documented (which I've not seen), I think this is trying to make rules where there are none.

@LunarLambda
Copy link
Author

not_line_ending has neither a 0 or 1 variant and errors on empty inputs. That's perfectly consistent with all the other ones.

@LunarLambda
Copy link
Author

This discussion is getting pretty tiresome, it has nothing to do with my actual issue (which is, at best a one line documentation change, and at worst a legitimate bug)

The consistency doesn't actually matter. It's merely a supporting argument for why this edge case should be addressed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants