rust-bakery · tfpk · May 2, 2022 · Jul 1, 2022 · Jul 1, 2022 · Geal
@@ -0,0 +1 @@
+book
@@ -0,0 +1,6 @@
+[book]
+authors = ["Tom Kunc"]
+language = "en"
+multilingual = false
+src = "src"
+title = "The Nom Guide (Nominomicon)"
@@ -0,0 +1,11 @@
+#!/bin/bash
+command="build"
+
+[[ "$1" == "serve" ]] && command="serve"
+
+BOOK_ROOT_PATH="$( cd "$(dirname "$0")" >/dev/null 2>&1 ; pwd -P )/.."
+cd "$BOOK_ROOT_PATH" || exit 1
+
+[[ ! -e "$BOOK_ROOT_PATH/../../target" ]] && (cd ../../ && cargo build)
+mdbook test -L "$(cd ../../ && pwd)/target/debug/deps/"
+mdbook "$command"
@@ -0,0 +1,15 @@
+## Summary
+
+[Introduction](./introduction.md)
+
+- [Chapter 1: The Nom Way](./chapter_1.md)
+- [Chapter 2: Tags and Character Classes](./chapter_2.md)
+- [Chapter 3: Alternatives and Composition](./chapter_3.md)
+- [Chapter 4: Custom Outputs from Functions](./chapter_4.md)
+- [Chapter 5: Repeating with Predicates](./chapter_5.md)
+- [Chapter 6: Repeating Parsers](./chapter_6.md)
+- [Chapter 7: Using Errors from Outside Nom](./chapter_7.md)
+- [Chapter 8: Streaming vs. Complete](./todo.md)
+- [Chapter 9: Characters vs. Bytes](./todo.md)
+- [Chapter 10: Exercises and Further Reading](./todo.md)
+
@@ -0,0 +1,75 @@
+# Chapter 1: The Nom Way
+
+First of all, we need to understand the way that nom thinks about parsing.
+As discussed in the introduction, nom lets us build simple parsers, and
+then combine them (using "combinators").
+
+Let's discuss what a "parser" actually does. A parser takes an input and returns
+a result, where:
+ - `Ok` indicates the parser successfully found what it was looking for; or
+ - `Err` indicates the parser could not find what it was looking for.
+
+Parsers do more than just return a binary "success"/"failure" code. If
+the parser was successful, then it will return a tuple. The first field of the
+tuple will contain everything the parser did not process. The second will contain
+everything the parser processed. The idea is that a parser can happily parse the first
+*part* of an input, without being able to parse the whole thing.
+
+If the parser failed, then there are multiple errors that could be returned.
+For simplicity, however, in the next chapters we will leave these unexplored.
+
+```text
+                                   ┌─► Ok(
+                                   │      what the parser didn't touch,
+                                   │      what the parser matched
+                                   │   )
+             ┌─────────┐           │
+ my input───►│my parser├──►either──┤
+             └─────────┘           └─► Err(...)
+```
+
+
+To represent this model of the world, nom uses the `IResult<I, O>` type.
+The `Ok` variant has a tuple of `(remaining_input: I, output: O)`;
+whereas the `Err` variant stores an error.
+
+You can import it with:
+
+```rust
+# extern crate nom;
+use nom::IResult;
+```
+
+You'll note that `I` and `O` are type parameters. While most of the examples in this book
+use `&str` (i.e. parsing a string), they do not have to be strings, nor do they
+have to be the same type. Consider the simple example where `I = &str` and `O = u64`: this
+parses a string into an unsigned integer.
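To illustrate the `I = &str`, `O = u64` case, here is a minimal hand-written sketch that does not use nom at all. The function name `leading_u64` and the `String` error type are assumptions made for this illustration; a plain `Result<(&str, u64), String>` stands in for `IResult<&str, u64>`.

```rust
/// Parses the leading ASCII digits of `input` into a `u64`.
/// On success, returns `(remaining_input, output)`, mirroring the
/// tuple order that nom's `IResult` uses.
fn leading_u64(input: &str) -> Result<(&str, u64), String> {
    // Find where the run of digits ends (or the end of the input).
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    let (digits, rest) = input.split_at(end);
    // An empty or overflowing digit run is reported as an error.
    let value = digits
        .parse::<u64>()
        .map_err(|e| format!("not a u64: {e}"))?;
    Ok((rest, value))
}
```

Calling `leading_u64("42 apples")` yields `Ok((" apples", 42))`: the input and the remainder are strings, while the output is an integer.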
+
+Let's write our first parser!
+The simplest parser we can write is one which successfully does nothing.
+
+This parser should take in an `&str`:
+
+ - Since it is supposed to succeed, we know it will return the `Ok` variant.
+ - Since it does nothing to our input, the remaining input is the same as the input.
+ - Since it doesn't parse anything, it also should just return an empty string.
+
+
+```rust
+# extern crate nom;
+# use nom::IResult;
+# use std::error::Error;
+
+pub fn do_nothing_parser(input: &str) -> IResult<&str, &str> {
+    Ok((input, ""))
+}
+
+fn main() -> Result<(), Box<dyn Error>> {
+    let (remaining_input, output) = do_nothing_parser("my_input")?;
+    assert_eq!(remaining_input, "my_input");
+    assert_eq!(output, "");
+#   Ok(())
+}
+```
+
+It's that easy!
@@ -0,0 +1,111 @@
+# Chapter 2: Tags and Character Classes
+
+The simplest _useful_ parser you can write is one which
+has no special characters: it just matches a string.
+
+In `nom`, we call a simple collection of bytes a tag. Because
+these are so common, there already exists a function called `tag()`.
+This function returns a parser for a given string.
+
+ **Warning**: `nom` has multiple different definitions of `tag`. Make sure you use this one for the
+ moment!
+
+```rust,ignore
+# extern crate nom;
+pub use nom::bytes::complete::tag;
+```
+
+For example, code to parse the string `"abc"` could be represented as `tag("abc")`.
+
+If you have not programmed in a language where functions are values, the type signature of the
+`tag` function might be a surprise:
+
+```rust,ignore
+pub fn tag<T, Input, Error: ParseError<Input>>(
+    tag: T
+) -> impl Fn(Input) -> IResult<Input, Input, Error> where
+    Input: InputTake + Compare<T>,
+    T: InputLength + Clone, 
+```
+
+Or, for the case where `Input` and `T` are both `&str`, and simplifying slightly:
+
+```rust,ignore
+fn tag(tag: &str) -> (impl Fn(&str) -> IResult<&str, &str, Error>)
+```
+
+In other words, this function `tag` *returns a function*. The function it returns is a
+parser, taking a `&str` and returning an `IResult`. Functions creating parsers and 
+returning them is a common pattern in Nom, so it is useful to call out.
+
+Below, we have implemented a function that uses `tag`.
+
+```rust
+# extern crate nom;
+# pub use nom::bytes::complete::tag;
+# pub use nom::IResult;
+# use std::error::Error;
+
+fn parse_input(input: &str) -> IResult<&str, &str> {
+    //  note that this is really creating a function, the parser for abc
+    //  vvvvv 
+    //         which is then called here, returning an IResult<&str, &str>
+    //         vvvvv
+    tag("abc")(input)
+}
+
+fn main() -> Result<(), Box<dyn Error>> {
+    let (leftover_input, output) = parse_input("abcWorld")?;
+    assert_eq!(leftover_input, "World");
+    assert_eq!(output, "abc");
+
+    assert!(parse_input("defWorld").is_err());
+#   Ok(())
+}
+```
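To make the "function that returns a function" idea concrete, here is a minimal sketch of a `tag`-like function written without nom. The name `my_tag`, the `String` error type, and the `&'static str` argument are simplifying assumptions for this illustration; nom's real `tag` is generic over its input and error types and is not implemented this way.

```rust
/// Builds a parser for the literal `expected`.
/// Note that this function does no parsing itself: it returns a closure,
/// and that closure is the parser.
fn my_tag(expected: &'static str) -> impl Fn(&str) -> Result<(&str, &str), String> {
    move |input: &str| match input.strip_prefix(expected) {
        // Success: (remaining input, the part of the input that matched).
        Some(rest) => Ok((rest, &input[..expected.len()])),
        // Failure: the input does not start with the literal.
        None => Err(format!("expected {expected:?}")),
    }
}
```

With this sketch, `my_tag("abc")` creates the parser and `my_tag("abc")("abcWorld")` runs it, which is the same two-step shape as `tag("abc")(input)` above.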
+
+If you'd like to, you can also check tags without case-sensitivity
+with the [`tag_no_case`](https://docs.rs/nom/latest/nom/bytes/complete/fn.tag_no_case.html) function.
+
+## Character Classes
+
+Tags are incredibly useful, but they are also incredibly restrictive.
+The other end of Nom's functionality is pre-written parsers that allow us to accept any of a group of characters,
+rather than just accepting characters in a defined sequence.
+
+Here is a selection of them:
+
+- [`alpha0`](https://docs.rs/nom/latest/nom/character/complete/fn.alpha0.html): Recognizes zero or more lowercase and uppercase alphabetic characters: `/[a-zA-Z]/`. [`alpha1`](https://docs.rs/nom/latest/nom/character/complete/fn.alpha1.html) does the same but returns at least one character
+- [`alphanumeric0`](https://docs.rs/nom/latest/nom/character/complete/fn.alphanumeric0.html): Recognizes zero or more numerical and alphabetic characters: `/[0-9a-zA-Z]/`. [`alphanumeric1`](https://docs.rs/nom/latest/nom/character/complete/fn.alphanumeric1.html) does the same but returns at least one character
+- [`digit0`](https://docs.rs/nom/latest/nom/character/complete/fn.digit0.html): Recognizes zero or more numerical characters: `/[0-9]/`. [`digit1`](https://docs.rs/nom/latest/nom/character/complete/fn.digit1.html) does the same but returns at least one character
+- [`multispace0`](https://docs.rs/nom/latest/nom/character/complete/fn.multispace0.html): Recognizes zero or more spaces, tabs, carriage returns and line feeds. [`multispace1`](https://docs.rs/nom/latest/nom/character/complete/fn.multispace1.html) does the same but returns at least one character
+- [`space0`](https://docs.rs/nom/latest/nom/character/complete/fn.space0.html): Recognizes zero or more spaces and tabs. [`space1`](https://docs.rs/nom/latest/nom/character/complete/fn.space1.html) does the same but returns at least one character
+- [`line_ending`](https://docs.rs/nom/latest/nom/character/complete/fn.line_ending.html): Recognizes an end of line (both `\n` and `\r\n`)
+- [`newline`](https://docs.rs/nom/latest/nom/character/complete/fn.newline.html): Matches a newline character `\n`
+- [`tab`](https://docs.rs/nom/latest/nom/character/complete/fn.tab.html): Matches a tab character `\t`
+
+
+We can use these in our own parsers, for example:
+
+```rust
+# extern crate nom;
+# pub use nom::IResult;
+# use std::error::Error;
+pub use nom::character::complete::alpha0;
+fn parser(input: &str) -> IResult<&str, &str> {
+    alpha0(input)
+}
+
+fn main() -> Result<(), Box<dyn Error>> {
+    let (remaining, letters) = parser("abc123")?;
+    assert_eq!(remaining, "123");
+    assert_eq!(letters, "abc");
+
+#   Ok(())
+}
+```
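The difference between the `0` and `1` variants is easiest to see on input that contains no matching characters. The following is a hand-rolled sketch of that behaviour, not nom's implementation; the names `my_alpha0` and `my_alpha1` and the `String` error type are assumptions made for illustration.

```rust
/// Matches zero or more ASCII letters. This never fails: if the input
/// does not start with a letter, the match is simply empty.
fn my_alpha0(input: &str) -> Result<(&str, &str), String> {
    let end = input
        .find(|c: char| !c.is_ascii_alphabetic())
        .unwrap_or(input.len());
    Ok((&input[end..], &input[..end]))
}

/// Matches one or more ASCII letters. An empty match is an error.
fn my_alpha1(input: &str) -> Result<(&str, &str), String> {
    let (rest, matched) = my_alpha0(input)?;
    if matched.is_empty() {
        Err(format!("expected at least one letter at {input:?}"))
    } else {
        Ok((rest, matched))
    }
}
```

Given `"123"`, the zero-or-more sketch succeeds with an empty match and leaves the input untouched, whereas the one-or-more sketch returns an error.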
+
+One important note is that, due to the type signature of these functions,
+it is generally best to use them within a function that returns an `IResult`.
+
+If you don't, some of the type information for these parsers (such as the error type of `tag`) must be
+manually specified, which can lead to verbose code or confusing errors.
@@ -0,0 +1,142 @@
+# Chapter 3: Alternatives and Composition
+
+In the last chapter, we saw how to create simple parsers using the `tag` function
+and some of Nom's prebuilt parsers.
+
+In this chapter, we explore two other widely used features of Nom:
+alternatives and composition.
+
+## Alternatives
+
+Sometimes, we might want to choose between two parsers, and we're happy with
+either being used.
+
+Nom gives us this ability through the `alt()` combinator.
+
+```rust
+# extern crate nom;
+use nom::branch::alt;
+```
+
+The `alt()` combinator will execute each parser in a tuple until it finds one
+that does not error. If all error, then by default you are given the error from
+the last parser.
+
+We can see a basic example of `alt()` below.
+
+```rust
+# extern crate nom;
+use nom::branch::alt;
+use nom::bytes::complete::tag;
+use nom::IResult;
+# use std::error::Error;
+
+fn parse_abc_or_def(input: &str) -> IResult<&str, &str> {
+    alt((
+        tag("abc"),
+        tag("def")
+    ))(input)
+}
+
+fn main() -> Result<(), Box<dyn Error>> {
+    let (leftover_input, output) = parse_abc_or_def("abcWorld")?;
+    assert_eq!(leftover_input, "World");
+    assert_eq!(output, "abc");
+
+    assert!(parse_abc_or_def("ghiWorld").is_err());
+#   Ok(())
+}
+```
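The "try each parser in order, keep the last error" behaviour can be sketched without nom as follows. Everything here is an assumption made for illustration (the `Parser` alias, the helper `literal`, the function `first_success`, and the `String` error type); nom's `alt()` works on tuples of generic parsers rather than on a slice of function pointers.

```rust
/// A parser over `&str`: on success, (remaining input, matched text).
type Parser = fn(&str) -> Result<(&str, &str), String>;

/// Helper: matches the literal `expected` at the start of `input`.
fn literal<'a>(input: &'a str, expected: &str) -> Result<(&'a str, &'a str), String> {
    match input.strip_prefix(expected) {
        Some(rest) => Ok((rest, &input[..expected.len()])),
        None => Err(format!("expected {expected:?}")),
    }
}

fn parse_abc(input: &str) -> Result<(&str, &str), String> {
    literal(input, "abc")
}

fn parse_def(input: &str) -> Result<(&str, &str), String> {
    literal(input, "def")
}

/// Runs each parser on the same input and returns the first success.
/// If every parser fails, the error from the last one is returned.
fn first_success<'a>(parsers: &[Parser], input: &'a str) -> Result<(&'a str, &'a str), String> {
    let mut last_error = String::from("no parsers were given");
    for &parser in parsers {
        match parser(input) {
            Ok(result) => return Ok(result),
            Err(error) => last_error = error,
        }
    }
    Err(last_error)
}
```

Note that every alternative is tried against the same, original input: a failed alternative consumes nothing.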
+
+## Composition
+
+Now that we can create more interesting parsers, we can compose them together.
+The simplest way to do this is just to evaluate them in sequence:
+
+```rust
+# extern crate nom;
+use nom::branch::alt;
+use nom::bytes::complete::tag;
+use nom::IResult;
+# use std::error::Error;
+
+fn parse_abc(input: &str) -> IResult<&str, &str> {
+    tag("abc")(input)
+}
+fn parse_def_or_ghi(input: &str) -> IResult<&str, &str> {
+    alt((
+        tag("def"),
+        tag("ghi")
+    ))(input)
+}
+
+fn main() -> Result<(), Box<dyn Error>> {
+    let input = "abcghi";
+    let (remainder, abc) = parse_abc(input)?;
+    let (remainder, def_or_ghi) = parse_def_or_ghi(remainder)?;
+    println!("first parsed: {abc}; then parsed: {def_or_ghi};");
+
+#   Ok(())
+}
+```
+
+Composing parsers is such a common requirement that, in fact, Nom has a few built-in
+combinators to do it. The simplest of these is `tuple()`. The `tuple()` combinator takes a tuple of parsers,
+and either returns `Ok` with a tuple of all of their successful parses, or it
+returns the `Err` of the first failed parser.
+
+```rust
+# extern crate nom;
+use nom::sequence::tuple;
+```
+
+
+```rust
+# extern crate nom;
+use nom::branch::alt;
+use nom::sequence::tuple;
+use nom::bytes::complete::tag_no_case;
+use nom::IResult;
+# use std::error::Error;
+
+fn parse_base(input: &str) -> IResult<&str, &str> {
+    alt((
+        tag_no_case("a"),
+        tag_no_case("t"),
+        tag_no_case("c"),
+        tag_no_case("g")
+    ))(input)
+}
+
+fn parse_pair(input: &str) -> IResult<&str, (&str, &str)> {
+    // the many_m_n combinator might also be appropriate here.
+    tuple((
+        parse_base,
+        parse_base,
+    ))(input)
+}
+
+fn main() -> Result<(), Box<dyn Error>> {
+    let (remaining, parsed) = parse_pair("aTcG")?;
+    assert_eq!(parsed, ("a", "T"));
+    assert_eq!(remaining, "cG");
+
+    assert!(parse_pair("Dct").is_err());
+
+#   Ok(())
+}
+```
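For comparison, the sequencing described above can be sketched by hand: each step parses whatever the previous step left over, and the `?` operator returns the first error. This is an illustration without nom, not how `tuple()` is implemented; the names `base` and `base_pair` and the `String` error type are assumptions.

```rust
/// Matches a single DNA base (a, t, c or g, in either case).
fn base(input: &str) -> Result<(&str, &str), String> {
    match input.chars().next() {
        // All accepted characters are one byte long, so slicing at 1 is safe.
        Some(c) if "atcgATCG".contains(c) => Ok((&input[1..], &input[..1])),
        _ => Err(format!("expected a base at {input:?}")),
    }
}

/// Runs `base` twice in sequence and returns both outputs as a tuple.
fn base_pair(input: &str) -> Result<(&str, (&str, &str)), String> {
    // The second call starts from what the first call left over.
    let (rest, first) = base(input)?;
    let (rest, second) = base(rest)?;
    Ok((rest, (first, second)))
}
```

A combinator such as `tuple()` saves writing this remainder-threading by hand for every sequence.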
+
+
+## Extra Nom Tools
+
+After using `alt()` and `tuple()`, you might also be interested in a few other combinators that do similar things:
+
+| combinator | usage | input | output | comment |
+|---|---|---|---|---|
+| [delimited](https://docs.rs/nom/latest/nom/sequence/fn.delimited.html) | `delimited(char('('), take(2), char(')'))` | `"(ab)cd"` | `Ok(("cd", "ab"))` | Runs three parsers in sequence and returns only the result of the middle one |
+| [preceded](https://docs.rs/nom/latest/nom/sequence/fn.preceded.html) | `preceded(tag("ab"), tag("XY"))` | `"abXYZ"` | `Ok(("Z", "XY"))` | Runs two parsers in sequence and returns only the result of the second |
+| [terminated](https://docs.rs/nom/latest/nom/sequence/fn.terminated.html) | `terminated(tag("ab"), tag("XY"))` | `"abXYZ"` | `Ok(("Z", "ab"))` | Runs two parsers in sequence and returns only the result of the first |
+| [pair](https://docs.rs/nom/latest/nom/sequence/fn.pair.html) | `pair(tag("ab"), tag("XY"))` | `"abXYZ"` | `Ok(("Z", ("ab", "XY")))` | Runs two parsers in sequence and returns both results as a tuple |
+| [separated_pair](https://docs.rs/nom/latest/nom/sequence/fn.separated_pair.html) | `separated_pair(tag("hello"), char(','), tag("world"))` | `"hello,world!"` | `Ok(("!", ("hello", "world")))` | Runs three parsers in sequence and returns the results of the first and third as a tuple |