rust-bakery · tfpk · May 2, 2022 · Jul 1, 2022 · Jul 1, 2022 · Xiretza
@@ -0,0 +1 @@
+book
@@ -0,0 +1,6 @@
+[book]
+authors = ["Tom Kunc"]
+language = "en"
+multilingual = false
+src = "src"
+title = "The Nom Guide (Nominomicon)"
@@ -0,0 +1,6 @@
+#!/bin/bash
+BOOK_ROOT_PATH="$( cd "$(dirname "$0")" >/dev/null 2>&1 ; pwd -P )/.."
+cd "$BOOK_ROOT_PATH" || exit 1
+
+[[ ! -e "$BOOK_ROOT_PATH/../../target" ]] && (cd ../../ && cargo build)
+mdbook test -L "$(cd ../../ && pwd)/target/debug/deps/"
@@ -0,0 +1,17 @@
+## Summary
+
+[Introduction](./introduction.md)
+
+- [Chapter 1: The Nom Way](./chapter_1.md)
+- [Chapter 2: Tags and Character Classes](./chapter_2.md)
+- [Chapter 3: Alternatives and Composition](./chapter_3.md)
+- [Chapter 4: Custom Outputs from Functions](./chapter_4.md)
+- [Chapter 5: Parsing Functions](./todo.md)
+- [Chapter 6: Repeated Inputs](./todo.md)
+- [Chapter 7: Simple Exercises](./todo.md)
+- [Chapter 8: Custom Errors in Functions](./todo.md)
+- [Chapter 9: Modifiers](./todo.md)
+- [Chapter 10: Characters vs. Bytes](./todo.md)
+- [Chapter 11: Streaming vs. Complete](./todo.md)
+- [Chapter 12: Complex Exercises](./todo.md)
+
@@ -0,0 +1,82 @@
+# Chapter 1: The Nom Way
+
+First of all, we need to understand the way that regexes and nom think about
+parsing.
+
+A regex, in a sense, controls its whole input. Given a single input,
+it decides that either some text **did** match the regex, or it **didn't**.
+
+```text
+             ┌────────┐           ┌─► Some text that matched the regex
+ my input───►│my regex├──►either──┤
+             └────────┘           └─► None
+```
+
+As mentioned in the introduction, Nom parsers are designed to be combined.
+This makes the assumption that a regex controls its entire input
+more difficult to maintain. So, there are three important changes
+required to our mental model of a regex.
+
+1. Rather than just returning the text that matched
+   the regex, Nom tells you *both* what it parsed, and what is left
+   to parse.
+
+2. Additionally, to help with combining parsers, Nom also gives you
+   error information about your parser. We'll talk about this more later;
+   for now, let's assume it's "basically" the same as the `None` we have above.
+
+   Points 1 and 2 are illustrated in the diagram below:
+
+```text
+                                   ┌─► Ok(
+                                   │      text that the parser didn't touch,
+                                   │      text that matched the parser
+                                   │   )
+             ┌─────────┐           │
+ my input───►│my parser├──►either──┤
+             └─────────┘           └─► Err(...)
+```
+
+3. Lastly, Nom parsers are normally anchored to the beginning of their input.
+   In other words, if you converted a Nom parser to regex, it would generally
+   begin with `/^/`. This is sensible, because it means that nom parsers must
+   (conceptually) be sequential -- your parser isn't going to jump
+   ahead and start parsing the middle of the line.
+
+
+To represent this model of the world, nom uses the `IResult<I, O>` type.
+The `Ok` variant holds a tuple of `(remaining_input: I, output: O)`;
+the `Err` variant stores an error. You can import it with:
+
+```rust
+# extern crate nom;
+use nom::IResult;
+```
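+
+As a rough illustration of this model (not nom's actual code), the sketch below
+uses only the standard library. `SketchResult` and `parse_hello` are hypothetical
+names made up for this example; `SketchResult` only mimics the shape of `IResult`,
+with a plain `String` standing in for nom's error type.
+
+```rust
+// Hypothetical stand-in for `IResult<I, O>`: `Ok` carries
+// `(remaining_input, output)`, `Err` carries a simple message.
+type SketchResult<I, O> = Result<(I, O), String>;
+
+// An anchored parser: it only ever looks at the *start* of its input.
+fn parse_hello(input: &str) -> SketchResult<&str, &str> {
+    match input.strip_prefix("hello") {
+        // `rest` is the text that the parser didn't touch.
+        Some(rest) => Ok((rest, "hello")),
+        None => Err(format!("expected \"hello\" at the start of {input:?}")),
+    }
+}
+```
+
+With this sketch, `parse_hello("hello world")` gives `Ok((" world", "hello"))`,
+while `parse_hello("say hello")` is an error, because the match is not at the start.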
+
+The simplest parser we can write is one which successfully does nothing.
+In other words, the regex `/^/`.
+
+This parser should take in an `&str`:
+
+- Since it is supposed to succeed, we know it will return the `Ok` variant.
+- Since it does nothing to our input, the remaining input is the same as the input.
+- Since it doesn't produce anything, its output should just be the unit type, `()`.
+
+
+In other words, this code should be equivalent to the regex `/^/`.
+
+```rust
+# extern crate nom;
+# use nom::IResult;
+
+pub fn do_nothing_parser(input: &str) -> IResult<&str, ()> {
+    Ok((input, ()))
+}
+
+match do_nothing_parser("my_input") {
+    Ok((remaining_input, output)) => {
+        assert_eq!(remaining_input, "my_input");
+        assert_eq!(output, ());
+    },
+    Err(_) => unreachable!()
+}
+```
@@ -0,0 +1,105 @@
+# Chapter 2: Tags and Character Classes
+
+The simplest _useful_ regex you can write is one which
+has no special characters; it just matches a literal string.
+
+Imagine, for example, the regex `/abc/`. It simply matches when the string
+`"abc"` occurs.
+
+In `nom`, we call a simple collection of bytes a tag. Because
+these are so common, there already exists a function called `tag()`.
+This function returns a parser for a given string.
+
+<div class="example-wrap" style="display:inline-block"><pre class="compile_fail" style="white-space:normal;font:inherit;">
+ **Warning**: `nom` has multiple different definitions of `tag`. Make sure you use this one for the
+ moment!
+</pre></div>
+
+```rust
+# extern crate nom;
+pub use nom::bytes::complete::tag;
+```
+
+For example, the regex `/abc/` (really, the regex `/^abc/`)
+could be represented as `tag("abc")`.
+
+Note that `tag` returns another function: a parser for the tag you
+requested. That parser is what you then call with your input.
+
+Below, we see a function using this:
+
+```rust
+# extern crate nom;
+# pub use nom::bytes::complete::tag;
+# pub use nom::IResult;
+
+fn parse_input(input: &str) -> IResult<&str, &str> {
+    //  note that this is really creating a function, the parser for abc
+    //  vvvvv
+    //         which is then called here, returning an IResult<&str, &str>
+    //         vvvvv
+    tag("abc")(input)
+}
+
+    let ok_input = "abcWorld";
+
+    match parse_input(ok_input) {
+        Ok((leftover_input, output)) => {
+            assert_eq!(leftover_input, "World");
+            assert_eq!(output, "abc");
+        },
+        Err(_) => unreachable!()
+    }
+
+    let err_input = "defWorld";
+    match parse_input(err_input) {
+        Ok(_) => unreachable!(), // "defWorld" does not start with "abc"
+        Err(_) => assert!(true),
+    }
+```
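+
+To make the "function that returns a function" idea more concrete, here is a
+simplified, std-only imitation. `sketch_tag` is a hypothetical name, and this is
+not how nom implements `tag` (the real one is generic over input and error
+types); it only shows the two-step call.
+
+```rust
+// Hypothetical, simplified imitation of `tag`: calling it *returns* a parser.
+fn sketch_tag<'a>(
+    expected: &'static str,
+) -> impl Fn(&'a str) -> Result<(&'a str, &'a str), String> {
+    // The returned closure is the parser; it remembers `expected`.
+    move |input: &'a str| match input.strip_prefix(expected) {
+        // On success: (what is left, what was matched).
+        Some(rest) => Ok((rest, &input[..expected.len()])),
+        None => Err(format!("expected {expected:?}")),
+    }
+}
+```
+
+Just like `tag("abc")(input)`, the sketch is used in two steps:
+`sketch_tag("abc")` builds the parser, and `sketch_tag("abc")("abcWorld")`
+runs it, giving `Ok(("World", "abc"))`.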
+
+If you'd like to, you can also match a tag case-insensitively (like `/tag/i`)
+with `tag_no_case`.
+
+## Character Classes
+
+Tags are incredibly useful, but they are also incredibly restrictive.
+The other end of Nom's functionality is pre-written parsers that allow us to accept any of a group of characters,
+rather than just accepting characters in a defined sequence.
+
+Here is a selection of them:
+
+- [`alpha0`](https://docs.rs/nom/latest/nom/character/complete/fn.alpha0.html): Recognizes zero or more lowercase and uppercase alphabetic characters: `/[a-zA-Z]*/`. [`alpha1`](https://docs.rs/nom/latest/nom/character/complete/fn.alpha1.html) does the same but requires at least one character: `/[a-zA-Z]+/`.
+- [`alphanumeric0`](https://docs.rs/nom/latest/nom/character/complete/fn.alphanumeric0.html): Recognizes zero or more numerical and alphabetic characters: `/[0-9a-zA-Z]*/`. [`alphanumeric1`](https://docs.rs/nom/latest/nom/character/complete/fn.alphanumeric1.html) does the same but requires at least one character: `/[0-9a-zA-Z]+/`.
+- [`digit0`](https://docs.rs/nom/latest/nom/character/complete/fn.digit0.html): Recognizes zero or more numerical characters: `/[0-9]*/`. [`digit1`](https://docs.rs/nom/latest/nom/character/complete/fn.digit1.html) does the same but requires at least one character: `/[0-9]+/`.
+- [`multispace0`](https://docs.rs/nom/latest/nom/character/complete/fn.multispace0.html): Recognizes zero or more spaces, tabs, carriage returns and line feeds. [`multispace1`](https://docs.rs/nom/latest/nom/character/complete/fn.multispace1.html) does the same but requires at least one character.
+- [`space0`](https://docs.rs/nom/latest/nom/character/complete/fn.space0.html): Recognizes zero or more spaces and tabs. [`space1`](https://docs.rs/nom/latest/nom/character/complete/fn.space1.html) does the same but requires at least one character.
+- [`line_ending`](https://docs.rs/nom/latest/nom/character/complete/fn.line_ending.html): Recognizes an end of line (either `\n` or `\r\n`)
+- [`newline`](https://docs.rs/nom/latest/nom/character/complete/fn.newline.html): Matches a newline character `\n`
+- [`tab`](https://docs.rs/nom/latest/nom/character/complete/fn.tab.html): Matches a tab character `\t`
+
+
+We can use these inside a parser function of our own, for example:
+
+```rust
+# extern crate nom;
+# pub use nom::IResult;
+pub use nom::character::complete::alpha0;
+fn parser(input: &str) -> IResult<&str, &str> {
+    alpha0(input)
+}
+
+    let ok_input = "abc123";
+    match parser(ok_input) {
+        Ok((remaining, letters)) => {
+            assert_eq!(remaining, "123");
+            assert_eq!(letters, "abc");
+        },
+        Err(_) => unreachable!()
+    }
+
+```
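+
+The difference between the `0` and `1` variants is easy to miss, so here is a
+std-only sketch of it. `sketch_alpha0` and `sketch_alpha1` are hypothetical
+imitations (ASCII letters only, with a `String` error); they are not nom's
+implementation.
+
+```rust
+// Zero or more letters: this can never fail, it may just match nothing.
+fn sketch_alpha0(input: &str) -> Result<(&str, &str), String> {
+    // Index of the first non-letter, or the whole input if there is none.
+    let end = input
+        .find(|c: char| !c.is_ascii_alphabetic())
+        .unwrap_or(input.len());
+    Ok((&input[end..], &input[..end]))
+}
+
+// One or more letters: an empty match is turned into an error.
+fn sketch_alpha1(input: &str) -> Result<(&str, &str), String> {
+    match sketch_alpha0(input)? {
+        (_, "") => Err(String::from("expected at least one letter")),
+        matched => Ok(matched),
+    }
+}
+```
+
+On the input `"123"`, the `0` sketch succeeds with an empty match,
+`Ok(("123", ""))`, while the `1` sketch returns an error.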
+
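+One important note is that these functions are generic over their input and
+error types, so the compiler often cannot infer those types on its own.
+It is therefore generally best to use them within a function that returns an
+`IResult`, which pins the types down.
+
+*TODO*: Better explanation of why.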
@@ -0,0 +1,124 @@
+# Chapter 3: Alternatives and Composition
+
+In the last chapter, we saw how to convert a simple regex into a nom parser.
+In this chapter, we explore two other very important features of Nom:
+alternatives and composition.
+
+## Alternatives
+
+In regex, we can write `/(^abc|^def)/`, which means "match either `/^abc/` or `/^def/`".
+Nom gives us a similar ability through the `alt()` combinator.
+
+```rust
+# extern crate nom;
+use nom::branch::alt;
+```
+
+The `alt()` combinator will execute each parser in a tuple until it finds one
+that does not error. If all error, then by default you are given the error from
+the last parser.
+We can see a basic example of `alt()` below.
+
+```rust
+# extern crate nom;
+use nom::branch::alt;
+use nom::bytes::complete::tag;
+use nom::IResult;
+
+fn parse_abc_or_def(input: &str) -> IResult<&str, &str> {
+    alt((
+        tag("abc"),
+        tag("def")
+    ))(input)
+}
+
+    match parse_abc_or_def("abcWorld") {
+        Ok((leftover_input, output)) => {
+            assert_eq!(leftover_input, "World");
+            assert_eq!(output, "abc");
+        },
+        Err(_) => unreachable!()
+    }
+
+    match parse_abc_or_def("ghiWorld") {
+        Ok(_) => unreachable!(), // neither "abc" nor "def" matches "ghiWorld"
+        Err(_) => assert!(true),
+    }
+```
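+
+The "try each parser until one succeeds" behaviour can be sketched with the
+standard library alone. Everything below (`Parser`, `literal`, `parse_abc`,
+`parse_def`, `sketch_alt`) is a hypothetical name for illustration; nom's real
+`alt()` works on tuples of parsers and uses its own error types.
+
+```rust
+// A parser here is just a function pointer with a simplified result type.
+type Parser = fn(&str) -> Result<(&str, &str), String>;
+
+// Helper: match `expected` at the start of `input`.
+fn literal<'a>(expected: &str, input: &'a str) -> Result<(&'a str, &'a str), String> {
+    match input.strip_prefix(expected) {
+        Some(rest) => Ok((rest, &input[..expected.len()])),
+        None => Err(format!("expected {expected:?}")),
+    }
+}
+
+fn parse_abc(input: &str) -> Result<(&str, &str), String> {
+    literal("abc", input)
+}
+
+fn parse_def(input: &str) -> Result<(&str, &str), String> {
+    literal("def", input)
+}
+
+fn sketch_alt<'a>(parsers: &[Parser], input: &'a str) -> Result<(&'a str, &'a str), String> {
+    // Remember the most recent failure, to report it if every parser fails.
+    let mut last_error = String::from("no parsers were given");
+    for parser in parsers {
+        match parser(input) {
+            // Stop at the first parser that succeeds.
+            Ok(parsed) => return Ok(parsed),
+            // Each attempt starts again from the same, untouched input.
+            Err(error) => last_error = error,
+        }
+    }
+    Err(last_error)
+}
+```
+
+Given the slice `[parse_abc, parse_def]`, the input `"defWorld"` fails the
+first parser, succeeds on the second, and yields `Ok(("World", "def"))`.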
+
+## Composition
+
+Now that we can create more interesting parsers, we can compose them together.
+The simplest way to do this is just to evaluate them in sequence:
+
+```rust
+# extern crate nom;
+use nom::branch::alt;
+use nom::bytes::complete::tag;
+use nom::IResult;
+
+fn parse_abc(input: &str) -> IResult<&str, &str> {
+    tag("abc")(input)
+}
+fn parse_def_or_ghi(input: &str) -> IResult<&str, &str> {
+    alt((
+        tag("def"),
+        tag("ghi")
+    ))(input)
+}
+
+    let input = "abcghi";
+    if let Ok((remainder, abc)) = parse_abc(input) {
+        if let Ok((remainder, def_or_ghi)) = parse_def_or_ghi(remainder) {
+            println!("first parsed: {abc}; then parsed: {def_or_ghi};");
+        }
+    }
+
+```
+
+Composing parsers is such a common requirement that, in fact, Nom has a few built-in
+combinators to do it. The simplest of these is `tuple()`, found in `nom::sequence`.
+The `tuple()` combinator takes a tuple of parsers, and either returns `Ok` with a
+tuple of all of their successful parses, or it returns the `Err` of the first failed parser.
+
+```rust
+# extern crate nom;
+use nom::branch::alt;
+use nom::bytes::complete::tag;
+use nom::character::complete::digit1;
+use nom::IResult;
+
+fn parse_numbers_or_abc(input: &str) -> IResult<&str, &str> {
+    alt((
+        tag("abc"),
+        digit1
+    ))(input)
+}
+
+
+    let input = "abc";
+    let parsed_input = parse_numbers_or_abc(input);
+    match parsed_input {
+        Ok((_, matched_str)) => assert_eq!(matched_str, "abc"),
+        Err(_) => unreachable!()
+    }
+
+
+    let input = "def";
+    let parsed_input = parse_numbers_or_abc(input);
+    match parsed_input {
+        Ok(_) => unreachable!(),
+        Err(_) => assert!(true)
+    }
+```
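+
+The block above exercises `alt()` together with `digit1`. To illustrate the
+sequencing that `tuple()` is described as doing, here is a std-only sketch for
+exactly two parsers. `sketch_pair`, `parse_abc` and `parse_digits` are
+hypothetical names, and the `String` error is a simplification; this is not
+nom's implementation.
+
+```rust
+fn parse_abc(input: &str) -> Result<(&str, &str), String> {
+    match input.strip_prefix("abc") {
+        Some(rest) => Ok((rest, "abc")),
+        None => Err(String::from("expected \"abc\"")),
+    }
+}
+
+fn parse_digits(input: &str) -> Result<(&str, &str), String> {
+    // Index of the first non-digit, or the whole input if there is none.
+    let end = input
+        .find(|c: char| !c.is_ascii_digit())
+        .unwrap_or(input.len());
+    if end == 0 {
+        return Err(String::from("expected at least one digit"));
+    }
+    Ok((&input[end..], &input[..end]))
+}
+
+// Run `first`, then run `second` on whatever `first` left behind.
+fn sketch_pair<'a, A, B>(
+    first: impl Fn(&'a str) -> Result<(&'a str, A), String>,
+    second: impl Fn(&'a str) -> Result<(&'a str, B), String>,
+    input: &'a str,
+) -> Result<(&'a str, (A, B)), String> {
+    // `?` hands back the error of the first parser that fails.
+    let (rest, a) = first(input)?;
+    let (rest, b) = second(rest)?;
+    Ok((rest, (a, b)))
+}
+```
+
+Calling `sketch_pair(parse_abc, parse_digits, "abc123def")` gives
+`Ok(("def", ("abc", "123")))`: both outputs are collected into a tuple, and the
+remaining input is whatever the last parser left.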
+
+
+## Extra Nom Tools
+
+After using `alt()` and `tuple()`, you might also be interested in the `permutation()` parser, which
+requires all of the parsers it contains to succeed, but in any order.
+
+```rust
+# extern crate nom;
+use nom::branch::permutation;
+```
@@ -0,0 +1 @@
+# Chapter 4: Custom Outputs from Functions
@@ -0,0 +1,31 @@
+# The Nom Guide
+
+Welcome to The Nom Guide (or, the Nominomicon), a guide to using the Nom parser for great good.
+This guide is written to take you from an understanding of regular expressions to an understanding
+of Nom.
+
+This guide assumes that you are:
+ - Wanting to learn Nom,
+ - Already familiar with regular expressions (at least, somewhat), and
+ - Already familiar with Rust.
+
+Nom is a parser-combinator library. In other words, it gives you tools to define:
+ - "parsers" (a function that takes an input, and gives back an output), and
+ - "combinators" (functions that take parsers, and _combine_ them together!).
+
+By combining parsers with combinators, you can build complex parsers up from
+simpler ones. These complex parsers are enough to understand HTML, mkv or Python!
+
+Before we set off, it's important to list some caveats:
+ - This guide is for Nom 7. Nom has undergone significant changes, so if
+   you are searching for documentation or StackOverflow answers, you may
+   find older documentation. Some common indicators that it is an old version are:
+    - Documentation older than 21st August, 2021
+    - Use of the `named!` macro
+    - Use of `CompleteStr` or `CompleteByteArray`.
+ - Nom can parse (almost) anything, but this guide will focus entirely on parsing
+   complete `&str` into things.
+
+And finally, some nomenclature:
+ - In this guide, regexes will be denoted inside slashes (for example `/abc/`)
+   to distinguish them from regular strings.
@@ -0,0 +1 @@
+# To Be Completed