A Book for Nom #1525

Open
wants to merge 3 commits into
base: main
1 change: 1 addition & 0 deletions doc/nom-guide/.gitignore
@@ -0,0 +1 @@
book
6 changes: 6 additions & 0 deletions doc/nom-guide/book.toml
@@ -0,0 +1,6 @@
[book]
authors = ["Tom Kunc"]
language = "en"
multilingual = false
src = "src"
title = "The Nom Guide (Nominomicon)"
11 changes: 11 additions & 0 deletions doc/nom-guide/scripts/build.sh
@@ -0,0 +1,11 @@
#!/bin/bash
command="build"

[[ "$1" == "serve" ]] && command="serve"

BOOK_ROOT_PATH="$( cd "$(dirname "$0")" >/dev/null 2>&1 ; pwd -P )/.."
cd "$BOOK_ROOT_PATH" || exit 1

[[ ! -e "$BOOK_ROOT_PATH/../../target" ]] && (cd ../../ && cargo build)
mdbook test -L "$(cd ../../ && pwd)/target/debug/deps/"
mdbook "$command"
15 changes: 15 additions & 0 deletions doc/nom-guide/src/SUMMARY.md
@@ -0,0 +1,15 @@
## Summary

[Introduction](./introduction.md)

- [Chapter 1: The Nom Way](./chapter_1.md)
- [Chapter 2: Tags and Character Classes](./chapter_2.md)
- [Chapter 3: Alternatives and Composition](./chapter_3.md)
- [Chapter 4: Custom Outputs from Functions](./chapter_4.md)
- [Chapter 5: Repeating with Predicates](./chapter_5.md)
- [Chapter 6: Repeating Parsers](./chapter_6.md)
- [Chapter 7: Using Errors from Outside Nom](./chapter_7.md)
- [Chapter 8: Streaming vs. Complete](./todo.md)
- [Chapter 9: Characters vs. Bytes](./todo.md)
- [Chapter 10: Exercises and Further Reading](./todo.md)

75 changes: 75 additions & 0 deletions doc/nom-guide/src/chapter_1.md
@@ -0,0 +1,75 @@
# Chapter 1: The Nom Way

First of all, we need to understand the way that nom thinks about parsing.
As discussed in the introduction, nom lets us build simple parsers, and
then combine them (using "combinators").

Let's discuss what a "parser" actually does. A parser takes an input and returns
a result, where:
- `Ok` indicates the parser successfully found what it was looking for; or
- `Err` indicates the parser could not find what it was looking for.

Parsers do more than just return a binary "success"/"failure" code. If
the parser was successful, then it will return a tuple. The first field of the
tuple will contain everything the parser did not process. The second will contain
everything the parser processed. The idea is that a parser can happily parse the first
*part* of an input, without being able to parse the whole thing.

If the parser failed, then there are multiple errors that could be returned.
For simplicity, however, in the next chapters we will leave these unexplored.

```text
                                   ┌─► Ok(
                                   │      what the parser didn't touch,
                                   │      what matched the parser
                                   │   )
             ┌─────────┐           │
 my input───►│my parser├──►either──┤
             └─────────┘           └─► Err(...)
```


To represent this model of the world, nom uses the `IResult<I, O>` type.
The `Ok` variant has a tuple of `(remaining_input: I, output: O)`;
whereas the `Err` variant stores an error.

You can import that from:

```rust
# extern crate nom;
use nom::IResult;
```

You'll note that `I` and `O` are type parameters. Most of the examples in this book
use `&str` (i.e. parsing a string), but they do not have to be strings, nor do they
have to be the same type. Consider the simple example where `I = &str` and `O = u64`: this
parses a string into an unsigned integer.

Let's write our first parser!
The simplest parser we can write is one which successfully does nothing.

This parser should take in a `&str`:

- Since it is supposed to succeed, we know it will return the `Ok` variant.
- Since it does nothing to our input, the remaining input is the same as the input.
- Since it doesn't parse anything, it also should just return an empty string.


```rust
# extern crate nom;
# use nom::IResult;
# use std::error::Error;

pub fn do_nothing_parser(input: &str) -> IResult<&str, &str> {
    Ok((input, ""))
}

fn main() -> Result<(), Box<dyn Error>> {
    let (remaining_input, output) = do_nothing_parser("my_input")?;
    assert_eq!(remaining_input, "my_input");
    assert_eq!(output, "");
# Ok(())
}
```

It's that easy!
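
The input and output types really can differ. As a rough sketch of the `I = &str`, `O = u64` case mentioned earlier, the following uses only the standard library. `ParseResult` and `leading_u64` are hypothetical names made up for this illustration, and the `String` error is a simplification; this is not how nom itself parses numbers.

```rust
/// Hypothetical stand-in for nom's `IResult`: (remaining input, output) on
/// success, and a plain `String` as a simplified error.
type ParseResult<'a, O> = Result<(&'a str, O), String>;

/// Parses the leading ASCII digits of `input` into a `u64`.
/// Input type is `&str`, output type is `u64`.
fn leading_u64(input: &str) -> ParseResult<'_, u64> {
    // Find where the run of leading digits stops.
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    let (digits, rest) = input.split_at(end);
    // An empty or overflowing digit run is reported as an error.
    let value = digits
        .parse::<u64>()
        .map_err(|e| format!("not a u64: {}", e))?;
    Ok((rest, value))
}
```

Note that the remaining input comes first in the tuple and the parsed value second, mirroring the order described for `IResult`.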
111 changes: 111 additions & 0 deletions doc/nom-guide/src/chapter_2.md
@@ -0,0 +1,111 @@
# Chapter 2: Tags and Character Classes

The simplest _useful_ parser you can write is one which
has no special characters: it just matches a string.

In `nom`, we call a simple collection of bytes a tag. Because
these are so common, there already exists a function called `tag()`.
This function returns a parser for a given string.

**Warning**: `nom` has multiple different definitions of `tag`; make sure you use this one for the
moment!

Review comment (Collaborator): That will need a link to the future chapter about parsers of complete vs. streaming inputs.

Reply (Contributor Author): Agreed; I'll add it in when that chapter is written.

```rust,ignore
# extern crate nom;
pub use nom::bytes::complete::tag;
```

For example, code to parse the string `"abc"` could be represented as `tag("abc")`.

If you have not programmed in a language where functions are values, the type signature of the
`tag` function might be a surprise:

```rust,ignore
pub fn tag<T, Input, Error: ParseError<Input>>(
    tag: T
) -> impl Fn(Input) -> IResult<Input, Input, Error> where
    Input: InputTake + Compare<T>,
    T: InputLength + Clone,
```

Or, for the case where `Input` and `T` are both `&str`, and simplifying slightly:

```rust,ignore
fn tag(tag: &str) -> (impl Fn(&str) -> IResult<&str, &str, Error>)
```

In other words, this function `tag` *returns a function*. The function it returns is a
parser, taking a `&str` and returning an `IResult`. Functions creating parsers and
returning them is a common pattern in Nom, so it is useful to call out.
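
To make the "function that returns a parser" idea concrete, here is a hand-rolled sketch using only the standard library. It is not nom's implementation of `tag`; `ParseResult` and `literal` are hypothetical names introduced for this illustration, and the `String` error is a simplification.

```rust
/// Hypothetical stand-in for nom's `IResult`, with a plain `String` error.
type ParseResult<'a, O> = Result<(&'a str, O), String>;

/// Builds and returns a parser that matches exactly `expected`
/// at the start of its input.
fn literal<'a>(expected: &'static str) -> impl Fn(&'a str) -> ParseResult<'a, &'a str> {
    // The returned closure is the parser; it captures `expected`.
    move |input: &'a str| match input.strip_prefix(expected) {
        // Success: the remaining input first, then the matched prefix.
        Some(rest) => Ok((rest, &input[..expected.len()])),
        None => Err(format!("expected {:?}", expected)),
    }
}
```

Calling `literal("abc")` does no parsing by itself; it only produces a closure, which can then be called on an input such as `"abcWorld"`.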

Below, we have implemented a function that uses `tag`.

```rust
# extern crate nom;
# pub use nom::bytes::complete::tag;
# pub use nom::IResult;
# use std::error::Error;

fn parse_input(input: &str) -> IResult<&str, &str> {
    //  note that this is really creating a function, the parser for abc
    //  vvvvv
    //        which is then called here, returning an IResult<&str, &str>
    //        vvvvvvv
    tag("abc")(input)
}

fn main() -> Result<(), Box<dyn Error>> {
    let (leftover_input, output) = parse_input("abcWorld")?;
    assert_eq!(leftover_input, "World");
    assert_eq!(output, "abc");

    assert!(parse_input("defWorld").is_err());
# Ok(())
}
```

If you'd like to, you can also check tags without case-sensitivity
with the [`tag_no_case`](https://docs.rs/nom/latest/nom/bytes/complete/fn.tag_no_case.html) function.

## Character Classes

Tags are incredibly useful, but they are also incredibly restrictive.
The other end of Nom's functionality is pre-written parsers that allow us to accept any of a group of characters,
rather than just accepting characters in a defined sequence.

Here is a selection of them:

- [`alpha0`](https://docs.rs/nom/latest/nom/character/complete/fn.alpha0.html): Recognizes zero or more lowercase and uppercase alphabetic characters: `/[a-zA-Z]/`. [`alpha1`](https://docs.rs/nom/latest/nom/character/complete/fn.alpha1.html) does the same but returns at least one character
- [`alphanumeric0`](https://docs.rs/nom/latest/nom/character/complete/fn.alphanumeric0.html): Recognizes zero or more numerical and alphabetic characters: `/[0-9a-zA-Z]/`. [`alphanumeric1`](https://docs.rs/nom/latest/nom/character/complete/fn.alphanumeric1.html) does the same but returns at least one character
- [`digit0`](https://docs.rs/nom/latest/nom/character/complete/fn.digit0.html): Recognizes zero or more numerical characters: `/[0-9]/`. [`digit1`](https://docs.rs/nom/latest/nom/character/complete/fn.digit1.html) does the same but returns at least one character
- [`multispace0`](https://docs.rs/nom/latest/nom/character/complete/fn.multispace0.html): Recognizes zero or more spaces, tabs, carriage returns and line feeds. [`multispace1`](https://docs.rs/nom/latest/nom/character/complete/fn.multispace1.html) does the same but returns at least one character
- [`space0`](https://docs.rs/nom/latest/nom/character/complete/fn.space0.html): Recognizes zero or more spaces and tabs. [`space1`](https://docs.rs/nom/latest/nom/character/complete/fn.space1.html) does the same but returns at least one character
- [`line_ending`](https://docs.rs/nom/latest/nom/character/complete/fn.line_ending.html): Recognizes an end of line (both `\n` and `\r\n`)
- [`newline`](https://docs.rs/nom/latest/nom/character/complete/fn.newline.html): Matches a newline character `\n`
- [`tab`](https://docs.rs/nom/latest/nom/character/complete/fn.tab.html): Matches a tab character `\t`


We can use these in our own parsers:
```rust
# extern crate nom;
# pub use nom::IResult;
# use std::error::Error;
pub use nom::character::complete::alpha0;
fn parser(input: &str) -> IResult<&str, &str> {
    alpha0(input)
}

fn main() -> Result<(), Box<dyn Error>> {
    let (remaining, letters) = parser("abc123")?;
    assert_eq!(remaining, "123");
    assert_eq!(letters, "abc");

# Ok(())
}
```
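
The difference between the `0` and `1` variants in the list above (zero or more versus at least one) can be sketched by hand with the standard library. This is only an illustration, not nom's code: `ParseResult`, `letters0` and `letters1` are hypothetical names, and the `String` error is a simplification.

```rust
/// Hypothetical stand-in for nom's `IResult`, with a plain `String` error.
type ParseResult<'a, O> = Result<(&'a str, O), String>;

/// Zero or more ASCII letters: always succeeds, possibly with an empty match.
fn letters0(input: &str) -> ParseResult<'_, &str> {
    let end = input
        .find(|c: char| !c.is_ascii_alphabetic())
        .unwrap_or(input.len());
    Ok((&input[end..], &input[..end]))
}

/// One or more ASCII letters: an empty match is turned into an error.
fn letters1(input: &str) -> ParseResult<'_, &str> {
    let (rest, matched) = letters0(input)?;
    if matched.is_empty() {
        Err(format!("expected at least one letter at {:?}", input))
    } else {
        Ok((rest, matched))
    }
}
```

On `"123"`, the zero-or-more version succeeds with an empty output and leaves the input untouched, while the one-or-more version returns an error.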

One important note is that, due to the type signature of these functions,
it is generally best to use them within a function that returns an `IResult`.

If you don't, some of the information around the type of the `tag` function must be
manually specified, which can lead to verbose code or confusing errors.
142 changes: 142 additions & 0 deletions doc/nom-guide/src/chapter_3.md
@@ -0,0 +1,142 @@
# Chapter 3: Alternatives and Composition

In the last chapter, we saw how to create simple parsers using the `tag` function
and some of Nom's prebuilt parsers.

In this chapter, we explore two other widely used features of Nom:
alternatives and composition.

## Alternatives

Sometimes, we might want to choose between two parsers, and we're happy with
either being used.

Nom gives us this ability through the `alt()` combinator.

```rust
# extern crate nom;
use nom::branch::alt;
```

The `alt()` combinator will execute each parser in a tuple until it finds one
that does not error. If all of them error, then by default you are given the error from
the last parser.

We can see a basic example of `alt()` below.

```rust
# extern crate nom;
use nom::branch::alt;
use nom::bytes::complete::tag;
use nom::IResult;
# use std::error::Error;

fn parse_abc_or_def(input: &str) -> IResult<&str, &str> {
    alt((
        tag("abc"),
        tag("def")
    ))(input)
}

fn main() -> Result<(), Box<dyn Error>> {
    let (leftover_input, output) = parse_abc_or_def("abcWorld")?;
    assert_eq!(leftover_input, "World");
    assert_eq!(output, "abc");

    assert!(parse_abc_or_def("ghiWorld").is_err());
# Ok(())
}
```
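
The "try each parser until one does not error" rule can be sketched by hand for the two-parser case. The code below uses only the standard library and is not how nom implements `alt()`; `ParseResult`, `literal` and `either` are hypothetical names for this illustration, and the `String` error is a simplification.

```rust
/// Hypothetical stand-in for nom's `IResult`, with a plain `String` error.
type ParseResult<'a, O> = Result<(&'a str, O), String>;

/// Hand-rolled parser builder that matches a fixed prefix.
fn literal<'a>(expected: &'static str) -> impl Fn(&'a str) -> ParseResult<'a, &'a str> {
    move |input: &'a str| match input.strip_prefix(expected) {
        Some(rest) => Ok((rest, &input[..expected.len()])),
        None => Err(format!("expected {:?}", expected)),
    }
}

/// Tries `first`; if it errors, tries `second` on the *same* input.
/// If both fail, the error from the last parser is returned.
fn either<'a, O>(
    first: impl Fn(&'a str) -> ParseResult<'a, O>,
    second: impl Fn(&'a str) -> ParseResult<'a, O>,
) -> impl Fn(&'a str) -> ParseResult<'a, O> {
    move |input: &'a str| first(input).or_else(|_| second(input))
}
```

A failed first attempt consumes nothing here: the second parser starts again from the original input.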

## Composition

Now that we can create more interesting parsers, we can compose them together.
The simplest way to do this is just to evaluate them in sequence:

```rust
# extern crate nom;
use nom::branch::alt;
use nom::bytes::complete::tag;
use nom::IResult;
# use std::error::Error;

fn parse_abc(input: &str) -> IResult<&str, &str> {
    tag("abc")(input)
}
fn parse_def_or_ghi(input: &str) -> IResult<&str, &str> {
    alt((
        tag("def"),
        tag("ghi")
    ))(input)
}

fn main() -> Result<(), Box<dyn Error>> {
    let input = "abcghi";
    let (remainder, abc) = parse_abc(input)?;
    let (remainder, def_or_ghi) = parse_def_or_ghi(remainder)?;
    println!("first parsed: {abc}; then parsed: {def_or_ghi};");

# Ok(())
}
```

Composing parsers is such a common requirement that Nom has a few built-in
combinators to do it. The simplest of these is `tuple()`. The `tuple()` combinator takes a tuple of parsers,
and either returns `Ok` with a tuple of all of their successful parses, or it
returns the `Err` of the first failed parser.

```rust
# extern crate nom;
use nom::sequence::tuple;
```


```rust
# extern crate nom;
use nom::branch::alt;
use nom::sequence::tuple;
use nom::bytes::complete::tag_no_case;
use nom::IResult;
# use std::error::Error;

fn parse_base(input: &str) -> IResult<&str, &str> {
    alt((
        tag_no_case("a"),
        tag_no_case("t"),
        tag_no_case("c"),
        tag_no_case("g")
    ))(input)
}

fn parse_pair(input: &str) -> IResult<&str, (&str, &str)> {
    // the many_m_n combinator might also be appropriate here.
    tuple((
        parse_base,
        parse_base,
    ))(input)
}

fn main() -> Result<(), Box<dyn Error>> {
    let (remaining, parsed) = parse_pair("aTcG")?;
    assert_eq!(parsed, ("a", "T"));
    assert_eq!(remaining, "cG");

    assert!(parse_pair("Dct").is_err());

# Ok(())
}
```
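
What a sequencing combinator does under the hood is the same remainder-threading shown in the manual example earlier in this chapter. As a rough standard-library sketch for two parsers (not nom's `tuple()` implementation; `ParseResult`, `literal` and `both` are hypothetical names, and the `String` error is a simplification):

```rust
/// Hypothetical stand-in for nom's `IResult`, with a plain `String` error.
type ParseResult<'a, O> = Result<(&'a str, O), String>;

/// Hand-rolled parser builder that matches a fixed prefix.
fn literal<'a>(expected: &'static str) -> impl Fn(&'a str) -> ParseResult<'a, &'a str> {
    move |input: &'a str| match input.strip_prefix(expected) {
        Some(rest) => Ok((rest, &input[..expected.len()])),
        None => Err(format!("expected {:?}", expected)),
    }
}

/// Runs `first`, then runs `second` on whatever `first` left over.
/// Returns both outputs, or the error of the first parser that fails.
fn both<'a, A, B>(
    first: impl Fn(&'a str) -> ParseResult<'a, A>,
    second: impl Fn(&'a str) -> ParseResult<'a, B>,
) -> impl Fn(&'a str) -> ParseResult<'a, (A, B)> {
    move |input: &'a str| {
        let (rest, a) = first(input)?;
        let (rest, b) = second(rest)?;
        Ok((rest, (a, b)))
    }
}
```

The `?` operator is what makes the first failure win: as soon as one step errors, the later parsers are never run.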


## Extra Nom Tools

After using `alt()` and `tuple()`, you might also be interested in a few other combinators that do similar things:

| combinator | usage | input | output | comment |
|---|---|---|---|---|
| [delimited](https://docs.rs/nom/latest/nom/sequence/fn.delimited.html) | `delimited(char('('), take(2), char(')'))` | `"(ab)cd"` | `Ok(("cd", "ab"))` ||
| [preceded](https://docs.rs/nom/latest/nom/sequence/fn.preceded.html) | `preceded(tag("ab"), tag("XY"))` | `"abXYZ"` | `Ok(("Z", "XY"))` ||
| [terminated](https://docs.rs/nom/latest/nom/sequence/fn.terminated.html) | `terminated(tag("ab"), tag("XY"))` | `"abXYZ"` | `Ok(("Z", "ab"))` ||
| [pair](https://docs.rs/nom/latest/nom/sequence/fn.pair.html) | `pair(tag("ab"), tag("XY"))` | `"abXYZ"` | `Ok(("Z", ("ab", "XY")))` ||
| [separated_pair](https://docs.rs/nom/latest/nom/sequence/fn.separated_pair.html) | `separated_pair(tag("hello"), char(','), tag("world"))` | `"hello,world!"` | `Ok(("!", ("hello", "world")))` ||
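
The combinators in this table differ mainly in which outputs they keep. As a hedged illustration of the `delimited` row, here is a standard-library sketch that runs three parsers in order and keeps only the middle output. It is not nom's implementation; `ParseResult`, `literal` and `between` are hypothetical names, and the `String` error is a simplification.

```rust
/// Hypothetical stand-in for nom's `IResult`, with a plain `String` error.
type ParseResult<'a, O> = Result<(&'a str, O), String>;

/// Hand-rolled parser builder that matches a fixed prefix.
fn literal<'a>(expected: &'static str) -> impl Fn(&'a str) -> ParseResult<'a, &'a str> {
    move |input: &'a str| match input.strip_prefix(expected) {
        Some(rest) => Ok((rest, &input[..expected.len()])),
        None => Err(format!("expected {:?}", expected)),
    }
}

/// Runs `open`, `inner` and `close` in sequence, discarding the outputs
/// of `open` and `close` and returning only what `inner` produced.
fn between<'a, A, B, C>(
    open: impl Fn(&'a str) -> ParseResult<'a, A>,
    inner: impl Fn(&'a str) -> ParseResult<'a, B>,
    close: impl Fn(&'a str) -> ParseResult<'a, C>,
) -> impl Fn(&'a str) -> ParseResult<'a, B> {
    move |input: &'a str| {
        let (rest, _) = open(input)?;
        let (rest, kept) = inner(rest)?;
        let (rest, _) = close(rest)?;
        Ok((rest, kept))
    }
}
```

Keeping only the first or only the second output of a two-parser sequence gives the same shape as the `terminated` and `preceded` rows respectively.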