rnix-parser-tester

A tool to verify the correctness of rnix-parser.

Goals

This project aims to...

  • Verify that rnix-parser parses grammatically-valid Nix expressions the same as the reference implementation of Nix.
  • Provide more confidence that changes to rnix-parser are correct.
  • Find edge cases that the parser must handle.

Non-goals

It is not a goal of this project to...

  • Verify that rnix-parser correctly rejects grammatically-invalid Nix expressions.

Usage and Example

For now, the workflow requires first cloning this repository. The main way to use the tool is through the CLI.

$ cargo run
cli
Utility program to test/use various aspects of rnix-parser-tester

USAGE:
    cli <SUBCOMMAND>

OPTIONS:
    -h, --help    Print help information

SUBCOMMANDS:
    compare    Report differences in serialization between the reference Nix parser and
                   rnix-parser
    dump       Dump information about the given Nix expression
    help       Print this message or the help of the given subcommand(s)
    summary    Perform analysis of summaries generated by the compare subcommand

(See the cli crate readme for full documentation)

For example, consider the following Nix expression in a file test.nix:

let x = 3; in x + 2

compare

We can use the compare subcommand to check that rnix-parser and the reference implementation parse this expression to the same AST:

$ cargo run -- compare test.nix
test.nix ... equal

In this case, the parses were equal.

dump

We can use the dump subcommand to show the ASTs which are being compared as JSON:

$ cargo run -- dump test.nix -p rnix -p reference
test.nix ...
==== Reference impl json ====
{"Let":{"attrs":{"Attrs":{"attrs":[{"expr":{"Int":3},"inherited":false,"name":"x"}],"dynamic_attrs":[],"rec":false}},"body":{"OpConcatStrings":{"es":[{"Var":"x"},{"Int":2}],"force_string":false}}}}

==== rnix-parser json ====
{"Let":{"attrs":{"Attrs":{"rec":false,"attrs":[{"name":"x","inherited":false,"expr":{"Int":3}}],"dynamic_attrs":[]}},"body":{"OpConcatStrings":{"force_string":false,"es":[{"Var":"x"},{"Int":2}]}}}}

This is especially useful when compare reports that some expression isn't parsed the same by both parsers. In that case, we can use dump together with our favorite JSON differ (e.g., http://www.jsondiff.com/) to debug further.

Comparing all of nixpkgs

compare also...

  • Accepts directories
  • Can recurse into those directories
  • Accepts NIX_PATH paths between angle brackets
  • Can save a summary of the comparisons

Altogether, this gives us the following command, which will check that rnix-parser parses every Nix file in nixpkgs the same as the reference parser, saving a summary for later use:

$ cargo run -- compare '<nixpkgs>' -r --save-summary summary/done.json
/nix/var/nix/profiles/per-user/root/channels/nixos/nixpkgs/flake.nix ... equal
/nix/var/nix/profiles/per-user/root/channels/nixos/nixpkgs/nixos/release-small.nix ... equal
... and so on ...

Comparing across changes with summary

The compare and summary subcommands can be used to determine how a change to rnix-parser affects its ability to parse.

  1. cargo run -- compare '<nixpkgs>' -r --save-summary summary/before.json
  2. Make some change to rnix-parser. For local development, this probably also involves changing the rnix dependency in Cargo.toml. For example, we could change it to rnix = { path = "../rnix-parser" } to use a local version.
  3. cargo run -- compare '<nixpkgs>' -r --save-summary summary/after.json

We can then compare the summaries:

$ cargo run -- summary summary/before.json summary/after.json
== Progressions (not equal before, equal after) ==
/nix/var/nix/profiles/per-user/root/channels/nixos/nixpkgs/pkgs/tools/audio/botamusique/node-packages.nix
/nix/var/nix/profiles/per-user/root/channels/nixos/nixpkgs/pkgs/servers/mjolnir/node-deps.nix
... and so on ...

== Regressions (equal before, not equal after) ==

== Summary ==
# equal before: 24050
# not equal before: 11
# reference impl errors before: 1
# rnix-parser errors before: 0

# equal after: 24059
# not equal after: 2
# reference impl errors after: 1
# rnix-parser errors after: 0

# progressions: 9
# regressions: 0

The stats at the bottom tell us that the change resulted in 9 more files being parsed correctly and, importantly, that no files which were parsed correctly before the change are parsed incorrectly after it.

How it works

Under the hood, we are "just" parsing the same expression using both rnix-parser and the reference parser, serializing the resulting ASTs to JSON, and then comparing those JSON objects.
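
As a rough sketch of that pipeline (the function names below are hypothetical, not the tool's real API, and the real comparison goes through a dedicated JSON comparison library rather than plain equality):

use serde_json::Value;

// Hypothetical helpers for illustration; in the real tool this work is split
// between the FFI wrapper around the reference parser and the rnix-based side.
fn parse_with_reference(src: &str) -> Value {
    unimplemented!("call the (patched) reference parser over FFI, serialize to JSON")
}

fn parse_with_rnix(src: &str) -> Value {
    unimplemented!("parse with rnix-parser, normalize, serialize to JSON")
}

fn asts_equal(src: &str) -> bool {
    // Plain Value equality is enough for a sketch: serde_json compares JSON
    // objects by key, so the differing field order visible in the dump output
    // above does not matter here.
    parse_with_reference(src) == parse_with_rnix(src)
}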

Challenges

There are several challenges to overcome in doing this:

  • The reference implementation is written in C++. Rust has limited interop with C++, so a basic C-compatible wrapper was written to do the bulk of the reference impl-related work on the C++ side.
  • The reference implementation closely couples parsing and evaluation. For example, the reference implementation builds a symbol table and checks for undefined variables during parsing. This limits the kinds of Nix expressions we can use and complicates the reference impl-related work.
  • rnix-parser's typed AST leaves a lot to be desired from a usability standpoint. We created our own, which is easier to work with for our use case (a partial sketch of its serialized shape follows this list).
  • The reference implementation produces an AST that is sometimes structurally very different from rnix-parser's. This necessitates some conversion between them, which we call "normalization".
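
To give a sense of the serialized shape being compared, here is a rough, partial sketch of an AST that would produce JSON like the dump output above (the variant and field names mirror that output; the real definitions in this repository are more complete and may differ):

use serde::Serialize;

// Partial sketch only: just enough variants to cover `let x = 3; in x + 2`.
#[derive(Serialize, Debug, PartialEq)]
enum Expr {
    Int(i64),
    Var(String),
    Let { attrs: Box<Expr>, body: Box<Expr> },
    Attrs { rec: bool, attrs: Vec<AttrDef>, dynamic_attrs: Vec<AttrDef> }, // dynamic_attrs simplified here
    OpConcatStrings { force_string: bool, es: Vec<Expr> },
}

#[derive(Serialize, Debug, PartialEq)]
struct AttrDef {
    name: String,
    inherited: bool,
    expr: Expr,
}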

Normalization

Normalization is the process of taking the AST produced by rnix-parser and converting it to the structure used by the reference impl. For example, consider the following expression:

f 0 1 2

rnix-parser parses this expression as (((f 0) 1) 2), i.e., three nested Apply nodes, each with one argument. However, the reference impl collects all of the function arguments into a single Apply node with three arguments. Both parses are "correct", but we need to transform the former into the latter if we want to directly compare the trees.
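
As an illustration of this particular rule only (the real rules and AST types live in the rnix-normalize crate; the Raw and Norm types below are made up for the example):

// Raw stands in for the single-argument shape rnix-parser gives us,
// Norm for the reference impl's multi-argument shape.
enum Raw {
    Apply(Box<Raw>, Box<Raw>), // (function, single argument)
    Var(String),
    Int(i64),
}

enum Norm {
    Apply { fun: Box<Norm>, args: Vec<Norm> },
    Var(String),
    Int(i64),
}

fn normalize(expr: Raw) -> Norm {
    match expr {
        Raw::Apply(fun, arg) => {
            let arg = normalize(*arg);
            match normalize(*fun) {
                // The function is itself an application: append this argument
                // to its list, so (((f 0) 1) 2) flattens to f applied to [0, 1, 2].
                Norm::Apply { fun, mut args } => {
                    args.push(arg);
                    Norm::Apply { fun, args }
                }
                other => Norm::Apply { fun: Box::new(other), args: vec![arg] },
            }
        }
        Raw::Var(v) => Norm::Var(v),
        Raw::Int(i) => Norm::Int(i),
    }
}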

The rnix-normalize crate explains in detail all of the normalization rules.

Note that normalization does not go the other way (i.e., reference impl AST -> rnix-parser AST), even though that would be easier in some cases, because...

  • Working with the reference impl is generally harder, both because of C++/FFI concerns, and because our use case is not really supported
  • We generally try to keep the reference impl AST 100% untouched, because we define "correct" to mean "whatever the reference impl says".

Caveats

Normalization

The main challenge with this project is also its biggest caveat: the normalization phase. For some expressions, we have to do a lot of work to get the rnix-parser AST to look like the reference impl AST. In practice, if the tool reports that some expression is not parsed equally, then either rnix-parser didn't parse it correctly or we didn't normalize the AST correctly. In other words, normalization opens us up to false negatives.

However, we can have decent confidence that normalization does not easily result in false positives (i.e., rnix-parser parses something incorrectly, but then we accidentally normalize it to be equal to what the reference impl gives us), because normalization does not inject new information into the parse. It only restructures what is already there.

Limited Data

The primary data set this tool is targeting is nixpkgs. While this set is very large (probably the largest set of valid Nix expressions), that doesn't necessarily mean that it contains every possible edge case in the Nix grammar. In fact, it definitely doesn't. For example, at the time of writing, nixpkgs contains zero instances of a float with no decimal part (e.g., 1.). This is a perfectly valid Nix expression, but by testing only on nixpkgs, we would not be able to tell that rnix-parser does not currently support it.

This isn't an inherent limitation of the tool, since it isn't directly tied to nixpkgs; it can be used on any set of Nix expressions. But the main takeaway of this caveat is that just because we can parse nixpkgs correctly does not mean that we can parse all of Nix correctly (although perhaps it can be said that we can parse all the Nix expressions that really matter; who is writing 1. anyway?).

Breaking our own rules

We generally try to keep the reference impl AST 100% untouched, because we define "correct" to mean "whatever the reference impl says".

While this is true, we unfortunately have to break from this rule in two places:

  1. The biggest break is an actual patch to Nix. Because of an implementation detail of the reference lexer (in particular, the fact that several of its lexer rules for indented strings all produce IND_STR tokens), a multiline string with no interpolations might still be split into multiple StringTokens. For example, consider the following expression:

    ''foo \$''

    The contents of the string would be tokenized as two separate tokens, foo and \$, eventually resulting in a string concat expression with those two arguments. This is highly problematic, because rnix-parser, using a different tokenizing algorithm, will simply produce a single token for the entire string contents. After this point, it isn't clear how to normalize the resulting tree into the reference impl's without re-parsing the string contents.

    Consequently, we patch the reference impl to instead combine those consecutive tokens into a single one. The patch doesn't actually fix any issue in Nix, and doesn't improve performance, so it seems unlikely that it would be accepted just for our benefit.

  2. By default, the reference impl sorts attribute sets in a way that would be difficult to reproduce in normalization. In particular, the reference impl maintains a global symbol table, creating symbols as they are encountered in parsing. Entries in attribute sets are then sorted by their key's position in this table. Additionally, many symbols are inserted by default without being encountered in the parsed expression.

    While it would technically be possible to replicate this behavior in normalization, it would bring no benefit, because attrset order has no semantic bearing in Nix. We opt instead to sort entries lexicographically by key (a small sketch of this follows this list).

    (We could alternatively solve this issue by treating the attrset entry list as a set, i.e., ignoring order, but the JSON comparison library we're using doesn't easily allow for this.)
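
A minimal sketch of that choice (the AttrDef type here is an assumption for illustration, not the tool's real definition):

// Sort attrset entries lexicographically by key during normalization.
// Attribute order carries no meaning in Nix, so this only affects how the
// serialized ASTs line up for comparison.
struct AttrDef {
    name: String,
    inherited: bool,
    // value expression omitted for brevity
}

fn sort_attrs(attrs: &mut Vec<AttrDef>) {
    attrs.sort_by(|a, b| a.name.cmp(&b.name));
}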

Results

Several patches to rnix-parser were developed with the aid of this tool: #96, #99, #101, #102, and #103. In aggregate, these patches (as well as #106) greatly improved rnix-parser's ability to correctly parse nixpkgs:

$ cargo run -- summary summary/f759fce.json summary/with-patches.json
...
== Summary ==
# equal before: 15924
# not equal before: 8135
# reference impl errors before: 1
# rnix-parser errors before: 1

# equal after: 24060
# not equal after: 0
# reference impl errors after: 1
# rnix-parser errors after: 0

# progressions: 8135
# regressions: 0

Future work

There are several areas where this tool could be improved.

  • Speed. At the time of writing, on my system, it takes about 6 minutes to compare all of nixpkgs. Having done no profiling, I would hypothesize that a lot of performance is left on the table by the strategy of serializing ASTs to JSON. This was primarily done to avoid worrying too much about FFI, but it could be revisited now.
  • Parallelizing nixpkgs comparison. Related to the previous point, comparing all of nixpkgs could be parallelized for a huge performance win. However, the reference impl does not seem to be thread-safe. I gave up at the first sign of trouble, when the Boehm GC complained about being used from multiple threads. My first thought was to simply disable the GC, but that isn't currently possible. Even once it is, I don't know whether there will be further issues with using the reference impl from multiple threads.
  • Verifying that grammatically-invalid expressions are rejected. This would be pretty ambitious, but I think it could be possible. The two primary roadblocks would be the need for a decently-sized corpus of invalid expressions to test against (we use nixpkgs as a corpus of valid expressions), and further working around the reference impl's tight coupling of parsing and evaluation.
