NodePattern complete rewrite. Add support for multiple variadic terms.
marcandre committed Sep 11, 2020
1 parent 97120a2 commit 8b9c1b0
Showing 29 changed files with 1,877 additions and 881 deletions.
4 changes: 3 additions & 1 deletion .github/workflows/rubocop.yml
@@ -74,7 +74,7 @@ jobs:
run: bundle exec rake spec
- name: internal investigation
if: matrix.internal_investigation
run: bundle exec rake internal_investigation
run: bundle exec rake generate internal_investigation
rubocop_specs:
name: >-
Main Gem Specs | RuboCop: ${{ matrix.rubocop }} | ${{ matrix.ruby }} (${{ matrix.os }})
@@ -98,6 +98,8 @@ jobs:
ruby-version: ${{ matrix.ruby }}
- name: install dependencies
run: bundle install --jobs 3 --retry 3
- name: generate lexer and parser
run: bundle exec rake generate
- name: clone rubocop from source for full specs -- master
if: matrix.rubocop == 'master'
run: git clone --branch ${{ matrix.rubocop }} https://github.com/rubocop-hq/rubocop.git ../rubocop
5 changes: 5 additions & 0 deletions .gitignore
@@ -1,3 +1,8 @@
# generated parser / lexer
/lib/rubocop/ast/node_pattern/parser.racc.rb
/lib/rubocop/ast/node_pattern/parser.output
/lib/rubocop/ast/node_pattern/lexer.rex.rb

# rcov generated
coverage
coverage.data
3 changes: 3 additions & 0 deletions .rubocop.yml
@@ -13,6 +13,9 @@ AllCops:
- 'spec/fixtures/**/*'
- 'tmp/**/*'
- '.git/**/*'
- 'lib/rubocop/ast/node_pattern/parser.racc.rb'
- 'lib/rubocop/ast/node_pattern/lexer.rex.rb'
- 'spec/rubocop/ast/node_pattern/parse_helper.rb'
TargetRubyVersion: 2.4

Naming/PredicateName:
4 changes: 3 additions & 1 deletion .rubocop_todo.yml
@@ -32,7 +32,7 @@ Metrics/MethodLength:
# Offense count: 1
# Configuration parameters: CountComments.
Metrics/ModuleLength:
Max: 101
Max: 108

# Offense count: 1
# Configuration parameters: ExpectMatchingDefinition, Regex, IgnoreExecutableScripts, AllowedAcronyms.
@@ -65,6 +65,7 @@ RSpec/ContextWording:
- 'spec/rubocop/ast/resbody_node_spec.rb'
- 'spec/rubocop/ast/token_spec.rb'
- 'spec/spec_helper.rb'
- 'spec/rubocop/ast/node_pattern/helper.rb'

# Offense count: 6
# Configuration parameters: Max.
@@ -73,6 +74,7 @@ RSpec/ExampleLength:
- 'spec/rubocop/ast/node_pattern_spec.rb'
- 'spec/rubocop/ast/processed_source_spec.rb'
- 'spec/rubocop/ast/send_node_spec.rb'
- 'spec/rubocop/ast/node_pattern/parser_spec.rb'

# Offense count: 6
RSpec/LeakyConstantDeclaration:
4 changes: 3 additions & 1 deletion Gemfile
@@ -5,8 +5,10 @@ source 'https://rubygems.org'
gemspec

gem 'bump', require: false
gem 'oedipus_lex', require: false
gem 'pry'
gem 'rake', '~> 12.0'
gem 'racc'
gem 'rake', '~> 13.0'
gem 'rspec', '~> 3.7'
local_ast = File.expand_path('../rubocop', __dir__)
if Dir.exist? local_ast
2 changes: 1 addition & 1 deletion Rakefile
@@ -15,7 +15,7 @@ end

require 'rspec/core/rake_task'

RSpec::Core::RakeTask.new(:spec) do |spec|
RSpec::Core::RakeTask.new(spec: :generate) do |spec|
spec.pattern = FileList['spec/**/*_spec.rb']
end

240 changes: 240 additions & 0 deletions docs/modules/ROOT/pages/node_pattern_compiler.adoc
@@ -0,0 +1,240 @@
= Hacker's guide to the `NodePattern` compiler

This documentation is aimed at anyone wanting to understand or modify the `NodePattern` compiler.
It assumes some familiarity with the syntax of https://github.com/rubocop-hq/rubocop-ast/blob/master/doc/modules/ROOT/pages/node_pattern.md[`NodePattern`], as well as the AST produced by the `parser` gem.

== High level view

The `NodePattern` compiler uses the same techniques as the `parser` gem:

* a `Lexer` that breaks source into tokens
* a `Parser` that uses tokens and a `Builder` to emit an AST
* a `Compiler` that converts this AST into Ruby code

Example:

* Pattern: `+(send nil? {:puts :p} $...)+`
* Tokens: `+'(', [:tNODE_TYPE, :send], [:tPREDICATE, :nil?], '{', ...+`
* AST: `+s(:sequence, s(:node_type, :send), s(:predicate, :nil?), s(:union, ...+`
* Ruby code:
+
[source,ruby]
----
node.is_a?(::RuboCop::AST::Node) && node.children.size >= 2 &&
node.send_type? &&
node.children[0].nil?() &&
(union2 = node.children[1]; ...
----

The different parts are described below.

== Vocabulary

*"node pattern"*: something that can be matched against a single `AST::Node`.
While `(int 42)` and `#is_fun?` both correspond to node patterns, `+...+` (without the parentheses) is not a node pattern.

*"sequence"*: a node pattern that describes the sequence of children of a node (and its type): `+(type first_child second_child ...)+`

*"variadic"*: element of a sequence that can match a variable number of children.
`+(send _ int* ...)+` has two variadic elements (`int*` and `+...+`).
`(send _ :name)` contains no variadic element.
Note that a sequence is itself never variadic.

*"atom"*: element of a pattern that corresponds to a simple Ruby object.
`(send nil? :puts (str 'hello'))` has two atoms: `:puts` and `'hello'`.

== Lexer

`lexer.rb` defines `Lexer` and contains the few definitions needed for the lexer to work.
The bulk of the processing is in the inherited class that is generated by https://github.com/seattlerb/oedipus_lex[`oedipus_lex`].

[discrete]
==== Rules

https://github.com/seattlerb/oedipus_lex[`oedipus_lex`] generates the Ruby file `lexer.rex.rb` from the rules defined in `lexer.rex`.

These rules map a Regexp to code that emits a token.

`oedipus_lex` aims to be simple and the generated file is readable.
It uses https://ruby-doc.org/stdlib-2.7.1/libdoc/strscan/rdoc/StringScanner.html[`StringScanner`] behind the scenes.
It selects the first rule that matches, contrary to many lexing tools that prioritize longest match.

[discrete]
==== Tokens

The `Lexer` emits tokens with types that are:

* strings for the syntactic symbols (e.g. `'('`, `'$'`, `+'...'+`)
* symbols of the form `:tTOKEN_TYPE` for the rest (e.g. `:tPREDICATE`)

Tokens are stored as `[type, value]`.
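
The first-match behaviour and the `[type, value]` token shape can be sketched in a few lines of plain Ruby.
This is a toy, not the generated `lexer.rex.rb`: the rule table `TOY_RULES`, the method `toy_tokenize`, the regular expressions, the `:tSYMBOL` type and the value stored for punctuation tokens are all assumptions made for illustration.

```ruby
require 'strscan'

# Hypothetical rule table: each rule maps a Regexp to a lambda that builds a
# [type, value] token, or returns nil to skip the matched text.
TOY_RULES = [
  [/\s+/,       ->(_text) { nil }],                            # whitespace is skipped
  [/\.\.\./,    ->(text) { [text, text] }],                    # '...' (string type)
  [/[(){}$]/,   ->(text) { [text, text] }],                    # punctuation (string type)
  [/:[a-z_]+/,  ->(text) { [:tSYMBOL, text[1..-1].to_sym] }],
  [/[a-z_]+\?/, ->(text) { [:tPREDICATE, text.to_sym] }],      # must precede the next rule
  [/[a-z_]+/,   ->(text) { [:tNODE_TYPE, text.to_sym] }]
].freeze

def toy_tokenize(source)
  scanner = StringScanner.new(source)
  tokens = []
  until scanner.eos?
    # The first rule whose Regexp matches at the current position wins.
    rule = TOY_RULES.find { |regexp, _emit| scanner.scan(regexp) }
    raise ArgumentError, "unexpected input at position #{scanner.pos}" unless rule

    token = rule.last.call(scanner.matched)
    tokens << token if token
  end
  tokens
end

p toy_tokenize('(send nil? {:puts :p} $...)')
```

Because the first matching rule is selected, the rule for `?`-suffixed predicates has to be listed before the plain identifier rule; with the order reversed, `nil?` would be cut after `nil`.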

[discrete]
==== Generation

Use `rake generate:lexer` to generate `lexer.rex.rb` from the `lexer.rex` file.
This is done automatically by `rake spec`.

NOTE: the `lexer.rex.rb` is not under source control, but is included in the gem.

== Parser

Similarly to the `Lexer`, the `parser.rb` defines `Parser` and has the few definitions needed for the parser to work.
The bulk of the processing is in the inherited class `parser.racc.rb` that is generated by https://ruby-doc.org/stdlib-2.7.0/libdoc/racc/parser/rdoc/Racc.html#module-Racc-label-Writing+A+Racc+Grammar+File[`racc`] from the rules in `parser.y`.

[discrete]
==== Nodes

The `Parser` emits `NodePattern::Node` objects, which are similar to RuboCop's nodes.
They both inherit from ``parser``'s `Parser::AST::Node`, and share additional methods too.

Like RuboCop's nodes, some nodes have specialized classes (e.g. `Sequence`) while other nodes use the base class directly (e.g. `s(:number, 42)`).

[discrete]
==== Rules

The rules closely follow the definitions above.
In particular, there is a distinction between `node_pattern_list`, which is a list of node patterns (each term matches a single node), and the more generic `variadic_pattern_list`, which is a list of elements, some of which may be variadic while others are simple node patterns.

[discrete]
==== Generation

Similarly to the lexer, use `rake generate:parser` to generate `parser.racc.rb` from the `parser.y` file.
This is done automatically by `rake spec`.

NOTE: the `parser.racc.rb` is not under source control, but is included in the gem.

== Compiler

The compiler's core is the `Compiler` class.
It holds the global state (e.g. references to named arguments).
The goal of the compiler is to produce `matching_code`, Ruby code that can be run against an `AST::Node`, or any Ruby object for that matter.

Packaging of that `matching_code` into a `lambda` or a method `def` is handled separately by the `MethodDefiner` module.

The compilation itself is handled by three subcompilers:

* `NodePatternSubcompiler`
* `AtomSubcompiler`
* `SequenceSubcompiler`

=== Visitors

The subcompilers use the https://en.wikipedia.org/wiki/Visitor_pattern[visitor pattern].

The methods starting with `visit_` are used to process the different types of nodes.
For a node of type `:capture`, the method `visit_capture` will be called, or if none is defined then `visit_other_type` will be called.

No argument is passed, as the visited node is accessible with the `node` attribute reader.
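
The dispatch described above can be sketched as follows.
This is a minimal illustration, not the actual `Subcompiler` code: `ToyNode`, `ToySubcompiler` and the strings they return are hypothetical; only the `visit_<type>` naming, the `visit_other_type` fallback and the `node` reader come from this guide.

```ruby
# Toy stand-in for a pattern node; only `type` and `children` matter here.
ToyNode = Struct.new(:type, :children)

# Hypothetical subcompiler showing `visit_<type>` dispatch with a fallback.
class ToySubcompiler
  attr_reader :node

  def compile(node)
    @node = node
    handler = :"visit_#{node.type}"
    # Private visitors are found thanks to the `true` argument.
    respond_to?(handler, true) ? send(handler) : visit_other_type
  end

  private

  # Called for nodes of type :capture; no argument, the node is read via `node`.
  def visit_capture
    "capture(#{ToySubcompiler.new.compile(node.children.first)})"
  end

  # Called for any type without a dedicated visitor.
  def visit_other_type
    "other(#{node.type})"
  end
end

pattern = ToyNode.new(:capture, [ToyNode.new(:wildcard, [])])
puts ToySubcompiler.new.compile(pattern) # prints "capture(other(wildcard))"
```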

=== NodePatternSubcompiler

Given any `NodePattern::Node`, it generates the Ruby code that can return `true` or `false` for the given node, or node type for sequence head.

==== `var` vs `access`

The subcompiler can be called with the current node either stored in a variable (provided with the `var:` keyword argument) or accessible via a Ruby expression (e.g. `access: 'current_node.children[2]'`).

The subcompiler will generate code that evaluates this `access` expression at most twice.
If it might need to access the node more often than that, `multiple_access` will store the result in a temporary variable (e.g. `union`).
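
To illustrate why a temporary variable helps, here is a small sketch of generated code that evaluates the `access` expression once and then tests the stored value several times.
The helper `toy_union_code`, its keyword arguments and the variable name `union1` are hypothetical and are not the compiler's actual API.

```ruby
# Hypothetical helper: returns Ruby source that tests `access` against several
# literals while evaluating the `access` expression only once.
def toy_union_code(access:, literals:, temp: 'union1')
  tests = literals.map { |literal| "#{literal.inspect} === #{temp}" }
  "(#{temp} = #{access}; #{tests.join(' || ')})"
end

children = [nil, :p]
code = toy_union_code(access: 'children[1]', literals: %i[puts p])
puts code       # prints "(union1 = children[1]; :puts === union1 || :p === union1)"
puts eval(code) # prints "true"; eval is used only to run the generated string here
```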

==== Sequences

Sequences are the most difficult elements to handle and are deferred to the `SequenceSubcompiler`.

==== Atoms

Atoms are handled with `visit_other_type`, which defers to the `AtomSubcompiler` and converts that result to a node pattern by appending `=== cur_node` (or `=== cur_node.type` if in sequence head).

This way, the two arguments in `(_ #func?(%1) %2)` would be compiled differently;
`%1` would be compiled as `param1`, while `%2` gets compiled as `param2 === node.children[1]`.
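
The conversion from atom to node pattern amounts to string concatenation, which can be sketched like this.
The method `toy_atom_as_node_pattern` and its `in_sequence_head:` keyword are hypothetical names chosen for this sketch.

```ruby
# Hypothetical helper: turns the code of an atom into code for a node pattern
# by appending a `===` test against the accessed node (or against its type
# when the atom sits in the sequence head).
def toy_atom_as_node_pattern(atom_code, access, in_sequence_head: false)
  target = in_sequence_head ? "#{access}.type" : access
  "#{atom_code} === #{target}"
end

puts toy_atom_as_node_pattern('param2', 'node.children[1]')
# prints "param2 === node.children[1]"
puts toy_atom_as_node_pattern(':sym', 'node', in_sequence_head: true)
# prints ":sym === node.type"
```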

==== Precedence

The code generated has higher or equal precedence to `&&`, so as to make chaining convenient.
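
A short sketch of why this matters: fragments are chained with `&&`, so a fragment that contains a bare `||` would change the meaning of the whole expression.
The constants and methods below (`TOY_SAFE`, `TOY_UNSAFE`, `toy_chain`, `toy_run`) are hypothetical and use plain values instead of AST nodes.

```ruby
# Hypothetical chaining step: fragments are simply joined with `&&`.
def toy_chain(*fragments)
  fragments.join(' && ')
end

# The second fragment is parenthesized, so it binds at least as tightly as `&&`.
TOY_SAFE = toy_chain('value.is_a?(Integer)', '(value == 1 || value == 2.0)')
# Here the bare `||` binds more loosely than `&&` and escapes the chain.
TOY_UNSAFE = toy_chain('value.is_a?(Integer)', 'value == 1 || value == 2.0')

# Runs a generated string; `value` is visible to the evaluated code (demo only).
def toy_run(code, value)
  eval(code)
end

puts toy_run(TOY_SAFE, 2.0)   # prints "false": the type check guards the union
puts toy_run(TOY_UNSAFE, 2.0) # prints "true": parsed as (type check && a) || b
```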

=== AtomSubcompiler

This subcompiler produces Ruby code that evaluates to a Ruby object, e.g. `"42"`, `:a_symbol`, `param1`.

A good way to think about its output is as code that can be passed as an argument to a function call.
For example:

[source,ruby]
----
# Pattern '#func(42, %1)' compiles to
func(node, 42, param1)
----

Note that any node pattern can be output by this subcompiler, but those that don't correspond to a Ruby literal will be output as a lambda so they can be combined.
For example:

[source,ruby]
----
# Pattern '#func(int)' compiles to
func(node, ->(compare) { compare.is_a?(::RuboCop::AST::Node) && compare.int_type? })
----
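
One plausible reading of "combined" is that a lambda, like a literal, responds to `===`: `Proc#===` calls the lambda with its argument, so the receiving function can treat both kinds of atom the same way.
The sketch below relies on that assumption; `toy_func` is a hypothetical stand-in for a `#func` helper and uses plain values instead of AST nodes.

```ruby
# Hypothetical stand-in for a `#func` helper: it receives the node first,
# then the compiled atom, and compares them with `===`.
def toy_func(node, matcher)
  matcher === node
end

literal_atom = 42
lambda_atom = ->(compare) { compare.is_a?(Integer) && compare.positive? }

puts toy_func(42, literal_atom) # prints "true": Integer#=== compares values
puts toy_func(42, lambda_atom)  # prints "true": Proc#=== calls the lambda with 42
puts toy_func(-1, lambda_atom)  # prints "false"
```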

=== SequenceSubcompiler

The subcompiler compiles the sequences' terms in turn, keeping track of which children of the `AST::Node` are being matched.

==== Variadic terms

The complexity comes from variadic elements, which have complex processing _and_ may make it impossible to know at compile time which children are matched by the subsequent terms.

*Example* (no variadic terms)

----
(_type int _ str)
----

First child must match `int`, third child must match `str`.
The subcompiler will use `children[0]` and `children[2]`.

*Example* (one variadic term)

----
(_type int _* str)
----

First child must match `int` and _last_ child must match `str`.
The subcompiler will use `children[0]` and `children[-1]`.

*Example* (multiple variadic terms)

----
(_type int+ sym str+)
----

The subcompiler cannot use any fixed integer index into `children` to match `sym`.
This must be tracked at runtime in a variable (`cur_index`).

The subcompiler will use fixed indices before the first variadic element and after the last one.
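
The difference between fixed indices and a runtime index can be sketched with hand-written matchers.
These are not what the compiler emits: the method names are hypothetical, plain Ruby objects stand in for AST nodes (`Integer` for `int`, `Symbol` for `sym`, `String` for `str`), and the greedy loop only works here because the terms cannot match the same child.

```ruby
# (_type int _* str): one variadic term, so fixed indices work on both sides.
def toy_match_int_rest_str(children)
  children.size >= 2 &&
    children[0].is_a?(Integer) &&
    children[-1].is_a?(String)
end

# (_type int+ sym str+): the position of `sym` is only known at run time,
# so it is tracked in `cur_index`.
def toy_match_ints_sym_strs(children)
  cur_index = 0
  cur_index += 1 while children[cur_index].is_a?(Integer)
  return false if cur_index.zero? # `int+` needs at least one child
  return false unless children[cur_index].is_a?(Symbol)

  rest = children[(cur_index + 1)..-1]
  !rest.empty? && rest.all? { |child| child.is_a?(String) } # `str+`
end

puts toy_match_int_rest_str([1, :x, :y, 'z'])         # prints "true"
puts toy_match_ints_sym_strs([1, 2, :name, 'a', 'b']) # prints "true"
puts toy_match_ints_sym_strs([1, :name])              # prints "false"
```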

==== Node pattern terms

The node pattern terms are delegated to the `NodePatternSubcompiler`.

In the pattern `(:sym :sym)`, the two `:sym` are compiled differently because the first `:sym` is in "sequence head": they compile to `:sym === node.type` and `:sym === node.children[0]` respectively.
The subcompiler indicates if the pattern is in "sequence head" or not, so the `NodePatternSubcompiler` can produce the right code.

Variadic elements may not (currently) cover the sequence head.
As a convenience, `+(...)+` is understood as `+(_ ...)+`.
Other types of nodes will raise an error (e.g. `(<will not compile>)`; see `Node#in_sequence_head`).

==== Precedence

Like the node pattern subcompiler, it generates code that has higher or equal precedence to `&&`, so as to make chaining convenient.
12 changes: 12 additions & 0 deletions lib/rubocop/ast.rb
@@ -6,8 +6,20 @@

require_relative 'ast/ext/range'
require_relative 'ast/ext/set'
require_relative 'ast/node_pattern/method_definer'
require_relative 'ast/node_pattern'
require_relative 'ast/node/mixin/descendence'
require_relative 'ast/node_pattern/builder'
require_relative 'ast/node_pattern/comment'
require_relative 'ast/node_pattern/compiler'
require_relative 'ast/node_pattern/compiler/subcompiler'
require_relative 'ast/node_pattern/compiler/atom_subcompiler'
require_relative 'ast/node_pattern/compiler/binding'
require_relative 'ast/node_pattern/compiler/node_pattern_subcompiler'
require_relative 'ast/node_pattern/compiler/sequence_subcompiler'
require_relative 'ast/node_pattern/lexer'
require_relative 'ast/node_pattern/node'
require_relative 'ast/node_pattern/parser'
require_relative 'ast/sexp'
require_relative 'ast/node'
require_relative 'ast/node/mixin/method_identifier_predicates'
