Expression Objects

Janosch Müller edited this page Jan 7, 2024 · 7 revisions

Expression::Base

The base class of all the objects in the tree returned by the parser. It implements the methods that are common to all expressions.

Attributes

Each Expression object has the following attributes:

  • type: a symbol, denoting the expression type, such as :group, :quantifier

  • token: a symbol, for the object's token, or opening token (in the case of groups and sets)

  • text: a string, the text of the expression. For nesting expressions, this attribute only contains the opening text of the expression, such as ( or [. To get the full text in such cases, use the to_s method.

  • ts: an integer, the start offset of the expression within the root expression.

  • quantifier: an instance of Expression::Quantifier that holds the details of repetition. Has a nil value if the expression is not quantified.

  • options: a hash, holds the keys :i, :m, and :x with a boolean value that indicates if the expression has a given option. In Ruby 2.0 and later syntax, this also includes the :d, :a, and :u keys.

Class methods

Every Expression subclass has the following methods:

  • construct: a convenience method to initialize an Expression without a Regexp::Token, e.g. for testing or rewriting

Methods

Every Expression object also has the following methods:

  • to_s: returns the string representation of the expression.

  • parts: returns all elements of the expression as a sexp-like Array of Strings and child expressions

  • quantified?: returns true if the expression was followed by a quantifier.

  • quantity: returns an array of the expression's min and max repetitions.

  • repetitions: returns quantity more uniformly, as a Range (1..1 if there is no quantifier)

  • greedy?: returns true if the expression's quantifier is greedy.

  • reluctant?: returns true if the expression's quantifier is reluctant. (aliased as lazy?)

  • terminal?: returns false if the expression has subexpressions (i.e. is a Subexpression)

  • possessive?: returns true if the expression's quantifier is possessive.

  • multiline?: returns true if the expression has the m option (aliased as m?)

  • case_insensitive?: returns true if the expression has the i option (aliased as ignore_case? and i?)

  • default_classes?: returns true if the expression has the d option (aliased as d?)

  • ascii_classes?: returns true if the expression has the a option (aliased as a?)

  • unicode_classes?: returns true if the expression has the u option (aliased as u?)

  • free_spacing?: returns true if the expression has the x option (aliased as extended? and x?)

  • negative?: returns true for negative sets (e.g. [^a]), char types, assertions, properties (aliased as negated?)

  • type?: tests if the expression is of the given type. Accepts a symbol or an array.

  • is?: tests if the expression is of a given token, and optionally, a type.

  • one_of?: tests if the expression is one of one or more types or tokens.

  • strfregexp: like strftime, but for regexp. (aliased as strfre)

  • match_length: allows inspecting and iterating over the String lengths matched by the Expression

  • ==: for (deep) comparison between expressions

Subexpression

The Subexpression class is the base class for expressions that can contain one or more expressions, like groups.

Attributes

In addition to the attributes defined in Expression::Base, Subexpression objects also have the following:

  • expressions: an array, holds the sub-expressions for the expression if it is a group or alternation expression. Empty if the expression doesn't have sub-expressions.

Methods

They also have the following extra methods (in addition to the Enumerable methods):

  • <<: adds sub-expressions to the expression.

  • []: access sub-expressions by index.

  • dig: access deep sub-expressions.

  • each: iterates over the direct sub-expressions, if any.

  • first, last: return the first/last expressions of the subexpression.

  • length: returns the number of subexpressions in the expression.

  • empty?: returns true if the subexpression does not have any subexpressions.

  • each_expression: traverses the expression tree, depth first, as an array.

  • flat_map: based on each_expression, returns an array with the results of calling the given block on each expression.

  • traverse: traverses the expression tree, depth first, with 1-2 calls per child and :enter/:exit/:visit event tokens.

Root Expression

The root expression is the special object returned by the parser. It represents the tree of the whole expression.

Root is a subclass of Subexpression and has all of its attributes and methods.

Sequence Expressions

A special subclass of Subexpression is the Expression::Sequence. It is used to hold the expressions of a branch within an Expression::Alternation expression. For example, the expression 'b[ai]t|h[ai]t|p[ai]t' would result in an Alternation object with 3 sequences, one for each possible alternative, each of which contains 3 expression objects.

The Sequence object is also used to hold the branches of conditional expressions and character set intersections.

Expressions referencing other expressions

Backreferences, Subexpression calls, Conditionals and Conditions have a referenced_expression method that returns the Group expression that is being referenced via name or number.