Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Planned changes to NodeTraverser #916

Open
nikic opened this issue Mar 5, 2023 · 10 comments
Open

Planned changes to NodeTraverser #916

nikic opened this issue Mar 5, 2023 · 10 comments

Comments

@nikic
Copy link
Owner

nikic commented Mar 5, 2023

For PHP-Parser 5 I plan to make the following changes to NodeTraverser. The new interface will be

// Or maybe ConcurrentVisitor.
class InterleavedVisitor implements NodeVisitor {
    /** @param NodeVisitor[] $visitors */
    public function __construct(array $visitors) { ... }
    // ...
}

interface NodeTraverserInterface {
    /**
     * @param Node[] $nodes
     * @return Nodes[]
     */
    public function traverse(array $stmts, NodeVisitor $visitor): array;
}


interface NodeTraverser implements NodeTraverserInterface {
    /**
     * @param Node[] $nodes
     * @return Nodes[]
     */
    public function traverse(array $stmts, ?NodeVisitor $visitor = null): array;

    // For backwards-compatibility only.
    public function addVisitor(NodeVisitor $visitor): void { ... }
    public function removeVisitor(NodeVisitor $visitor): void { ... }
}

The usage will be as follows:

$traverser = new NodeTraverser();

// To run two visitors sequentially, one finishing completely before the other starts.
$stmts = $traverser->traverse($stmts, new CloningVisitor());
$stmts = $traverser->traverse($stmts, new NameResolver());

// To run two visitors concurrently (invoking all visitors for each node before visiting the next).
$stmts = $traverser->traverse($stmts, new InterleavedVisitor([
    new CloningVisitor(),
    new NameResolver(),
]));

// For backwards-compatibility, the following will also continue to work:
// This will construct the InterleavedVisitor internally, and do a concurrent visit.
$traverser->addVisitor(new CloningVisitor());
$traverser->addVisitor(new NameResolver());
$stmts = $traverser->traverse($stmts);

The motivation for this change is:

  1. Make the distinction between a sequential and concurrent visitor very clear. I got the impression that this is a common gotcha, because it's not clear from the NodeTraverser interface that visitors are concurrent, and you need to be careful about dependencies between visitors.
  2. Avoid the need to construct a separate NodeTraverser per visitor if you want to do sequential visits. Sequential and concurrent visitation should be equally simple.
@nikic nikic added this to the 5.0 milestone Mar 5, 2023
@nikic
Copy link
Owner Author

nikic commented May 21, 2023

A problem I ran into here is that the current NodeVisitor interface isn't flexible enough to handle all possible cases here. For example, let's say you have one visitor that return a new node, and then a second one that returns DONT_TRAVERSE_CHILDREN. If we combine these into one InterleavedVisitor, there is no way to both return a new node and DONT_TRAVERSE_CHILDREN.

@rulatir
Copy link

rulatir commented May 22, 2023

I don't think it is this library's job to reconcile visitors with conflicting ideas about what to do. Tree rewriting is Turing complete. If one visitor can modify the AST in a way that violates another visitor's assumptions or invalidates the results of its work, then it is on the application to model and implement the right reconciliation protocol and backtracking and whatnot.

Now that visitors are invoked last-enter-first-leave, we can think of them as middlewares, with the outer middleware having the power to decide whether to invoke "previous" middlewares or make a "final decision" like DONT_TRAVERSE_CHILDREN. It is the application's job to make sure that the visitors in an InterleavedVisitor are ordered correctly according to that power.

@rulatir
Copy link

rulatir commented May 22, 2023

Btw. in my experience splitting complex refactorings into separate passes was almost always the simplest and least error prone approach; even doing it break-and-repair style, e.g. add attributes first, then insert missing use statements for the attribute classes in a repair pass. Interleaved visiting is mostly useful for combining several non-conflicting, information-gathering visitors in one pass for better performance, or for combining "middleware" visitors that implement cross-pass protocols with a single visitor that does the actual work for the pass.

@nikic
Copy link
Owner Author

nikic commented May 25, 2023

I don't think it is this library's job to reconcile visitors with conflicting ideas about what to do. Tree rewriting is Turing complete. If one visitor can modify the AST in a way that violates another visitor's assumptions or invalidates the results of its work, then it is on the application to model and implement the right reconciliation protocol and backtracking and whatnot.

So you're suggesting that if in the InterleavedVisitor, the first one returns a new node, and the second one return DONT_TRAVERSE_CHILDREN (or similar), that would be considered user error and the InterleavedVisitor should throw in that case?

@rulatir
Copy link

rulatir commented May 25, 2023

So you're suggesting that if in the InterleavedVisitor, the first one returns a new node, and the second one return DONT_TRAVERSE_CHILDREN (or similar), that would be considered user error and the InterleavedVisitor should throw in that case?

No, I am suggesting that InterleavedVisitor cannot possibly know if this is a user error or not; I can imagine situations in which it is, and situations in which it isn't. Therefore it isn't InterleavedVisitor's job to assert any idea about correctness here; the only thing to do is to either specify what InterleavedVisitor will do in this case, i.e. turn it into defined behavior, or (possibly better, but possibly a rabbit hole) allow the application to specify the policy.

@nikic
Copy link
Owner Author

nikic commented May 25, 2023

So you're suggesting that if in the InterleavedVisitor, the first one returns a new node, and the second one return DONT_TRAVERSE_CHILDREN (or similar), that would be considered user error and the InterleavedVisitor should throw in that case?

No, I am suggesting that InterleavedVisitor cannot possibly know if this is a user error or not; I can imagine situations in which it is, and situations in which it isn't. Therefore it isn't InterleavedVisitor's job to assert any idea about correctness here; the only thing to do is to either specify what InterleavedVisitor will do in this case, i.e. turn it into defined behavior

I don't think that's possible, because both defined behaviors (ignore one or the other action), are obviously wrong. We can either make this user error, or support it by performing both actions (which requires extending the NodeVisitor API to allow doing so).

@rulatir
Copy link

rulatir commented May 25, 2023

I disagree. There is a well-defined timeline of who and when gets an opportunity to mutate the AST and/or affect traversal: outer enter(), inner enter(), inner leave(), outer leave(). You inform application writers that this is how the timeline is structured and there are no built-in facilities for time travel. If they want to implement time travel on top of your library, then it's entirely on them to avoid paradoxes, resolve paradoxes, and/or choose to rely on the timeline to resolve paradoxes, on a case by case basis.

[xkcd://guy_exits_time_machine_shoots_grandfather_and_says_DONT_TRAVERSE_CHILDREN.gif]

Two visitors that genuinely disagree about what to do never belong in a single InterleavedVisitor, full stop, no disagreement here. However if two visitors are in a relationship that goes like "I say X, but if that specific guy over there whom the engine will query later says Y, then that's fine", then they don't genuinely disagree, they in fact cooperate over the simple default protocol.

@nikic
Copy link
Owner Author

nikic commented May 27, 2023

There is no time travel involved. The actions are sequential, not overriding. If one visitor returns a new node, that's the node the next visitor is going to see, which can in turn return a new node, etc. Similarly, if one visitor returns a new node, the next visitor can decide that it would like to stop further traversal. But that changes nothing about the node replacement that has already happened.

At least this is how things currently work. The action returned by a later visitor does not discard that of an earlier ones, they are both performed.

@rulatir
Copy link

rulatir commented May 28, 2023

Ah, I think I get it: the problem is that there are two orthogonal "dimensions" to the possible response from a visitor: whether to replace or keep the node, and whether to alter the traversal or not. Currently there is no way to say "replace, skip children" as it was unlikely to be needed in the case of single-task visitors, but an InterleavedVisitor combines multiple single-task visitors and there is no way to combine two orthogonal responses, which should combine (as opposed to non-orthogonal ones which should override sequentially).

Since enterNode() / leaveNode() already return mixed (or will it be a union type in 5.0?), you could also allow returning a VisitorResponse object consisting of an optional node directive and an optional traversal directive. You would use this type internally, normalizing visitors' responses to it, and also accumulating them in InterleavedVisitor. (This assumes that you don't modify the AST within the loop over visitors in InterleavedVisitor, and "next visitor sees the new node" is achieved by updating a local variable in the loop. The node replacement in AST is performed by the traverser after the call to InterleavedVisitor returns, just like with any other visitor.)

@nikic nikic removed this from the 5.0 milestone Jun 25, 2023
@nikic
Copy link
Owner Author

nikic commented Jun 25, 2023

I think I'm going to drop this change. I thought this design would make things simpler overall, but in practice this turns out to be a significant complication.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants