Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support js codemods: represent whitespace decisions in AST #12709

Closed
conartist6 opened this issue Apr 27, 2022 · 14 comments
Closed

Support js codemods: represent whitespace decisions in AST #12709

conartist6 opened this issue Apr 27, 2022 · 14 comments

Comments

@conartist6
Copy link

Codemods are usually a process of reading a source into an AST, mutating the AST, then overwriting the original source with the modified version. To do this well you obviously need to preserve formatting to the greatest extent possible.

I'm going to start this is a meta-issue for several related issues I've found: #2068, #4675, #5998, and probably more that have some bearing on how to use prettier on a javascript AST. Currently prettier assumes when printing that the AST it has is unmodified. In particular it assumes that it can use the locations attached to nodes to inspect the original source text and determine whether there was whitespace in very specific cases where it preserves original formatting. Those cases are:

  • Allowing a blank line between two statements
  • Choosing whether to print the key: value mappings inside brackets ({}) on one or more lines

Eliminating the dependency on the source text would mean building this bit of state into the AST itself. I am strongly in favor of this, as building it into the AST should ensure that it is possible not just to preserve existing formatting, but to specify what formatting state can be specified programatically, for example by demanding that an inserted statement be padded with blank lines.

I think there's strong incentive to build this (and I would like to), as it would allow me to avoid using recast, which is currently a hassle as I have to use recast to transform a source, then print it, then use prettier to reformat the string.

@conartist6 conartist6 changed the title Prettier for codemods (js) Prettier for js codemods: represent whitespace decisions in AST Apr 27, 2022
@conartist6 conartist6 changed the title Prettier for js codemods: represent whitespace decisions in AST Support js codemods: represent whitespace decisions in AST Apr 27, 2022
@conartist6
Copy link
Author

Let's think about what's in involved in building this. The first obstacle is that prettier supports a number of parsers. They all represent what location data they do store a bit differently. Some have a loc property. Some have a range property. All of them except for esprima store start and end properties on the node itself. Anywhere we choose to store the data regarding whitespace in the AST it will be possible that whatever tools or transforms touch the AST damage it.

We could store break and blank positions in sorted arrays at the root of the AST. It would be possible to quickly binary search the arrays to determine if any two nodes have a break or blank that falls between the end of one and the start of the next. This strategy would be fully resilient to the transforms removing nodes, e.g.

{ foo, bar }
// becoming
{ bar }

or

foo;

bar;

baz;
// becoming
foo;

baz;

While that solves the problem of how to preserve the spacing coming from the original code, it doesn't help us with being able to make new whitespace in code that we generate, which by definition will not yet have any location information.

For imports, object properties, or object patterns the entry list it makes sense I think to allow the owner node to be marked with a node.alwaysMultiLine property which prettier would read.

Statements are a bit harder. Let's say you're writing a transform to insert a middle statement between first and last in callbacks to thunk:

thunk(() => {
  first;
  last;
});

thunk(() => {
  first;

  last;
});

You need to provide an API that can fully specify where a break does or does not belong. I propose this api: node.blankLineAbove: 'ensure' | 'avoid' | 'auto' and node.blankLineBelow: 'ensure' | 'avoid' | 'auto'. When no value is specified the default would be auto. I propose that auto be dependent on whether or not the statement above and the statement below were separated by a blank line in the original code. If so, it would be equivalent to blankLineAbove: 'ensure', blankLineBelow: 'ensure'. If not it would be equivalent to blankLineAbove: 'avoid', blankLineBelow: 'avoid'.

Combining these strategies allows us to create a more robust and powerful system for preserving whitespace and allowing it to be manipulated than either strategy alone would be able to offer, and could completely replace the existing dependence on having a source string present. When printing an AST with no location data at all the most compact form would be chosen except for places where prettier's rules demand blank lines.

@conartist6
Copy link
Author

The way I've written the proposal usage of the blankLine* properties would probably be relatively common. For example whenever appending to a list of declarations that aren't usually space-separated themselves but are followed by a blank line (imports, for example) you'd want to use blankLineAbove: 'avoid'. I think that's easy enough, as what you need to specify is quite closely related to your purpose. And of course if you're inserting some statement with important side-effects, I'd strongly recommend blankLineAbove: 'ensure', blankLineBelow: 'ensure'.

@conartist6
Copy link
Author

conartist6 commented Apr 28, 2022

I did a quick search through the code for originalText to see if I'd find anything that would indicate that this would be outright impossible to build. Here's what I found:

Methods:

  • getNextNonSpaceNonCommentCharacter
  • getNextNonSpaceNonCommentCharacterIndex
  • getNextNonSpaceNonCommentCharacterIndexWithStartIndex
  • isFlowAnnotationComment
  • hasNewline
  • hasNewlineInRange
  • isNextLineEmpty
  • isNextLineEmptyAfterIndex
  • isTemplateOnItsOwnLine
  • hasLeadingOwnLineComment

Other usages:

if (isLineComment(comment)) {
// Supports `//`, `#!`, `<!--`, and `-->`
return options.originalText
.slice(locStart(comment), locEnd(comment))
.trimEnd();
}

const isInsideFlowComment =
options.originalText.slice(commentEnd - 3, commentEnd) === "*-/";

const commentStartIndex = options.originalText.lastIndexOf("/*", start);
const commentEndIndex = options.originalText.indexOf("*/", end);
if (commentStartIndex !== -1 && commentEndIndex !== -1) {
const comment = options.originalText
.slice(commentStartIndex + 2, commentEndIndex)
.trim();

options.originalText.slice(nextCharacter, nextCharacter + 2) === "=>"

return !/{\s*}/.test(
options.originalText.slice(locStart(node), locStart(source))
);

@conartist6
Copy link
Author

conartist6 commented Apr 28, 2022

Of all those the award for scaring me most goes to getNextNonSpaceNonCommentCharacter family of methods and their usages.

Those methods are used at times to peek for specific characters (;, :, (, ), and =>) in the source, mostly when deciding where to attach comments.

@conartist6
Copy link
Author

conartist6 commented Apr 29, 2022

Prettier's printer basically has the shape (...args) => [...parts]. One thing I'm curious about is changing that to be implemented with a generator having the shape (...args) => IterableIterator<parts>. This would incur some definite cost, but it would also make it easy to to check the next part without doing more work than is necessary (i.e. constructing all the parts of a node and its children). You could then easily check whether => is the next token that would be printed when it was necessary for the logic (where getNextNonSpaceNonCommentCharacter is used currently).

@conartist6
Copy link
Author

I think the thing to do is to fork prettier and start ripping things out until I have a toy pretty-printer that's significantly easier to experiment on. Then I can actually try some of these techniques and understand better what it is I really want to propose and what the benefits and drawbacks are.

@conartist6
Copy link
Author

Note to self: prettier's core data structure is Doc and it's a tree, not a flat list. It's not likely to be quite so simple as "just make it iterable".

@conartist6
Copy link
Author

I think I mentioned it on the recast thread, but the effective initial commit of this repo is quite interesting and informative. It has docs about internals that are no longer published anywhere, and it still is based closely enough on recast that it supports pretty-printing nodes which were programmatically inserted into the DOM.

@conartist6
Copy link
Author

conartist6 commented May 4, 2022

Next in the wayback machine, the introduction of originalText checks to keep blank lines from the source:

prettier/src/printer.js

Lines 2068 to 2090 in bcd44b4

function shouldAddSpacing(node, options) {
const text = options.originalText;
const length = text.length;
let cursor = node.end + 1;
// Look forward and see if there is a blank line after this code by
// scanning up to the next non-indentation character.
while(cursor < length) {
const c = text.charAt(cursor);
// Skip any indentation characters (this will detect lines with
// spaces or tabs as blank)
if(c !== " " && c !== "\t") {
// If the next character is a newline, this line is blank so we
// should add a blank line.
if(c === "\n" || c === "\r") {
return true;
}
return false;
}
cursor++;
}
return false;
}

Note that this is maybe 30 commits in, so there is very little history of this repository in which it has not been dependent on originalText.

@conartist6
Copy link
Author

Other discussion of this issue: estree/estree#265 (comment)

@conartist6
Copy link
Author

conartist6 commented May 4, 2022

Which leads here: estree/estree#41

@conartist6
Copy link
Author

A practical-sounding proposal: estree/estree#41 (comment)

@conartist6
Copy link
Author

OK as far as I can tell much of the existing discussion has been rolled up into cst, which appears to be much better than anything that I could propose.

@conartist6
Copy link
Author

I'm going to close this issue and make a new one. I've said too much stuff that is just completely wrong or nonsensical, so I'm going to start over.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 5, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant