Support js codemods: represent whitespace decisions in AST #12709

conartist6 · 2022-04-27T19:02:02Z

Codemods are usually a process of reading a source into an AST, mutating the AST, then overwriting the original source with the modified version. To do this well you obviously need to preserve formatting to the greatest extent possible.

I'm going to start this is a meta-issue for several related issues I've found: #2068, #4675, #5998, and probably more that have some bearing on how to use prettier on a javascript AST. Currently prettier assumes when printing that the AST it has is unmodified. In particular it assumes that it can use the locations attached to nodes to inspect the original source text and determine whether there was whitespace in very specific cases where it preserves original formatting. Those cases are:

Allowing a blank line between two statements
Choosing whether to print the key: value mappings inside brackets ({}) on one or more lines

Eliminating the dependency on the source text would mean building this bit of state into the AST itself. I am strongly in favor of this, as building it into the AST should ensure that it is possible not just to preserve existing formatting, but to specify what formatting state can be specified programatically, for example by demanding that an inserted statement be padded with blank lines.

I think there's strong incentive to build this (and I would like to), as it would allow me to avoid using recast, which is currently a hassle as I have to use recast to transform a source, then print it, then use prettier to reformat the string.

The text was updated successfully, but these errors were encountered:

conartist6 · 2022-04-27T21:04:51Z

Let's think about what's in involved in building this. The first obstacle is that prettier supports a number of parsers. They all represent what location data they do store a bit differently. Some have a loc property. Some have a range property. All of them except for esprima store start and end properties on the node itself. Anywhere we choose to store the data regarding whitespace in the AST it will be possible that whatever tools or transforms touch the AST damage it.

We could store break and blank positions in sorted arrays at the root of the AST. It would be possible to quickly binary search the arrays to determine if any two nodes have a break or blank that falls between the end of one and the start of the next. This strategy would be fully resilient to the transforms removing nodes, e.g.

{ foo, bar }
// becoming
{ bar }

or

foo;

bar;

baz;
// becoming
foo;

baz;

While that solves the problem of how to preserve the spacing coming from the original code, it doesn't help us with being able to make new whitespace in code that we generate, which by definition will not yet have any location information.

For imports, object properties, or object patterns the entry list it makes sense I think to allow the owner node to be marked with a node.alwaysMultiLine property which prettier would read.

Statements are a bit harder. Let's say you're writing a transform to insert a middle statement between first and last in callbacks to thunk:

thunk(() => {
  first;
  last;
});

thunk(() => {
  first;

  last;
});

You need to provide an API that can fully specify where a break does or does not belong. I propose this api: node.blankLineAbove: 'ensure' | 'avoid' | 'auto' and node.blankLineBelow: 'ensure' | 'avoid' | 'auto'. When no value is specified the default would be auto. I propose that auto be dependent on whether or not the statement above and the statement below were separated by a blank line in the original code. If so, it would be equivalent to blankLineAbove: 'ensure', blankLineBelow: 'ensure'. If not it would be equivalent to blankLineAbove: 'avoid', blankLineBelow: 'avoid'.

Combining these strategies allows us to create a more robust and powerful system for preserving whitespace and allowing it to be manipulated than either strategy alone would be able to offer, and could completely replace the existing dependence on having a source string present. When printing an AST with no location data at all the most compact form would be chosen except for places where prettier's rules demand blank lines.

conartist6 · 2022-04-27T21:35:06Z

The way I've written the proposal usage of the blankLine* properties would probably be relatively common. For example whenever appending to a list of declarations that aren't usually space-separated themselves but are followed by a blank line (imports, for example) you'd want to use blankLineAbove: 'avoid'. I think that's easy enough, as what you need to specify is quite closely related to your purpose. And of course if you're inserting some statement with important side-effects, I'd strongly recommend blankLineAbove: 'ensure', blankLineBelow: 'ensure'.

conartist6 · 2022-04-28T15:30:27Z

I did a quick search through the code for originalText to see if I'd find anything that would indicate that this would be outright impossible to build. Here's what I found:

Methods:

getNextNonSpaceNonCommentCharacter
getNextNonSpaceNonCommentCharacterIndex
getNextNonSpaceNonCommentCharacterIndexWithStartIndex
isFlowAnnotationComment
hasNewline
hasNewlineInRange
isNextLineEmpty
isNextLineEmptyAfterIndex
isTemplateOnItsOwnLine
hasLeadingOwnLineComment

Other usages:

prettier/src/language-js/print/comment.js

Lines 16 to 21 in e23b89b

    
           if (isLineComment(comment)) { 
        
             // Supports `//`, `#!`, `<!--`, and `-->` 
        
             return options.originalText 
        
               .slice(locStart(comment), locEnd(comment)) 
        
               .trimEnd(); 
        
           }

prettier/src/language-js/print/comment.js

Lines 41 to 42 in e23b89b

    
           const isInsideFlowComment = 
        
             options.originalText.slice(commentEnd - 3, commentEnd) === "*-/";

prettier/src/language-js/print/flow.js

Lines 306 to 311 in e23b89b

    
           const commentStartIndex = options.originalText.lastIndexOf("/*", start); 
        
           const commentEndIndex = options.originalText.indexOf("*/", end); 
        
           if (commentStartIndex !== -1 && commentEndIndex !== -1) { 
        
             const comment = options.originalText 
        
               .slice(commentStartIndex + 2, commentEndIndex) 
        
               .trim();

prettier/src/language-js/print/function.js

Line 231 in e23b89b

options.originalText.slice(nextCharacter, nextCharacter + 2) === "=>"

prettier/src/language-js/print/module.js

Lines 262 to 264 in e23b89b

    
           return !/{\s*}/.test( 
        
             options.originalText.slice(locStart(node), locStart(source)) 
        
           );

conartist6 · 2022-04-28T15:40:26Z

Of all those the award for scaring me most goes to getNextNonSpaceNonCommentCharacter family of methods and their usages.

Those methods are used at times to peek for specific characters (;, :, (, ), and =>) in the source, mostly when deciding where to attach comments.

conartist6 · 2022-04-29T16:49:39Z

Prettier's printer basically has the shape (...args) => [...parts]. One thing I'm curious about is changing that to be implemented with a generator having the shape (...args) => IterableIterator<parts>. This would incur some definite cost, but it would also make it easy to to check the next part without doing more work than is necessary (i.e. constructing all the parts of a node and its children). You could then easily check whether => is the next token that would be printed when it was necessary for the logic (where getNextNonSpaceNonCommentCharacter is used currently).

conartist6 · 2022-04-29T16:57:25Z

I think the thing to do is to fork prettier and start ripping things out until I have a toy pretty-printer that's significantly easier to experiment on. Then I can actually try some of these techniques and understand better what it is I really want to propose and what the benefits and drawbacks are.

conartist6 · 2022-04-29T19:05:33Z

Note to self: prettier's core data structure is Doc and it's a tree, not a flat list. It's not likely to be quite so simple as "just make it iterable".

conartist6 · 2022-05-03T14:20:23Z

I think I mentioned it on the recast thread, but the effective initial commit of this repo is quite interesting and informative. It has docs about internals that are no longer published anywhere, and it still is based closely enough on recast that it supports pretty-printing nodes which were programmatically inserted into the DOM.

conartist6 · 2022-05-04T13:38:21Z

Next in the wayback machine, the introduction of originalText checks to keep blank lines from the source:

prettier/src/printer.js

Lines 2068 to 2090 in bcd44b4

    
           function shouldAddSpacing(node, options) { 
        
             const text = options.originalText; 
        
             const length = text.length; 
        
             let cursor = node.end + 1; 
        
             // Look forward and see if there is a blank line after this code by 
        
             // scanning up to the next non-indentation character. 
        
             while(cursor < length) { 
        
               const c = text.charAt(cursor); 
        
               // Skip any indentation characters (this will detect lines with 
        
               // spaces or tabs as blank) 
        
               if(c !== " " && c !== "\t") { 
        
                 // If the next character is a newline, this line is blank so we 
        
                 // should add a blank line. 
        
                 if(c === "\n" || c === "\r") { 
        
                   return true; 
        
                 } 
        
                 return false; 
        
               } 
        
               cursor++; 
        
             } 
        
             return false; 
        
           }

Note that this is maybe 30 commits in, so there is very little history of this repository in which it has not been dependent on originalText.

conartist6 · 2022-05-04T14:17:14Z

Other discussion of this issue: estree/estree#265 (comment)

conartist6 · 2022-05-04T14:25:19Z

Which leads here: estree/estree#41

conartist6 · 2022-05-04T14:30:20Z

A practical-sounding proposal: estree/estree#41 (comment)

conartist6 · 2022-05-04T17:51:00Z

OK as far as I can tell much of the existing discussion has been rolled up into cst, which appears to be much better than anything that I could propose.

conartist6 · 2022-05-04T17:59:15Z

I'm going to close this issue and make a new one. I've said too much stuff that is just completely wrong or nonsensical, so I'm going to start over.

conartist6 changed the title ~~Prettier for codemods (js)~~ Prettier for js codemods: represent whitespace decisions in AST Apr 27, 2022

conartist6 changed the title ~~Prettier for js codemods: represent whitespace decisions in AST~~ Support js codemods: represent whitespace decisions in AST Apr 27, 2022

conartist6 mentioned this issue Apr 27, 2022

Recast needs additional maintainers benjamn/recast#1086

Closed

conartist6 closed this as completed May 4, 2022

conartist6 mentioned this issue May 7, 2022

Support codemods: use a CST #12806

Open

github-actions bot locked as resolved and limited conversation to collaborators May 5, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support js codemods: represent whitespace decisions in AST #12709

Support js codemods: represent whitespace decisions in AST #12709

conartist6 commented Apr 27, 2022

conartist6 commented Apr 27, 2022

conartist6 commented Apr 27, 2022

conartist6 commented Apr 28, 2022 •

edited

conartist6 commented Apr 28, 2022 •

edited

conartist6 commented Apr 29, 2022 •

edited

conartist6 commented Apr 29, 2022

conartist6 commented Apr 29, 2022

conartist6 commented May 3, 2022

conartist6 commented May 4, 2022 •

edited

conartist6 commented May 4, 2022

conartist6 commented May 4, 2022 •

edited

conartist6 commented May 4, 2022

conartist6 commented May 4, 2022

conartist6 commented May 4, 2022

Support js codemods: represent whitespace decisions in AST #12709

Support js codemods: represent whitespace decisions in AST #12709

Comments

conartist6 commented Apr 27, 2022

conartist6 commented Apr 27, 2022

conartist6 commented Apr 27, 2022

conartist6 commented Apr 28, 2022 • edited

Methods:

Other usages:

conartist6 commented Apr 28, 2022 • edited

conartist6 commented Apr 29, 2022 • edited

conartist6 commented Apr 29, 2022

conartist6 commented Apr 29, 2022

conartist6 commented May 3, 2022

conartist6 commented May 4, 2022 • edited

conartist6 commented May 4, 2022

conartist6 commented May 4, 2022 • edited

conartist6 commented May 4, 2022

conartist6 commented May 4, 2022

conartist6 commented May 4, 2022

conartist6 commented Apr 28, 2022 •

edited

conartist6 commented Apr 28, 2022 •

edited

conartist6 commented Apr 29, 2022 •

edited

conartist6 commented May 4, 2022 •

edited

conartist6 commented May 4, 2022 •

edited