Skip to content

Commit

Permalink
Add initial formatter implementation (astral-sh#2883)
Browse files Browse the repository at this point in the history
# Summary

This PR contains the code for the autoformatter proof-of-concept.

## Crate structure

The primary formatting hook is the `fmt` function in `crates/ruff_python_formatter/src/lib.rs`.

The current formatter approach is outlined in `crates/ruff_python_formatter/src/lib.rs`, and is structured as follows:

- Tokenize the code using the RustPython lexer.
- In `crates/ruff_python_formatter/src/trivia.rs`, extract a variety of trivia tokens from the token stream. These include comments, trailing commas, and empty lines.
- Generate the AST via the RustPython parser.
- In `crates/ruff_python_formatter/src/cst.rs`, convert the AST to a CST structure. As of now, the CST is nearly identical to the AST, except that every node gets a `trivia` vector. But we might want to modify it further.
- In `crates/ruff_python_formatter/src/attachment.rs`, attach each trivia token to the corresponding CST node. The logic for this is mostly in `decorate_trivia` and is ported almost directly from Prettier (given each token, find its preceding, following, and enclosing nodes, then attach the token to the appropriate node in a second pass).
- In `crates/ruff_python_formatter/src/newlines.rs`, normalize newlines to match Black’s preferences. This involves traversing the CST and inserting or removing `TriviaToken` values as we go.
- Call `format!` on the CST, which delegates to type-specific formatter implementations (e.g., `crates/ruff_python_formatter/src/format/stmt.rs` for `Stmt` nodes, and similar for `Expr` nodes; the others are trivial). Those type-specific implementations delegate to kind-specific functions (e.g., `format_func_def`).

## Testing and iteration

The formatter is being developed against the Black test suite, which was copied over in-full to `crates/ruff_python_formatter/resources/test/fixtures/black`.

The Black fixtures had to be modified to create `[insta](https://github.com/mitsuhiko/insta)`-compatible snapshots, which now exist in the repo.

My approach thus far has been to try and improve coverage by tackling fixtures one-by-one.

## What works, and what doesn’t

- *Most* nodes are supported at a basic level (though there are a few stragglers at time of writing, like `StmtKind::Try`).
- Newlines are properly preserved in most cases.
- Magic trailing commas are properly preserved in some (but not all) cases.
- Trivial leading and trailing standalone comments mostly work (although maybe not at the end of a file).
- Inline comments, and comments within expressions, often don’t work -- they work in a few cases, but it’s one-off right now. (We’re probably associating them with the “right” nodes more often than we are actually rendering them in the right place.)
- We don’t properly normalize string quotes. (At present, we just repeat any constants verbatim.)
- We’re mishandling a bunch of wrapping cases (if we treat Black as the reference implementation). Here are a few examples (demonstrating Black's stable behavior):

```py
# In some cases, if the end expression is "self-closing" (functions,
# lists, dictionaries, sets, subscript accesses, and any length-two
# boolean operations that end in these elments), Black
# will wrap like this...
if some_expression and f(
    b,
    c,
    d,
):
    pass

# ...whereas we do this:
if (
    some_expression
    and f(
        b,
        c,
        d,
    )
):
    pass

# If function arguments can fit on a single line, then Black will
# format them like this, rather than exploding them vertically.
if f(
    a, b, c, d, e, f, g, ...
):
    pass
```

- We don’t properly preserve parentheses in all cases. Black preserves parentheses in some but not all cases.
  • Loading branch information
charliermarsh committed Feb 15, 2023
1 parent f661c90 commit ca49b00
Show file tree
Hide file tree
Showing 134 changed files with 12,044 additions and 18 deletions.
33 changes: 25 additions & 8 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion _typos.toml
@@ -1,5 +1,5 @@
[files]
extend-exclude = ["snapshots"]
extend-exclude = ["snapshots", "black"]

[default.extend-words]
trivias = "trivias"
Expand Down
2 changes: 1 addition & 1 deletion crates/ruff/Cargo.toml
Expand Up @@ -21,7 +21,7 @@ bisection = { version = "0.1.0" }
bitflags = { version = "1.3.2" }
cfg-if = { version = "1.0.0" }
chrono = { version = "0.4.21", default-features = false, features = ["clock"] }
clap = { version = "4.0.1", features = ["derive", "env"] }
clap = { workspace = true, features = ["derive", "env"] }
colored = { version = "2.0.0" }
dirs = { version = "4.0.0" }
fern = { version = "0.6.1" }
Expand Down
2 changes: 1 addition & 1 deletion crates/ruff/src/ast/helpers.rs
Expand Up @@ -723,7 +723,7 @@ where
StmtKind::FunctionDef { .. } | StmtKind::AsyncFunctionDef { .. } => {
// Don't recurse.
}
StmtKind::Return { value } => self.returns.push(value.as_ref().map(|expr| &**expr)),
StmtKind::Return { value } => self.returns.push(value.as_deref()),
_ => visitor::walk_stmt(self, stmt),
}
}
Expand Down
7 changes: 1 addition & 6 deletions crates/ruff/src/checkers/ast.rs
Expand Up @@ -1560,12 +1560,7 @@ where
pyflakes::rules::assert_tuple(self, stmt, test);
}
if self.settings.rules.enabled(&Rule::AssertFalse) {
flake8_bugbear::rules::assert_false(
self,
stmt,
test,
msg.as_ref().map(|expr| &**expr),
);
flake8_bugbear::rules::assert_false(self, stmt, test, msg.as_deref());
}
if self.settings.rules.enabled(&Rule::Assert) {
self.diagnostics
Expand Down
2 changes: 1 addition & 1 deletion crates/ruff_cli/Cargo.toml
Expand Up @@ -33,7 +33,7 @@ bincode = { version = "1.3.3" }
bitflags = { version = "1.3.2" }
cachedir = { version = "0.3.0" }
chrono = { version = "0.4.21", default-features = false, features = ["clock"] }
clap = { version = "4.0.1", features = ["derive", "env"] }
clap = { workspace = true, features = ["derive", "env"] }
clap_complete_command = { version = "0.4.0" }
clearscreen = { version = "2.0.0" }
colored = { version = "2.0.0" }
Expand Down
19 changes: 19 additions & 0 deletions crates/ruff_python_formatter/Cargo.toml
@@ -0,0 +1,19 @@
[package]
name = "ruff_python_formatter"
version = "0.0.0"
publish = false
edition = "2021"

[dependencies]
anyhow = { workspace = true }
clap = { workspace = true }
once_cell = { workspace = true }
ruff_formatter = { path = "../ruff_formatter" }
ruff_text_size = { path = "../ruff_text_size" }
rustc-hash = { workspace = true }
rustpython-common = { workspace = true }
rustpython-parser = { workspace = true }

[dev-dependencies]
insta = { version = "1.19.0", features = [] }
test-case = { version = "2.2.2" }
@@ -0,0 +1,22 @@
ax = 123456789 .bit_count()
x = (123456).__abs__()
x = .1.is_integer()
x = 1. .imag
x = 1E+1.imag
x = 1E-1.real
x = 123456789.123456789.hex()
x = 123456789.123456789E123456789 .real
x = 123456789E123456789 .conjugate()
x = 123456789J.real
x = 123456789.123456789J.__add__(0b1011.bit_length())
x = 0XB1ACC.conjugate()
x = 0B1011 .conjugate()
x = 0O777 .real
x = 0.000000006 .hex()
x = -100.0000J

if 10 .real:
...

y = 100[no]
y = 100(no)
@@ -0,0 +1,7 @@
\





print("hello, world")
@@ -0,0 +1,6 @@
for ((x in {}) or {})['a'] in x:
pass
pem_spam = lambda l, spam = {
"x": 3
}: not spam.get(l.strip())
lambda x=lambda y={1: 3}: y['x':lambda y: {1: 2}]: x
@@ -0,0 +1,23 @@
class SimpleClassWithBlankParentheses():
pass
class ClassWithSpaceParentheses ( ):
first_test_data = 90
second_test_data = 100
def test_func(self):
return None
class ClassWithEmptyFunc(object):

def func_with_blank_parentheses():
return 5


def public_func_with_blank_parentheses():
return None
def class_under_the_func_with_blank_parentheses():
class InsideFunc():
pass
class NormalClass (
):
def func_for_testing(self, first, second):
sum = first + second
return sum
@@ -0,0 +1,100 @@
class ClassSimplest:
pass
class ClassWithSingleField:
a = 1
class ClassWithJustTheDocstring:
"""Just a docstring."""
class ClassWithInit:
def __init__(self):
pass
class ClassWithTheDocstringAndInit:
"""Just a docstring."""
def __init__(self):
pass
class ClassWithInitAndVars:
cls_var = 100
def __init__(self):
pass
class ClassWithInitAndVarsAndDocstring:
"""Test class"""
cls_var = 100
def __init__(self):
pass
class ClassWithDecoInit:
@deco
def __init__(self):
pass
class ClassWithDecoInitAndVars:
cls_var = 100
@deco
def __init__(self):
pass
class ClassWithDecoInitAndVarsAndDocstring:
"""Test class"""
cls_var = 100
@deco
def __init__(self):
pass
class ClassSimplestWithInner:
class Inner:
pass
class ClassSimplestWithInnerWithDocstring:
class Inner:
"""Just a docstring."""
def __init__(self):
pass
class ClassWithSingleFieldWithInner:
a = 1
class Inner:
pass
class ClassWithJustTheDocstringWithInner:
"""Just a docstring."""
class Inner:
pass
class ClassWithInitWithInner:
class Inner:
pass
def __init__(self):
pass
class ClassWithInitAndVarsWithInner:
cls_var = 100
class Inner:
pass
def __init__(self):
pass
class ClassWithInitAndVarsAndDocstringWithInner:
"""Test class"""
cls_var = 100
class Inner:
pass
def __init__(self):
pass
class ClassWithDecoInitWithInner:
class Inner:
pass
@deco
def __init__(self):
pass
class ClassWithDecoInitAndVarsWithInner:
cls_var = 100
class Inner:
pass
@deco
def __init__(self):
pass
class ClassWithDecoInitAndVarsAndDocstringWithInner:
"""Test class"""
cls_var = 100
class Inner:
pass
@deco
def __init__(self):
pass
class ClassWithDecoInitAndVarsAndDocstringWithInner2:
"""Test class"""
class Inner:
pass
cls_var = 100
@deco
def __init__(self):
pass
@@ -0,0 +1,71 @@
import core, time, a

from . import A, B, C

# keeps existing trailing comma
from foo import (
bar,
)

# also keeps existing structure
from foo import (
baz,
qux,
)

# `as` works as well
from foo import (
xyzzy as magic,
)

a = {1,2,3,}
b = {
1,2,
3}
c = {
1,
2,
3,
}
x = 1,
y = narf(),
nested = {(1,2,3),(4,5,6),}
nested_no_trailing_comma = {(1,2,3),(4,5,6)}
nested_long_lines = ["aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb", "cccccccccccccccccccccccccccccccccccccccc", (1, 2, 3), "dddddddddddddddddddddddddddddddddddddddd"]
{"oneple": (1,),}
{"oneple": (1,)}
['ls', 'lsoneple/%s' % (foo,)]
x = {"oneple": (1,)}
y = {"oneple": (1,),}
assert False, ("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa wraps %s" % bar)

# looping over a 1-tuple should also not get wrapped
for x in (1,):
pass
for (x,) in (1,), (2,), (3,):
pass

[1, 2, 3,]

division_result_tuple = (6/2,)
print("foo %r", (foo.bar,))

if True:
IGNORED_TYPES_FOR_ATTRIBUTE_CHECKING = (
Config.IGNORED_TYPES_FOR_ATTRIBUTE_CHECKING
| {pylons.controllers.WSGIController}
)

if True:
ec2client.get_waiter('instance_stopped').wait(
InstanceIds=[instance.id],
WaiterConfig={
'Delay': 5,
})
ec2client.get_waiter("instance_stopped").wait(
InstanceIds=[instance.id],
WaiterConfig={"Delay": 5,},
)
ec2client.get_waiter("instance_stopped").wait(
InstanceIds=[instance.id], WaiterConfig={"Delay": 5,},
)
@@ -0,0 +1,9 @@
def bob(): \
# pylint: disable=W9016
pass


def bobtwo(): \
\
# some comment here
pass

0 comments on commit ca49b00

Please sign in to comment.