Jupyter notebook support #2357

Merged
merged 85 commits into from Aug 6, 2021
Changes from 84 commits
05fee79
wip
MarcoGorelli Jul 2, 2021
55bdaed
fixup tests
MarcoGorelli Jul 3, 2021
61bb015
skip tests if no IPython
MarcoGorelli Jul 3, 2021
53cc3f4
install test requirements in ipynb tests
MarcoGorelli Jul 4, 2021
27aa4dc
if --ipynb format all as ipynb
MarcoGorelli Jul 4, 2021
33887d2
wip
MarcoGorelli Jul 4, 2021
9e23bc6
add some whole-notebook tests
MarcoGorelli Jul 4, 2021
6fa73bc
docstrings
MarcoGorelli Jul 4, 2021
9e611f5
skip multiline magics
MarcoGorelli Jul 4, 2021
1f9ecca
add test for nested cell magic
MarcoGorelli Jul 4, 2021
8bed188
remove ipynb_test.yml, put ipynb tests in tox.ini
MarcoGorelli Jul 4, 2021
8000b03
add changelog entry
MarcoGorelli Jul 4, 2021
fb03aaa
typo
MarcoGorelli Jul 4, 2021
991f85f
make token same length as magic it replaces
MarcoGorelli Jul 4, 2021
978f6ae
only include .ipynb by default if jupyter dependencies are found
MarcoGorelli Jul 5, 2021
9f9c442
remove logic from const
MarcoGorelli Jul 5, 2021
85d34cd
fixup
MarcoGorelli Jul 5, 2021
fda607b
fixup
MarcoGorelli Jul 5, 2021
7d77e7d
re.compile
MarcoGorelli Jul 5, 2021
3eb55ff
noop
MarcoGorelli Jul 5, 2021
603821c
clear up
MarcoGorelli Jul 5, 2021
eff0df5
new_src -> dst
MarcoGorelli Jul 6, 2021
742a667
early exit for non-python notebooks
MarcoGorelli Jul 6, 2021
57e1577
add non-python test notebook
MarcoGorelli Jul 6, 2021
58ea513
add repo with many notebooks to black-primer
MarcoGorelli Jul 7, 2021
2ab8ca2
install extra dependencies for black-primer
MarcoGorelli Jul 7, 2021
d98e49f
fix planetary computer examples url
MarcoGorelli Jul 7, 2021
965ef50
dont run on ipynb files by default
MarcoGorelli Jul 7, 2021
083e794
add scikit-lego (Expected to change) to black-primer
MarcoGorelli Jul 7, 2021
d3febc1
add ipynb-specific diff
MarcoGorelli Jul 7, 2021
62cce53
fixup
MarcoGorelli Jul 7, 2021
f786a38
run on all (including ipynb) by default
MarcoGorelli Jul 7, 2021
8853aa3
remove --include .ipynb from scikit-lego black-primer
MarcoGorelli Jul 7, 2021
879cd11
use tokenize so as to mirror the exact logic in IPython.core.displayh…
MarcoGorelli Jul 8, 2021
6716963
fixup
MarcoGorelli Jul 8, 2021
786ac06
:art:
MarcoGorelli Jul 8, 2021
48b56e1
clarify docstring
MarcoGorelli Jul 8, 2021
c74959d
add test for when comment is after trailing semicolon
MarcoGorelli Jul 8, 2021
9e3b9bd
enumerate(reversed) instead of [::-1]
MarcoGorelli Jul 8, 2021
200669f
clarify docstrings
MarcoGorelli Jul 8, 2021
db9a8ba
wip
MarcoGorelli Jul 9, 2021
18a502a
use jupyter and no_jupyter marks
MarcoGorelli Jul 9, 2021
d6a4869
use THIS_DIR
MarcoGorelli Jul 9, 2021
e45a208
windows fixup
MarcoGorelli Jul 9, 2021
98d1ea3
perform safe check cell-by-cell for ipynb
MarcoGorelli Jul 10, 2021
9370244
only perform safe check in ipynb if not fast
MarcoGorelli Jul 10, 2021
a6c6341
remove redundant Optional
MarcoGorelli Jul 10, 2021
ddc34e1
:art:
MarcoGorelli Jul 10, 2021
75d493a
use typeguard
MarcoGorelli Jul 10, 2021
49b9e59
dont process cell containing transformed magic
MarcoGorelli Jul 10, 2021
98cb06a
require typing extensions before 3.10 so as to have TypeGuard
MarcoGorelli Jul 10, 2021
aba9d41
use dataclasses
MarcoGorelli Jul 10, 2021
9a421f8
mention black[jupyter] in docs as well as in README
MarcoGorelli Jul 10, 2021
b40f7d6
add faq
MarcoGorelli Jul 10, 2021
0fa616a
add message to assertion error
MarcoGorelli Jul 11, 2021
c8d12df
add test for indented quieted cell
MarcoGorelli Jul 11, 2021
a6366c4
use tokenize_rt else we cant roundtrip
MarcoGorelli Jul 11, 2021
cd70366
fmake fronzet set for tokens to ignore when looking for trailing semi…
MarcoGorelli Jul 12, 2021
4c3da5d
remove planetary code examples as recent commits result in changes
MarcoGorelli Jul 12, 2021
20d63fb
Merge remote-tracking branch 'upstream/main' into jupyter
MarcoGorelli Jul 12, 2021
02151f0
use dataclasses which inherit from ast.NodeVisitor
MarcoGorelli Jul 13, 2021
4176294
Merge branch 'jupyter' of github.com:MarcoGorelli/black into jupyter
MarcoGorelli Jul 13, 2021
ec62768
bump typing-extensions so that TypeGuard is available
MarcoGorelli Jul 14, 2021
9b1b41b
Merge remote-tracking branch 'upstream/main' into jupyter
MarcoGorelli Jul 14, 2021
dce54b6
bump typing-extensions in Pipfile
MarcoGorelli Jul 14, 2021
f7f3ce4
add test with notebook with empty metadata
MarcoGorelli Jul 14, 2021
ede7de5
pipenv lock
MarcoGorelli Jul 14, 2021
67d6bce
deprivative validate_cell
MarcoGorelli Jul 14, 2021
c240b2c
Update README.md
JelleZijlstra Jul 17, 2021
32d6ee6
Update docs/getting_started.md
JelleZijlstra Jul 17, 2021
e2de6ce
Merge branch 'main' into jupyter
JelleZijlstra Jul 17, 2021
8f24601
dont cache notebooks if jupyter dependencies arent found
MarcoGorelli Jul 17, 2021
2c14df5
dont write to cache if jupyter deps are not installed
MarcoGorelli Jul 17, 2021
ce392f5
add notebook which cant be parsed
MarcoGorelli Jul 17, 2021
7ad25a1
use clirunner
MarcoGorelli Jul 17, 2021
9ccf84e
remove other subprocess calls
MarcoGorelli Jul 17, 2021
7d4cbf6
add docstring
MarcoGorelli Jul 17, 2021
8bf86ba
make verbose and quiet keyword only
MarcoGorelli Jul 17, 2021
46b802e
:art:
MarcoGorelli Jul 17, 2021
c074541
run second many test on directory, not on file
MarcoGorelli Jul 17, 2021
72c6af1
test for warning message when running on directory
MarcoGorelli Jul 17, 2021
dad04e2
early return from non-python cell magics
MarcoGorelli Jul 18, 2021
3eab2ce
move NothingChanged to report to avoid circular import
MarcoGorelli Jul 18, 2021
eec2685
remove circular import
MarcoGorelli Jul 18, 2021
67d38da
reinstate --ipynb flag
MarcoGorelli Jul 30, 2021
2 changes: 1 addition & 1 deletion .github/workflows/primer.yml
@@ -38,7 +38,7 @@ jobs:
- name: Install dependencies
run: |
python -m pip install --upgrade pip
python -m pip install -e ".[d]"
python -m pip install -e ".[d,jupyter]"

- name: Primer run
env:
1 change: 1 addition & 0 deletions .gitignore
@@ -18,3 +18,4 @@ src/_black_version.py
*.swp
.hypothesis/
venv/
.ipynb_checkpoints/
5 changes: 4 additions & 1 deletion CHANGES.md
@@ -2,7 +2,10 @@

## Unreleased

- Moved from `appdirs` dependency to `platformdirs` (#2375)
### _Black_

- Add support for formatting Jupyter Notebook files (#2357)
- Move from `appdirs` dependency to `platformdirs` (#2375)

## 21.7b0

2 changes: 1 addition & 1 deletion Pipfile
@@ -34,6 +34,6 @@ pathspec = ">=0.8.1"
regex = ">=2020.1.8"
tomli = ">=0.2.6, <2.0.0"
typed-ast = "==1.4.2"
typing_extensions = {"python_version <" = "3.8","version >=" = "3.7.4"}
typing_extensions = {"python_version <" = "3.10","version >=" = "3.10.0.0"}
black = {editable = true,extras = ["d"],path = "."}
dataclasses = {"python_version <" = "3.7","version >" = "0.1.3"}
3 changes: 2 additions & 1 deletion README.md
@@ -41,7 +41,8 @@ Try it out now using the [Black Playground](https://black.vercel.app). Watch the

_Black_ can be installed by running `pip install black`. It requires Python 3.6.2+ to
run. If you want to format Python 2 code as well, install with
`pip install black[python2]`.
`pip install black[python2]`. If you want to format Jupyter Notebooks, install with
`pip install black[jupyter]`.

If you can't wait for the latest _hotness_ and want to install from GitHub, use:

23 changes: 23 additions & 0 deletions docs/faq.md
@@ -37,6 +37,29 @@ Most likely because it is ignored in `.gitignore` or excluded with configuration
[file collection and discovery](usage_and_configuration/file_collection_and_discovery.md)
for details.

## Why is my Jupyter Notebook cell not formatted?

_Black_ is timid about formatting Jupyter Notebooks. Cells containing any of the
following will not be formatted:

- automagics (e.g. `pip install black`)
- multiline magics, e.g.:

```python
%timeit f(1, \
2, \
3)
```

- code which `IPython`'s `TransformerManager` would transform magics into, e.g.:

```python
get_ipython().system('ls')
```

- invalid syntax, as it can't be safely distinguished from automagics in the absence of
a running `IPython` kernel.
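
The last bullet can be illustrated with the standard library alone. This is a minimal sketch, assuming only Python's `ast` module; the helper name `looks_like_valid_python` is hypothetical and not part of _Black_. It shows that an automagic and a genuine typo both fail to parse, so without a running kernel they look the same.

```python
import ast


def looks_like_valid_python(cell_source: str) -> bool:
    """Return True if the cell parses as plain Python (hypothetical helper)."""
    try:
        ast.parse(cell_source)
    except SyntaxError:
        return False
    return True


# An automagic and a real syntax error are indistinguishable to the parser.
automagic = "pip install black"
typo = "print('hello'"
valid = "print('hello')"

automagic_parses = looks_like_valid_python(automagic)
typo_parses = looks_like_valid_python(typo)
valid_parses = looks_like_valid_python(valid)
```

Skipping both kinds of cell is the conservative choice: nothing is rewritten unless the parser accepts it.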

## Why are Flake8's E203 and W503 violated?

Because they go against PEP 8. E203 falsely triggers on list
3 changes: 2 additions & 1 deletion docs/getting_started.md
@@ -18,7 +18,8 @@ Also, you can try out _Black_ online for minimal fuss on the

_Black_ can be installed by running `pip install black`. It requires Python 3.6.2+ to
run, but can format Python 2 code too. Python 2 support needs the `typed_ast`
dependency, which can be installed with `pip install black[python2]`.
dependency, which can be installed with `pip install black[python2]`. If you want to format
Jupyter Notebooks, install with `pip install black[jupyter]`.

If you can't wait for the latest _hotness_ and want to install from GitHub, use:

1 change: 1 addition & 0 deletions pyproject.toml
@@ -31,4 +31,5 @@ build-backend = "setuptools.build_meta"
optional-tests = [
"no_python2: run when `python2` extra NOT installed",
"no_blackd: run when `d` extra NOT installed",
"no_jupyter: run when `jupyter` extra NOT installed",
]
3 changes: 2 additions & 1 deletion setup.py
@@ -79,14 +79,15 @@ def get_long_description() -> str:
"regex>=2020.1.8",
"pathspec>=0.8.1, <1",
"dataclasses>=0.6; python_version < '3.7'",
"typing_extensions>=3.7.4; python_version < '3.8'",
"typing_extensions>=3.10.0.0; python_version < '3.10'",
"mypy_extensions>=0.4.3",
],
extras_require={
"d": ["aiohttp>=3.6.0", "aiohttp-cors>=0.4.0"],
"colorama": ["colorama>=0.4.3"],
"python2": ["typed-ast>=1.4.2"],
"uvloop": ["uvloop>=0.15.2"],
"jupyter": ["ipython>=7.8.0", "tokenize-rt>=3.2.0"],
},
test_suite="tests.test_black",
classifiers=[
178 changes: 157 additions & 21 deletions src/black/__init__.py
@@ -1,4 +1,6 @@
import asyncio
from json.decoder import JSONDecodeError
import json
from concurrent.futures import Executor, ThreadPoolExecutor, ProcessPoolExecutor
from contextlib import contextmanager
from datetime import datetime
@@ -18,6 +20,7 @@
Generator,
Iterator,
List,
MutableMapping,
Optional,
Pattern,
Set,
@@ -39,13 +42,21 @@
from black.mode import Feature, supports_feature, VERSION_TO_FEATURES
from black.cache import read_cache, write_cache, get_cache_info, filter_cached, Cache
from black.concurrency import cancel, shutdown, maybe_install_uvloop
from black.output import dump_to_file, diff, color_diff, out, err
from black.report import Report, Changed
from black.output import dump_to_file, ipynb_diff, diff, color_diff, out, err
from black.report import Report, Changed, NothingChanged
from black.files import find_project_root, find_pyproject_toml, parse_pyproject_toml
from black.files import gen_python_files, get_gitignore, normalize_path_maybe_ignore
from black.files import wrap_stream_for_windows
from black.parsing import InvalidInput # noqa F401
from black.parsing import lib2to3_parse, parse_ast, stringify_ast
from black.handle_ipynb_magics import (
mask_cell,
unmask_cell,
remove_trailing_semicolon,
put_trailing_semicolon_back,
TRANSFORMED_MAGICS,
jupyter_dependencies_are_installed,
)


# lib2to3 fork
@@ -60,10 +71,6 @@
NewLine = str


class NothingChanged(UserWarning):
"""Raised when reformatted code is the same as source."""


class WriteBack(Enum):
NO = 0
YES = 1
@@ -504,6 +511,11 @@ def get_sources(
if is_stdin:
p = Path(f"{STDIN_PLACEHOLDER}{str(p)}")

if p.suffix == ".ipynb" and not jupyter_dependencies_are_installed(
verbose=verbose, quiet=quiet
):
continue

sources.add(p)
elif p.is_dir():
sources.update(
@@ -516,6 +528,8 @@ def get_sources(
force_exclude,
report,
gitignore,
verbose=verbose,
quiet=quiet,
)
)
elif s == "-":
@@ -585,6 +599,8 @@ def reformat_one(
if is_stdin:
if src.suffix == ".pyi":
mode = replace(mode, is_pyi=True)
elif src.suffix == ".ipynb":
mode = replace(mode, is_ipynb=True)
if format_stdin_to_stdout(fast=fast, write_back=write_back, mode=mode):
changed = Changed.YES
else:
@@ -733,6 +749,8 @@ def format_file_in_place(
"""
if src.suffix == ".pyi":
mode = replace(mode, is_pyi=True)
elif src.suffix == ".ipynb":
mode = replace(mode, is_ipynb=True)

then = datetime.utcfromtimestamp(src.stat().st_mtime)
with open(src, "rb") as buf:
@@ -741,6 +759,8 @@ def format_file_in_place(
dst_contents = format_file_contents(src_contents, fast=fast, mode=mode)
except NothingChanged:
return False
except JSONDecodeError:
raise ValueError(f"File '{src}' cannot be parsed as valid Jupyter notebook.")

if write_back == WriteBack.YES:
with open(src, "w", encoding=encoding, newline=newline) as f:
@@ -749,7 +769,10 @@ def format_file_in_place(
now = datetime.utcnow()
src_name = f"{src}\t{then} +0000"
dst_name = f"{src}\t{now} +0000"
diff_contents = diff(src_contents, dst_contents, src_name, dst_name)
if src.suffix == ".ipynb":
diff_contents = ipynb_diff(src_contents, dst_contents, src_name, dst_name)
else:
diff_contents = diff(src_contents, dst_contents, src_name, dst_name)

if write_back == WriteBack.COLOR_DIFF:
diff_contents = color_diff(diff_contents)
@@ -819,6 +842,29 @@ def format_stdin_to_stdout(
f.detach()


def check_stability_and_equivalence(
    src_contents: str, dst_contents: str, *, mode: Mode
) -> None:
    """Perform stability and equivalence checks.

    Raise AssertionError if source and destination contents are not
    equivalent, or if a second pass of the formatter would format the
    content differently.
    """
    assert_equivalent(src_contents, dst_contents)

    # Forced second pass to work around optional trailing commas (becoming
    # forced trailing commas on pass 2) interacting differently with optional
    # parentheses. Admittedly ugly.
    dst_contents_pass2 = format_str(dst_contents, mode=mode)
    if dst_contents != dst_contents_pass2:
        dst_contents = dst_contents_pass2
        assert_equivalent(src_contents, dst_contents, pass_num=2)
        assert_stable(src_contents, dst_contents, mode=mode)
    # Note: no need to explicitly call `assert_stable` if `dst_contents` was
    # the same as `dst_contents_pass2`.
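
The two properties named in the docstring can be sketched independently of _Black_. This is a rough illustration, not the PR's `assert_equivalent` or `assert_stable`: it assumes equivalence means "same `ast.dump` output" and stability means "formatting the output again changes nothing". `toy_format`, `check_equivalent` and `check_stable` are hypothetical names.

```python
import ast
from typing import Callable


def toy_format(src: str) -> str:
    """Stand-in formatter (an assumption): strip trailing whitespace per line."""
    return "\n".join(line.rstrip() for line in src.splitlines()) + "\n"


def check_equivalent(src: str, dst: str) -> None:
    """Raise AssertionError if the two sources parse to different ASTs."""
    if ast.dump(ast.parse(src)) != ast.dump(ast.parse(dst)):
        raise AssertionError("formatted code is not equivalent to the source")


def check_stable(dst: str, formatter: Callable[[str], str]) -> None:
    """Raise AssertionError if a second formatting pass changes the output."""
    if formatter(dst) != dst:
        raise AssertionError("formatter is not stable on its own output")


src = "x = 1   \ny = x + 1\t\n"
dst = toy_format(src)
check_equivalent(src, dst)  # same AST before and after
check_stable(dst, toy_format)  # second pass is a no-op
```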


def format_file_contents(src_contents: str, *, fast: bool, mode: Mode) -> FileContent:
"""Reformat contents of a file and return new contents.

@@ -829,26 +875,116 @@ def format_file_contents(src_contents: str, *, fast: bool, mode: Mode) -> FileCo
if not src_contents.strip():
raise NothingChanged

dst_contents = format_str(src_contents, mode=mode)
if mode.is_ipynb:
dst_contents = format_ipynb_string(src_contents, fast=fast, mode=mode)
else:
dst_contents = format_str(src_contents, mode=mode)
if src_contents == dst_contents:
raise NothingChanged

if not fast:
assert_equivalent(src_contents, dst_contents)

# Forced second pass to work around optional trailing commas (becoming
# forced trailing commas on pass 2) interacting differently with optional
# parentheses. Admittedly ugly.
dst_contents_pass2 = format_str(dst_contents, mode=mode)
if dst_contents != dst_contents_pass2:
dst_contents = dst_contents_pass2
assert_equivalent(src_contents, dst_contents, pass_num=2)
assert_stable(src_contents, dst_contents, mode=mode)
# Note: no need to explicitly call `assert_stable` if `dst_contents` was
# the same as `dst_contents_pass2`.
if not fast and not mode.is_ipynb:
# Jupyter notebooks will already have been checked above.
check_stability_and_equivalence(src_contents, dst_contents, mode=mode)
return dst_contents


def validate_cell(src: str) -> None:
    """Check that cell does not already contain TransformerManager transformations.

    If a cell contains ``!ls``, then it'll be transformed to
    ``get_ipython().system('ls')``. However, if the cell originally contained
    ``get_ipython().system('ls')``, then it would get transformed in the same way:

        >>> TransformerManager().transform_cell("get_ipython().system('ls')")
        "get_ipython().system('ls')\n"
        >>> TransformerManager().transform_cell("!ls")
        "get_ipython().system('ls')\n"

    Due to the impossibility of safely roundtripping in such situations, cells
    containing transformed magics will be ignored.
    """
    if any(transformed_magic in src for transformed_magic in TRANSFORMED_MAGICS):
        raise NothingChanged
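
The check above is a plain substring test. A minimal sketch follows, with the caveat that this diff only shows the name `TRANSFORMED_MAGICS`, not its contents: the two prefixes below are assumptions based on the docstring, and `contains_transformed_magic` is a hypothetical helper.

```python
from typing import FrozenSet

# Assumed contents: the diff does not show what TRANSFORMED_MAGICS holds,
# so these prefixes are illustrative guesses only.
ASSUMED_TRANSFORMED_MAGICS: FrozenSet[str] = frozenset(
    {"get_ipython().system", "get_ipython().run_line_magic"}
)


def contains_transformed_magic(src: str) -> bool:
    """Return True if the cell already contains IPython-transformed code."""
    return any(magic in src for magic in ASSUMED_TRANSFORMED_MAGICS)


already_transformed = contains_transformed_magic("get_ipython().system('ls')")
raw_magic = contains_transformed_magic("!ls")
```

A cell that trips this check is left alone, since after masking and unmasking there would be no way to tell which form the user originally wrote.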


def format_cell(src: str, *, fast: bool, mode: Mode) -> str:
    """Format code in given cell of Jupyter notebook.

    General idea is:

      - if cell has trailing semicolon, remove it;
      - if cell has IPython magics, mask them;
      - format cell;
      - reinstate IPython magics;
      - reinstate trailing semicolon (if originally present);
      - strip trailing newlines.

    Cells with syntax errors will not be processed, as they
    could potentially be automagics or multi-line magics, which
    are currently not supported.
    """
    validate_cell(src)
    src_without_trailing_semicolon, has_trailing_semicolon = remove_trailing_semicolon(
        src
    )
    try:
        masked_src, replacements = mask_cell(src_without_trailing_semicolon)
    except SyntaxError:
        raise NothingChanged
    masked_dst = format_str(masked_src, mode=mode)
    if not fast:
        check_stability_and_equivalence(masked_src, masked_dst, mode=mode)
    dst_without_trailing_semicolon = unmask_cell(masked_dst, replacements)
    dst = put_trailing_semicolon_back(
        dst_without_trailing_semicolon, has_trailing_semicolon
    )
    dst = dst.rstrip("\n")
    if dst == src:
        raise NothingChanged
    return dst
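
The mask and unmask steps can be sketched as a round trip. This is a deliberately simplified illustration, not the PR's `mask_cell` / `unmask_cell`: it only looks at lines that start with `%` or `!`, ignores string literals and multiline magics, and uses fixed-length random tokens rather than tokens matched to the magic's length. The names `mask_line_magics` and `unmask` are hypothetical.

```python
import secrets
from typing import Dict, Tuple


def mask_line_magics(src: str) -> Tuple[str, Dict[str, str]]:
    """Replace lines starting with % or ! by placeholder string literals."""
    replacements: Dict[str, str] = {}
    masked_lines = []
    for line in src.splitlines():
        stripped = line.lstrip()
        if stripped.startswith(("%", "!")):
            indent = line[: len(line) - len(stripped)]
            # A quoted random hex token is valid Python, so a formatter
            # can parse the masked cell.
            token = f'"{secrets.token_hex(4)}"'
            replacements[token] = stripped
            masked_lines.append(indent + token)
        else:
            masked_lines.append(line)
    return "\n".join(masked_lines), replacements


def unmask(src: str, replacements: Dict[str, str]) -> str:
    """Put the original magics back in place of their placeholders."""
    for token, magic in replacements.items():
        src = src.replace(token, magic)
    return src


cell = "%matplotlib inline\nx = 1"
masked, replacements = mask_line_magics(cell)
restored = unmask(masked, replacements)
```

In the real flow the formatter would run between the two calls; here the round trip alone is shown.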


def validate_metadata(nb: MutableMapping[str, Any]) -> None:
    """If notebook is marked as non-Python, don't format it.

    All notebook metadata fields are optional, see
    https://nbformat.readthedocs.io/en/latest/format_description.html. So
    if a notebook has empty metadata, we will try to parse it anyway.
    """
    language = nb.get("metadata", {}).get("language_info", {}).get("name", None)
    if language is not None and language != "python":
        raise NothingChanged
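
The three cases the metadata check distinguishes (Python, another language, nothing recorded) can be seen on hand-written notebook dictionaries. This is a small sketch; `notebook_language` is a hypothetical helper and the sample notebooks are made up.

```python
from typing import Any, Mapping, Optional


def notebook_language(nb: Mapping[str, Any]) -> Optional[str]:
    """Return the declared kernel language, or None if it is not recorded."""
    return nb.get("metadata", {}).get("language_info", {}).get("name")


python_nb = {"metadata": {"language_info": {"name": "python"}}, "cells": []}
julia_nb = {"metadata": {"language_info": {"name": "julia"}}, "cells": []}
empty_nb = {"metadata": {}, "cells": []}

languages = [notebook_language(nb) for nb in (python_nb, julia_nb, empty_nb)]
```

Only the second notebook would be skipped: a missing language is treated as "possibly Python", matching the docstring's note that all metadata fields are optional.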


def format_ipynb_string(src_contents: str, *, fast: bool, mode: Mode) -> FileContent:
    """Format Jupyter notebook.

    Operate cell-by-cell, only on code cells, only for Python notebooks.
    If the ``.ipynb`` originally had a trailing newline, it'll be preserved.
    """
    trailing_newline = src_contents[-1] == "\n"
    modified = False
    nb = json.loads(src_contents)
    validate_metadata(nb)
    for cell in nb["cells"]:
        if cell.get("cell_type", None) == "code":
            try:
                src = "".join(cell["source"])
                dst = format_cell(src, fast=fast, mode=mode)
            except NothingChanged:
                pass
            else:
                cell["source"] = dst.splitlines(keepends=True)
                modified = True
    if modified:
        dst_contents = json.dumps(nb, indent=1, ensure_ascii=False)
        if trailing_newline:
            dst_contents = dst_contents + "\n"
        return dst_contents
    else:
        raise NothingChanged
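
The JSON round trip can be tried without _Black_ installed. This is a sketch under stated assumptions: `format_notebook` is a hypothetical simplification that omits the metadata check and the `NothingChanged` bookkeeping, and `strip_trailing_whitespace` is a toy stand-in for the real cell formatter.

```python
import json
from typing import Callable


def format_notebook(src_contents: str, format_cell: Callable[[str], str]) -> str:
    """Apply format_cell to every code cell; keep a trailing newline if present."""
    trailing_newline = src_contents.endswith("\n")
    nb = json.loads(src_contents)
    for cell in nb["cells"]:
        if cell.get("cell_type") == "code":
            src = "".join(cell["source"])
            # Notebook sources are stored as a list of lines with newlines kept.
            cell["source"] = format_cell(src).splitlines(keepends=True)
    dst_contents = json.dumps(nb, indent=1, ensure_ascii=False)
    return dst_contents + "\n" if trailing_newline else dst_contents


def strip_trailing_whitespace(src: str) -> str:
    """Toy formatter (an assumption): remove trailing whitespace on each line."""
    return "\n".join(line.rstrip() for line in src.split("\n"))


notebook = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title  "]},
        {"cell_type": "code", "source": ["x = 1   \n", "y = 2"]},
    ],
    "metadata": {},
}
raw = json.dumps(notebook) + "\n"
formatted = format_notebook(raw, strip_trailing_whitespace)
result = json.loads(formatted)
```

Markdown cells pass through untouched, and the file's trailing newline survives the `json.dumps` call.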


def format_str(src_contents: str, *, mode: Mode) -> FileContent:
"""Reformat a string and return new contents.

2 changes: 1 addition & 1 deletion src/black/const.py
@@ -1,4 +1,4 @@
DEFAULT_LINE_LENGTH = 88
DEFAULT_EXCLUDES = r"/(\.direnv|\.eggs|\.git|\.hg|\.mypy_cache|\.nox|\.tox|\.venv|venv|\.svn|_build|buck-out|build|dist)/" # noqa: B950
DEFAULT_INCLUDES = r"\.pyi?$"
DEFAULT_INCLUDES = r"(\.pyi?|\.ipynb)$"
STDIN_PLACEHOLDER = "__BLACK_STDIN_FILENAME__"
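
The effect of the new `DEFAULT_INCLUDES` pattern can be checked directly. This is a short sketch, assuming the pattern is applied to each path with `search`; the sample paths are made up.

```python
import re

OLD_INCLUDES = re.compile(r"\.pyi?$")
NEW_INCLUDES = re.compile(r"(\.pyi?|\.ipynb)$")

paths = ["pkg/module.py", "pkg/stubs.pyi", "analysis/report.ipynb", "README.md"]

# Paths picked up by the old and the new default include patterns.
old_matches = [p for p in paths if OLD_INCLUDES.search(p)]
new_matches = [p for p in paths if NEW_INCLUDES.search(p)]
```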