👌 IMPROVE: Code block highlighting (#478)
In Markdown, a code block (a.k.a. a fence) is of the form:

````markdown
```language
source text
```
````

MyST-Parser mimics the `code-block` directive to render these blocks.

In Sphinx, the lexer name is only recorded as the `language` attribute,
and the text is lexed later by Pygments within the `visit_literal_block`
method of the output format's `SphinxTranslator`.
This is the logic MyST-Parser already followed.

However, in docutils, this directive directly parses the text with the Pygments lexer if syntax highlighting is enabled (the default). This case was not previously handled.

Both cases are now handled, and the following configuration options are added:

- `myst_highlight_code_blocks` (docutils only): If True (default), use Pygments to create lexical tokens for the given language; otherwise skip lexical analysis
- `myst_number_code_blocks`: A list of languages to add line numbers to
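
To illustrate what `myst_number_code_blocks` changes on the docutils path, here is a minimal, stdlib-only sketch. It assumes the lexer output is a stream of `(classes, text)` pairs, as iterated over in the diff. The helper names `should_number` and `number_lines` are hypothetical, and `number_lines` is a simplified stand-in for docutils' `NumberLines`, not a copy of it.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# A lexical token as consumed in the diff: (CSS classes, text).
Token = Tuple[List[str], str]


def should_number(language: str, config: Dict[str, object]) -> bool:
    """Mirror the check in the diff: number only the listed languages."""
    return language in config.get("myst_number_code_blocks", ())


def number_lines(tokens: Iterable[Token], start: int, end: int) -> Iterator[Token]:
    """Prefix every line of a token stream with a line-number token.

    Simplified illustration only; not docutils' implementation.
    """
    width = len(str(end))  # pad numbers to the width of the largest one
    lineno = start
    yield ["ln"], f"{lineno:>{width}} "
    for classes, value in tokens:
        lines = value.split("\n")
        # every complete line is followed by the next line-number token
        for line in lines[:-1]:
            yield classes, line + "\n"
            lineno += 1
            yield ["ln"], f"{lineno:>{width}} "
        if lines[-1]:
            yield classes, lines[-1]


config = {"myst_number_code_blocks": ["typescript"]}
tokens = [(["k"], "type"), ([], " A = 1;\nlet b")]
if should_number("typescript", config):
    tokens = list(number_lines(tokens, 1, 2))
print("".join(text for _, text in tokens))
```

Under Sphinx no such token rewriting is needed, since the diff only sets `linenos` on the node and leaves the numbering to the output translator.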
chrisjsewell committed Dec 29, 2021
1 parent 6c44075 commit 2b3a931
Showing 18 changed files with 1,124 additions and 56 deletions.
1 change: 1 addition & 0 deletions .github/workflows/tests.yml
@@ -67,6 +67,7 @@ jobs:
    runs-on: ubuntu-latest

    strategy:
      fail-fast: false
      matrix:
        docutils-version: ["0.16", "0.17", "0.18"]

1 change: 1 addition & 0 deletions docs/conf.py
@@ -88,6 +88,7 @@
    "substitution",
    "tasklist",
]
myst_number_code_blocks = ["typescript"]
myst_heading_anchors = 2
myst_footnote_transition = True
myst_dmath_double_inline = True
3 changes: 3 additions & 0 deletions docs/sphinx/reference.md
@@ -38,6 +38,9 @@ To do so, use the keywords beginning `myst_`.
* - `myst_heading_slug_func`
  - `None`
  - Use the specified function to auto-generate heading anchors, [see here](syntax/header-anchors) for details.
* - `myst_number_code_blocks`
  - `()`
  - Add line numbers to code blocks with these languages, [see here](syntax/code-blocks) for details.
* - `myst_substitutions`
  - `{}`
  - A mapping of keys to substitutions, used globally for all MyST documents when the "substitution" extension is enabled.
8 changes: 4 additions & 4 deletions docs/syntax/reference.md
@@ -120,12 +120,13 @@ we have shown equivalent rST syntax for many MyST markdown features below.
    ======
    ```
* - Quote
  - quoted text
  - Quoted text
  - ```md
    > this is a quote
    ```
* - CodeFence
  - enclosed in 3 or more backticks with an optional language name
  - Enclosed in 3 or more `` ` `` or `~` with an optional language name.
    See {ref}`syntax/code-blocks` for more information.
  - ````md
    ```python
    print('this is python')
@@ -176,8 +177,7 @@ In addition to these summaries of inline syntax, see {ref}`extra-markdown-syntax
  - Description
  - Example
* - Role
  - See {ref}`syntax/roles` for more
    information.
  - See {ref}`syntax/roles` for more information.
  - ```md
    {rolename}`interpreted text`
    ```
89 changes: 59 additions & 30 deletions docs/syntax/syntax.md
@@ -610,6 +610,65 @@ leave the "text" section of the markdown link empty. For example, this
markdown: `[](syntax.md)` will result in: [](syntax.md).
```

(syntax/code-blocks)=
## Code blocks

Code blocks contain a language identifier, which is used to determine the language of the code.
This language is used to determine the syntax highlighting, using an available [pygments lexer](https://pygments.org/docs/lexers/).

````markdown
```python
from a import b
c = "string"
```
````

```python
from a import b
c = "string"
```

You can create and register your own lexer, using the [`pygments.lexers` entry point](https://pygments.org/docs/plugins/#register-plugins),
or within a sphinx extension, with the [`app.add_lexer` method](sphinx:sphinx.application.Sphinx.add_lexer).

Using the `myst_number_code_blocks` configuration option, you can also control whether code blocks are numbered by line.
For example, using `myst_number_code_blocks = ["typescript"]`:

```typescript
type MyBool = true | false;

interface User {
  name: string;
  id: number;
}
```

### Show backticks inside raw markdown blocks

If you'd like to show backticks inside of your markdown, you can do so by nesting them
in backticks of a greater length. Markdown will treat the outer-most backticks as the
edges of the "raw" block and everything inside will show up. For example:

``` `` `hi` `` ``` will be rendered as: `` `hi` ``

and

`````
````
```
hi
```
````
`````

will be rendered as:

````
```
hi
```
````
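
If such blocks are generated programmatically, the same "use a longer fence" rule can be applied in code. The following is a minimal sketch; the `wrap_in_fence` helper is hypothetical and not part of MyST-Parser. It picks a fence one backtick longer than the longest backtick run in the text, which is stricter than Markdown requires but always safe.

```python
import re


def wrap_in_fence(text: str, info: str = "") -> str:
    """Wrap text in a backtick fence longer than any backtick run it contains."""
    longest = max((len(run) for run in re.findall(r"`+", text)), default=0)
    # a fence needs at least three backticks
    fence = "`" * max(3, longest + 1)
    return f"{fence}{info}\n{text}\n{fence}"


inner = wrap_in_fence("hi")     # fenced with three backticks
outer = wrap_in_fence(inner)    # fenced with four backticks
print(outer)
```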

## Tables

Tables can be written using the standard [Github Flavoured Markdown syntax](https://github.github.com/gfm/#tables-extension-):
@@ -746,33 +805,3 @@ This is because, in the current implementation, they may not be available to ref

By default, a transition line (with a `footnotes` class) will be placed before any footnotes.
This can be turned off by adding `myst_footnote_transition = False` to the config file.


## Code blocks


### Show backticks inside raw markdown blocks

If you'd like to show backticks inside of your markdown, you can do so by nesting them
in backticks of a greater length. Markdown will treat the outer-most backticks as the
edges of the "raw" block and everything inside will show up. For example:

``` `` `hi` `` ``` will be rendered as: `` `hi` ``

and

`````
````
```
hi
```
````
`````

will be rendered as:

````
```
hi
```
````
11 changes: 6 additions & 5 deletions myst_parser/__init__.py
@@ -33,9 +33,10 @@ def setup_sphinx(app: "Sphinx"):

    app.add_post_transform(MystReferenceResolver)

    for name, default in MdParserConfig().as_dict().items():
        # TODO add types?
        app.add_config_value(f"myst_{name}", default, "env")
    for name, default, field in MdParserConfig().as_triple():
        if not field.metadata.get("docutils_only", False):
            # TODO add types?
            app.add_config_value(f"myst_{name}", default, "env")

    app.connect("builder-inited", create_myst_config)
    app.connect("builder-inited", override_mathjax)
@@ -53,8 +54,8 @@ def create_myst_config(app):

    values = {
        name: app.config[f"myst_{name}"]
        for name in MdParserConfig().as_dict().keys()
        if name != "renderer"
        for name, _, field in MdParserConfig().as_triple()
        if not field.metadata.get("docutils_only", False)
    }

    try:
103 changes: 89 additions & 14 deletions myst_parser/docutils_renderer.py
@@ -34,6 +34,7 @@
from docutils.statemachine import StringList
from docutils.transforms.components import Filter
from docutils.utils import Reporter, new_document
from docutils.utils.code_analyzer import Lexer, LexerError, NumberLines
from markdown_it import MarkdownIt
from markdown_it.common.utils import escapeHtml
from markdown_it.renderer import RendererProtocol
@@ -427,14 +428,84 @@ def render_code_inline(self, token: SyntaxTreeNode) -> None:
        self.add_line_and_source_path(node, token)
        self.current_node.append(node)

    def create_highlighted_code_block(
        self,
        text: str,
        lexer_name: str,
        number_lines: bool = False,
        lineno_start: int = 1,
        source: Optional[str] = None,
        line: Optional[int] = None,
    ) -> nodes.literal_block:
        """Create a literal block with syntax highlighting.

        This mimics the behaviour of the `code-block` directive.

        In docutils, this directive directly parses the text with the pygments lexer,
        whereas in sphinx, the lexer name is only recorded as the `language` attribute,
        and the text is lexed later by pygments within the `visit_literal_block`
        method of the output format ``SphinxTranslator``.

        Note, this function does not add the literal block to the document.
        """
        if self.sphinx_env is not None:
            node = nodes.literal_block(text, text, language=lexer_name)
            if number_lines:
                node["linenos"] = True
                if lineno_start != 1:
                    node["highlight_args"] = {"linenostart": lineno_start}
        else:
            node = nodes.literal_block(
                text, classes=["code"] + ([lexer_name] if lexer_name else [])
            )
            try:
                lex_tokens = Lexer(
                    text,
                    lexer_name,
                    "short"
                    if self.config.get("myst_highlight_code_blocks", True)
                    else "none",
                )
            except LexerError as err:
                self.reporter.warning(
                    str(err),
                    **{
                        name: value
                        for name, value in (("source", source), ("line", line))
                        if value is not None
                    },
                )
                lex_tokens = Lexer(text, lexer_name, "none")

            if number_lines:
                lex_tokens = NumberLines(
                    lex_tokens, lineno_start, lineno_start + len(text.splitlines())
                )

            for classes, value in lex_tokens:
                if classes:
                    node += nodes.inline(value, value, classes=classes)
                else:
                    # insert as Text to decrease the verbosity of the output
                    node += nodes.Text(value)

        if source is not None:
            node.source = source
        if line is not None:
            node.line = line
        return node

    def render_code_block(self, token: SyntaxTreeNode) -> None:
        # this should never have a language, since it is just indented text, however,
        # creating a literal_block with no language will raise a warning in sphinx
        text = token.content
        language = token.info.split()[0] if token.info else "none"
        language = language or "none"
        node = nodes.literal_block(text, text, language=language)
        self.add_line_and_source_path(node, token)
        lexer = token.info.split()[0] if token.info else None
        lexer = lexer or "none"
        node = self.create_highlighted_code_block(
            token.content,
            lexer,
            source=self.document["source"],
            line=token_line(token, 0) or None,
        )
        self.current_node.append(node)

    def render_fence(self, token: SyntaxTreeNode) -> None:
@@ -465,16 +536,20 @@ def render_fence(self, token: SyntaxTreeNode) -> None:
        ):
            return self.render_directive(token)

        if not language:
            if self.sphinx_env is not None:
                language = self.sphinx_env.temp_data.get(
                    "highlight_language", self.sphinx_env.config.highlight_language
                )
        if not language and self.sphinx_env is not None:
            # use the current highlight setting, via the ``highlight`` directive,
            # or ``highlight_language`` configuration.
            language = self.sphinx_env.temp_data.get(
                "highlight_language", self.sphinx_env.config.highlight_language
            )

        if not language:
            language = self.config.get("highlight_language", "")
        node = nodes.literal_block(text, text, language=language)
        self.add_line_and_source_path(node, token)
        node = self.create_highlighted_code_block(
            text,
            language,
            number_lines=language in self.config.get("myst_number_code_blocks", ()),
            source=self.document["source"],
            line=token_line(token, 0) or None,
        )
        self.current_node.append(node)

    @property
27 changes: 26 additions & 1 deletion myst_parser/main.py
@@ -1,4 +1,4 @@
from typing import Callable, Dict, Iterable, Optional, Tuple, Union, cast
from typing import Any, Callable, Dict, Iterable, Optional, Sequence, Tuple, Union, cast

import attr
from attr.validators import (
@@ -131,6 +131,21 @@ def check_extensions(self, attribute, value):
        metadata={"help": "Sphinx domain names to search in for references"},
    )

    highlight_code_blocks: bool = attr.ib(
        default=True,
        validator=instance_of(bool),
        metadata={
            "help": "Syntax highlight code blocks with pygments",
            "docutils_only": True,
        },
    )

    number_code_blocks: Sequence[str] = attr.ib(
        default=(),
        validator=deep_iterable(instance_of(str)),
        metadata={"help": "Add line numbers to code blocks with these languages"},
    )

    heading_anchors: Optional[int] = attr.ib(
        default=None,
        validator=optional(in_([1, 2, 3, 4, 5, 6, 7])),
@@ -187,11 +202,19 @@ def check_sub_delimiters(self, attribute, value):

    @classmethod
    def get_fields(cls) -> Tuple[attr.Attribute, ...]:
        """Return all attribute fields in this class."""
        return attr.fields(cls)

    def as_dict(self, dict_factory=dict) -> dict:
        """Return a dictionary of field name -> value."""
        return attr.asdict(self, dict_factory=dict_factory)

    def as_triple(self) -> Iterable[Tuple[str, Any, attr.Attribute]]:
        """Yield triples of (name, value, field)."""
        fields = attr.fields_dict(self.__class__)
        for name, value in attr.asdict(self).items():
            yield name, value, fields[name]


def default_parser(config: MdParserConfig):
    raise NotImplementedError(
@@ -282,6 +305,8 @@ def create_md_parser(
            "myst_substitutions": config.substitutions,
            "myst_html_meta": config.html_meta,
            "myst_footnote_transition": config.footnote_transition,
            "myst_number_code_blocks": config.number_code_blocks,
            "myst_highlight_code_blocks": config.highlight_code_blocks,
        }
    )
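
The `docutils_only` metadata flag and `as_triple` together decide which options Sphinx gets to see. The sketch below shows the same pattern using the standard library's `dataclasses` instead of `attrs`, so that it runs without third-party packages. `ExampleConfig` and `sphinx_option_names` are hypothetical names, and only two of the real options are modelled.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Iterator, List, Sequence, Tuple


@dataclass
class ExampleConfig:
    """Hypothetical stand-in for MdParserConfig with two of its options."""

    highlight_code_blocks: bool = field(
        default=True, metadata={"docutils_only": True}
    )
    number_code_blocks: Sequence[str] = field(default=())

    def as_triple(self) -> Iterator[Tuple[str, Any, Any]]:
        """Yield (name, value, field) for every option."""
        for fld in fields(self):
            yield fld.name, getattr(self, fld.name), fld


def sphinx_option_names(config: ExampleConfig) -> List[str]:
    """List the ``myst_`` options that would be registered with Sphinx."""
    return [
        f"myst_{name}"
        for name, _, fld in config.as_triple()
        # skip options that only make sense for standalone docutils
        if not fld.metadata.get("docutils_only", False)
    ]


print(sphinx_option_names(ExampleConfig()))
```

Keeping the flag in field metadata means the Sphinx setup code does not need a hard-coded list of names to skip, which is what the removed `if name != "renderer"` check amounted to.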
