♻️ REFACTOR: Parsing logic of Markdown links #467

Merged
merged 6 commits into from Dec 28, 2021
2 changes: 1 addition & 1 deletion docs/api/reference.rst
@@ -36,7 +36,7 @@ Sphinx

.. autoclass:: myst_parser.sphinx_renderer.SphinxRenderer
:special-members: __output__
-:members: handle_cross_reference, render_math_block_label
+:members: render_internal_link, render_math_block_label
:undoc-members:
:member-order: alphabetical
:show-inheritance:
6 changes: 6 additions & 0 deletions docs/sphinx/reference.md
@@ -19,10 +19,16 @@ To do so, use the keywords beginning `myst_`.
* - `myst_enable_extensions`
- `["dollarmath"]`
- Enable Markdown extensions, [see here](../syntax/optional.md) for details.
* - `myst_all_links_external`
- `False`
- If `True`, all Markdown links `[text](link)` are treated as external.
* - `myst_url_schemes`
- `None`
- [URI schemes](https://en.wikipedia.org/wiki/List_of_URI_schemes) that will be recognised as external URLs in `[](scheme:loc)` syntax, or set `None` to recognise all.
Other links will be resolved as internal cross-references.
* - `myst_ref_domains`
- `None`
- If a list, then only these [sphinx domains](sphinx:domain) will be searched when resolving Markdown links like `[text](reference)`.
* - `myst_linkify_fuzzy_links`
- `True`
- If `False`, only links that contain a scheme (such as `http`) will be recognised as external links.
1 change: 1 addition & 0 deletions docs/syntax/example.txt
@@ -0,0 +1 @@
Hallo!
2 changes: 1 addition & 1 deletion docs/syntax/reference.md
@@ -242,7 +242,7 @@ In addition to these summaries of inline syntax, see {ref}`extra-markdown-syntax
![alt](src "title")
```
* - Link
- Reference `LinkDefinitions`
- Reference `LinkDefinitions`. See {ref}`syntax/referencing` for more details.
- ```md
[text](target "title") or [text][key]
```
33 changes: 33 additions & 0 deletions docs/syntax/syntax.md
@@ -518,6 +518,39 @@ Is below, but it won't be parsed into the document.

+++

(syntax/referencing)=

## Markdown Links and Referencing

Markdown links are of the form: `[text](link)`.

If you set the configuration `myst_all_links_external = True` (`False` by default),
then all links will be treated simply as "external" links.
For example, in HTML outputs, `[text](link)` will be rendered as `<a href="link">text</a>`.

Otherwise, links are only treated as "external" if they are prefixed with a scheme
listed in `myst_url_schemes` (by default `http`, `https`, `ftp`, or `mailto`).
For example, `[example.com](https://example.com)` becomes [example.com](https://example.com).
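
The rule above can be sketched in a few lines of Python. This is a minimal illustration rather than myst-parser's own code: the helper name `is_external_link` and its keyword arguments are hypothetical, and the sketch only assumes that the scheme is extracted with `urllib.parse.urlparse`.

```python
from typing import Iterable, Optional
from urllib.parse import urlparse

# Default schemes named in the text above.
DEFAULT_SCHEMES = ("http", "https", "ftp", "mailto")


def is_external_link(
    href: str,
    all_links_external: bool = False,
    url_schemes: Optional[Iterable[str]] = DEFAULT_SCHEMES,
) -> bool:
    """Hypothetical helper: decide whether a link target is "external"."""
    if all_links_external:
        # Every link is treated as a plain hyperlink.
        return True
    scheme = urlparse(href).scheme
    if url_schemes is None:
        # None means "recognise any scheme", but one must still be present.
        return bool(scheme)
    return scheme in url_schemes
```

Under these assumptions, `https://example.com` is external, while `syntax.md` and `syntax/referencing` have no scheme and would fall through to internal resolution.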

:::{note}
The `text` is parsed as nested Markdown. For example, `[here's some *emphasised text*](https://example.com)` is rendered as [here's some *emphasised text*](https://example.com).
:::

For "internal" links, myst-parser in Sphinx will attempt to resolve the reference to either a relative document path, or a cross-reference to a target (see [](syntax/targets)):

- `[this doc](syntax.md)` will link to a rendered source document: [this doc](syntax.md)
- This is similar to `` {doc}`this doc <syntax>` ``; {doc}`this doc <syntax>`, but allows for document extensions, and parses nested Markdown text.
- `[example text](example.txt)` will link to a non-source (downloadable) file: [example text](example.txt)
- The linked document itself will be copied to the build directory.
- This is similar to `` {download}`example text <example.txt>` ``; {download}`example text <example.txt>`, but parses nested Markdown text.
- `[reference](syntax/referencing)` will link to an internal cross-reference: [reference](syntax/referencing)
- This is similar to `` {any}`reference <syntax/referencing>` ``; {any}`reference <syntax/referencing>`, but parses nested Markdown text.
- You can limit the scope of the cross-reference to specific [sphinx domains](sphinx:domain), by using the `myst_ref_domains` configuration.
For example, `myst_ref_domains = ("std", "py")` will only allow cross-references to `std` and `py` domains.
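
As an illustration, a Sphinx `conf.py` combining these options might look as follows. The values are examples only; the option names are the ones described above, and the comments note where a value matches the documented default.

```python
# conf.py (illustrative values only)
extensions = ["myst_parser"]

# Documented default: only links with a recognised scheme are external.
myst_all_links_external = False

# Schemes treated as external URLs (the documented default set).
myst_url_schemes = ("http", "https", "ftp", "mailto")

# Only search the `std` and `py` domains when resolving [text](reference).
myst_ref_domains = ("std", "py")
```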

Additionally, if [](syntax/header-anchors) are enabled, internal links to document headers can also be used.
For example, `[a header](syntax.md#markdown-links-and-referencing)` will link to a header anchor: [a header](syntax.md#markdown-links-and-referencing).

(syntax/targets)=

## Targets and Cross-Referencing
1 change: 1 addition & 0 deletions myst_parser/__init__.py
@@ -35,6 +35,7 @@ def setup_sphinx(app: "Sphinx"):

for name, default in MdParserConfig().as_dict().items():
if not name == "renderer":
# TODO add types?
app.add_config_value(f"myst_{name}", default, "env")

app.connect("builder-inited", create_myst_config)
4 changes: 2 additions & 2 deletions myst_parser/docutils_.py
@@ -58,9 +58,9 @@ def __repr__(self):
"substitutions",
# we can't add substitutions so not needed
"sub_delimiters",
-# heading anchors are currently sphinx only
+# sphinx only options
"heading_anchors",
-# sphinx.ext.mathjax only options
+"ref_domains",
"update_mathjax",
"mathjax_classes",
# We don't want to set the renderer from docutils.conf
87 changes: 53 additions & 34 deletions myst_parser/docutils_renderer.py
@@ -19,6 +19,7 @@
Union,
cast,
)
from urllib.parse import urlparse

import jinja2
import yaml
@@ -526,51 +527,68 @@ def render_heading(self, token: SyntaxTreeNode) -> None:
self.current_node = section

def render_link(self, token: SyntaxTreeNode) -> None:
"""Parse `<http://link.com>` or `[text](link "title")` syntax to docutils AST:

- If `<>` autolink, forward to `render_autolink`
- If `myst_all_links_external` is True, forward to `render_external_url`
- If link is an external URL, forward to `render_external_url`
- External URLs start with a scheme (e.g. `http:`) in `myst_url_schemes`,
or any scheme if `myst_url_schemes` is None.
- Otherwise, forward to `render_internal_link`
"""
if token.markup == "autolink":
return self.render_autolink(token)

if self.config.get("myst_all_links_external", False):
return self.render_external_url(token)

# Check for external URL
url_scheme = urlparse(cast(str, token.attrGet("href") or "")).scheme
allowed_url_schemes = self.config.get("myst_url_schemes", None)
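# guard the membership test: `x in None` raises TypeError when
# all schemes are allowed but the link has no scheme
if (allowed_url_schemes is None and url_scheme) or (
allowed_url_schemes is not None and url_scheme in allowed_url_schemes
):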
return self.render_external_url(token)

return self.render_internal_link(token)

def render_external_url(self, token: SyntaxTreeNode) -> None:
"""Render link token `[text](link "title")`,
where the link has been identified as an external URL::

<reference refuri="link" title="title">
text

`text` can contain nested syntax, e.g. `[**bold**](url "title")`.
"""
ref_node = nodes.reference()
self.add_line_and_source_path(ref_node, token)
destination = cast(str, token.attrGet("href") or "")
ref_node["refuri"] = cast(str, token.attrGet("href") or "")
title = token.attrGet("title")
if title:
ref_node["title"] = title
with self.current_node_context(ref_node, append=True):
self.render_children(token)

if self.config.get(
"relative-docs", None
) is not None and destination.startswith(self.config["relative-docs"][0]):
# make the path relative to an "including" document
source_dir, include_dir = self.config["relative-docs"][1:]
destination = os.path.relpath(
os.path.join(include_dir, os.path.normpath(destination)), source_dir
)
def render_internal_link(self, token: SyntaxTreeNode) -> None:
"""Render link token `[text](link "title")`,
where the link has not been identified as an external URL::

<reference refname="link" title="title">
text

ref_node["refuri"] = destination
`text` can contain nested syntax, e.g. `[**bold**](link "title")`.

Note, this is overridden by `SphinxRenderer`, to use `pending_xref` nodes.
"""
ref_node = nodes.reference()
self.add_line_and_source_path(ref_node, token)
ref_node["refname"] = cast(str, token.attrGet("href") or "")
title = token.attrGet("title")
if title:
ref_node["title"] = title
next_node = ref_node

# TODO currently any reference with a fragment # is deemed external
# (if anchors are not enabled)
# This comes from recommonmark, but I am not sure of the rationale for it
if is_external_url(
destination,
self.config.get("myst_url_schemes", None),
"heading_anchors" not in self.config.get("myst_extensions", []),
):
self.current_node.append(next_node)
with self.current_node_context(ref_node):
self.render_children(token)
else:
self.handle_cross_reference(token, destination)

def handle_cross_reference(self, token: SyntaxTreeNode, destination: str) -> None:
if not self.config.get("ignore_missing_refs", False):
self.create_warning(
f"Reference not found: {destination}",
line=token_line(token),
subtype="ref",
append_to=self.current_node,
)
with self.current_node_context(ref_node, append=True):
self.render_children(token)

def render_autolink(self, token: SyntaxTreeNode) -> None:
refuri = target = escapeHtml(token.attrGet("href") or "") # type: ignore[arg-type]
@@ -594,6 +612,7 @@ def render_image(self, token: SyntaxTreeNode) -> None:
destination, None, True
):
# make the path relative to an "including" document
# this is set when using the `relative-images` option of the MyST `include` directive
destination = os.path.normpath(
os.path.join(
self.config.get("relative-images", ""),
15 changes: 14 additions & 1 deletion myst_parser/main.py
@@ -115,11 +115,23 @@ def check_extensions(self, attribute, value):
metadata={"help": "Disable syntax elements"},
)

all_links_external: bool = attr.ib(
default=False,
validator=instance_of(bool),
metadata={"help": "Parse all links as simple hyperlinks"},
)

# see https://en.wikipedia.org/wiki/List_of_URI_schemes
url_schemes: Optional[Iterable[str]] = attr.ib(
default=cast(Optional[Iterable[str]], ("http", "https", "mailto", "ftp")),
validator=optional(deep_iterable(instance_of(str), instance_of((list, tuple)))),
-metadata={"help": "URL schemes to allow in links"},
+metadata={"help": "URL scheme prefixes identified as external links"},
)

ref_domains: Optional[Iterable[str]] = attr.ib(
default=None,
validator=optional(deep_iterable(instance_of(str), instance_of((list, tuple)))),
metadata={"help": "Sphinx domain names to search in for references"},
)

heading_anchors: Optional[int] = attr.ib(
@@ -273,6 +285,7 @@ def default_parser(config: MdParserConfig) -> MarkdownIt:
list(config.enable_extensions)
+ (["heading_anchors"] if config.heading_anchors is not None else [])
),
"myst_all_links_external": config.all_links_external,
"myst_url_schemes": config.url_schemes,
"myst_substitutions": config.substitutions,
"myst_html_meta": config.html_meta,
66 changes: 38 additions & 28 deletions myst_parser/myst_refs.py
@@ -42,7 +42,6 @@ def run(self, **kwargs: Any) -> None:
contnode = cast(nodes.TextElement, node[0].deepcopy())
newnode = None

typ = node["reftype"]
target = node["reftarget"]
refdoc = node.get("refdoc", self.env.docname)
domain = None
@@ -54,23 +53,29 @@ def run(self, **kwargs: Any) -> None:
# but first we change the reftype to 'any'
# this means it is picked up by extensions like intersphinx
node["reftype"] = "any"
newnode = self.app.emit_firstresult(
"missing-reference",
self.env,
node,
contnode,
**(
{"allowed_exceptions": (NoUri,)}
if version_info[0] > 2
else {}
),
)
node["reftype"] = "myst"
try:
newnode = self.app.emit_firstresult(
"missing-reference",
self.env,
node,
contnode,
**(
{"allowed_exceptions": (NoUri,)}
if version_info[0] > 2
else {}
),
)
finally:
node["reftype"] = "myst"
# still not found? warn if node wishes to be warned about or
# we are in nit-picky mode
if newnode is None:
node["refdomain"] = ""
self.warn_missing_reference(refdoc, typ, target, node, domain)
# TODO ideally we would override the warning message here,
# to show the [ref.myst] for suppressing the warning
self.warn_missing_reference(
refdoc, node["reftype"], target, node, domain
)
except NoUri:
newnode = contnode

@@ -109,25 +114,30 @@ def resolve_myst_ref(
if res:
results.append(("std:doc", res))

# get allowed domains for referencing
ref_domains = self.env.config.myst_ref_domains

# next resolve for any other standard reference objects
stddomain = cast(StandardDomain, self.env.get_domain("std"))
for objtype in stddomain.object_types:
key = (objtype, target)
if objtype == "term":
key = (objtype, target.lower())
if key in stddomain.objects:
docname, labelid = stddomain.objects[key]
domain_role = "std:" + stddomain.role_for_objtype(objtype)
ref_node = make_refnode(
self.app.builder, refdoc, docname, labelid, contnode
)
results.append((domain_role, ref_node))
if ref_domains is None or "std" in ref_domains:
stddomain = cast(StandardDomain, self.env.get_domain("std"))
for objtype in stddomain.object_types:
key = (objtype, target)
if objtype == "term":
key = (objtype, target.lower())
if key in stddomain.objects:
docname, labelid = stddomain.objects[key]
domain_role = "std:" + stddomain.role_for_objtype(objtype)
ref_node = make_refnode(
self.app.builder, refdoc, docname, labelid, contnode
)
results.append((domain_role, ref_node))

# finally resolve for any other type of reference
# TODO do we want to restrict this at all?
# finally resolve for any other type of allowed reference domain
for domain in self.env.domains.values():
if domain.name == "std":
continue # we did this one already
if ref_domains is not None and domain.name not in ref_domains:
continue
try:
results.extend(
domain.resolve_any_xref(