Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

♻️ REFACTOR: Parsing logic of Markdown links #467

Merged
merged 6 commits into from Dec 28, 2021

Conversation

chrisjsewell
Copy link
Member

This PR aims to clarify and improve the parsing of Markdown links, i.e. [Text](link)

Related issues: #193, #402, #411, #435

Current logic

  1. For both docutils and sphinx, if the link starts with a scheme prefix (e.g. http:) and that scheme is in myst_url_schemes or myst_url_schemes=None, then create an external link, e.g. `Text
  2. For both docutils and sphinx, if the link contains a # (i.e. a fragment), then also create an external link (unless the heading_anchors extension is active)
  3. If docutils, and neither of the above, then nothing is rendered, and a warning is raised for a missing reference
  4. If sphinx, and neither of the above, then create a sphinx node addnodes.pending_xref of type "myst"

After (4), the "myst" pending_xref are then intercepted in the MystReferenceResolver post-transform (after all documents have been parsed), and the reference resolved similar to the sphinx any role:
They check the reference against all reference types, taking the first hit as the target and emitting a warning, if none or multiple targets are found.
Different to the standard any role sphinx ReferenceResolver, the myst one allows for (a) matching document paths with their extensions, e.g. [Text](path/to/other.md), and (b) parsing nested syntax as part of the text, e.g. [Text **bold**](link).

there is also a little more complication, when using the heading_anchors extension, to match generated header targets, e.g. [Text](document.md#header-1)

Shortcomings

(2) I believe originally came from recommonmark, but I think it is probably too permissive and unexpected.

(3) Is definitely not ideal, for docutils only use

(4) What is currently not handled, is links to "non-sphinx" files, e.g. non RST or Markdown files like [Text](path/to/file.txt). In sphinx we would want to use the download role for this, to ensure that the referenced files are copied into the build directory and the end href points to that.

Current Improvements

This PR is in draft, but current improvements are:

  • Add a configuration myst_all_links_external=True, which simply directs for all links to be rendered as external links.
    I think this can be helpful for especially docutils use
  • Remove step (2)
  • For step (3) create a docutils internal link
  • In step (4), if the link matches a "non-sphinx" file path, then create a download_reference node.

Perhaps there can be some more configuration,
but one thing I want to balance is (a) having sensible defaults, (b) allowing for some configuration, but (c) not having too much allowed configuration, which would be confusing to basic users and also then need to be replicated in any future MyST parser implementations.

Questions

  • Is it a problem to remove (2); no longer assuming links with # are external links?

I'll expand on this a bit more at a later data, but welcome comments from interested parties: @choldgraf, @mmcky, @rowanc1, @cpitclaudel, etc

@choldgraf
Copy link
Member

This sounds pretty reasonable to me. This sounds like a nice improvement. A few thoughts:

  • Agree that #2 feels like we are trying to make too many assumptions on the user's behalf.
  • Agree that when people link directly to a file, the right behavior to assume is the {download} role
  • Presumably myst_all_links_external=True defaults to False, at least for when we're in a Sphinx context?

@chrisjsewell
Copy link
Member Author

Presumably myst_all_links_external=True defaults to False, at least for when we're in a Sphinx context?

yes, myst_all_links_external default is True

@codecov
Copy link

codecov bot commented Dec 27, 2021

Codecov Report

Merging #467 (35c3a06) into master (30c44d5) will decrease coverage by 0.06%.
The diff coverage is 89.65%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #467      +/-   ##
==========================================
- Coverage   90.30%   90.23%   -0.07%     
==========================================
  Files          16       16              
  Lines        1990     2017      +27     
==========================================
+ Hits         1797     1820      +23     
- Misses        193      197       +4     
Flag Coverage Δ
pytests 90.23% <89.65%> (-0.07%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
myst_parser/__init__.py 88.57% <ø> (ø)
myst_parser/docutils_.py 78.88% <ø> (ø)
myst_parser/utils.py 81.81% <ø> (-9.10%) ⬇️
myst_parser/myst_refs.py 90.76% <72.22%> (-0.51%) ⬇️
myst_parser/docutils_renderer.py 92.63% <95.23%> (-0.23%) ⬇️
myst_parser/main.py 89.06% <100.00%> (+0.17%) ⬆️
myst_parser/sphinx_renderer.py 93.64% <100.00%> (+0.55%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 30c44d5...35c3a06. Read the comment docs.

@chrisjsewell chrisjsewell changed the title 👌 IMPROVE: Parsing logic of Markdown links ♻️ REFACTOR: Parsing logic of Markdown links Dec 28, 2021
@chrisjsewell
Copy link
Member Author

@chrisjsewell chrisjsewell marked this pull request as ready for review December 28, 2021 03:21
@chrisjsewell chrisjsewell merged commit 90c98aa into master Dec 28, 2021
@chrisjsewell chrisjsewell deleted the improve-link-parsing branch December 28, 2021 03:38
@choldgraf
Copy link
Member

👍

@mmcky
Copy link
Member

mmcky commented Jan 3, 2022

thanks @chrisjsewell -- I think this is a nice improvement.

I particularly like that links to non-source documents are migrated to download links. I think that is nice behaviour. 👍

@antazoey
Copy link

I used to be able to link to local HTML (post-build) files but it seems that is no longer the case?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants