Change to use micromark #536

Merged
merged 11 commits into from Oct 13, 2020

@wooorm wooorm commented Oct 1, 2020

This is a giant change for remark. It replaces the 5+ year old internals with a new low-level parser: https://github.com/micromark/micromark. The old internals have served billions of users well over the years, but markdown has changed over that time. micromark comes with 100% CommonMark (and GFM as an extension) compliance, and (WIP) docs on parsing rules for how to tokenize markdown with a state machine: https://github.com/micromark/common-markup-state-machine. micromark, and micromark in remark, is a good base for the future.

remark-parse

remark-parse now defers its work to micromark and mdast-util-from-markdown. micromark is a new, small, complete, and CommonMark compliant low-level markdown parser. from-markdown turns its tokens into the previously (and still) used syntax tree: mdast. Extensions to remark-parse work differently: they’re a two-part act. See for example micromark-extension-footnote and mdast-util-footnote.

  • change: commonmark is no longer an option — it’s the default
  • move: gfm is no longer an option — moved to remark-gfm
  • remove: pedantic is no longer an option — this legacy and buggy flavor of markdown is no longer widely used
  • remove: blocks is no longer an option — it’s no longer suggested to change the internal list of HTML “block” tag names
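
To make the list above concrete, here is a minimal sketch of a check you could run over options that used to be passed to remark-parse. The helper `checkParseOptions` and the wording of its notes are hypothetical: they are not part of remark and only restate the list above.

```javascript
// Hypothetical helper (not part of remark): restates the option changes
// listed above so old `remark-parse` configuration can be reviewed.
const parseOptionNotes = {
  commonmark: 'drop it, CommonMark is now the default',
  gfm: 'drop it and use the remark-gfm plugin instead',
  pedantic: 'drop it, this flavor was removed',
  blocks: 'drop it, the list of HTML block tag names is no longer an option'
}

// Returns one note per affected option found in `options`.
function checkParseOptions(options) {
  return Object.keys(options)
    .filter((key) => Object.prototype.hasOwnProperty.call(parseOptionNotes, key))
    .map((key) => '`' + key + '`: ' + parseOptionNotes[key])
}

const parseNotes = checkParseOptions({gfm: true, blocks: ['p']})
console.log(parseNotes.join('\n'))
```

An empty result means none of the changed options were in use.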

remark-stringify

remark-stringify now defers its work to mdast-util-to-markdown. It’s a new and better serializer with powerful features to ensure serialized markdown represents the syntax tree (mdast), no matter what plugins do. Extensions to it work differently: see for example mdast-util-footnote.

options
  • change: commonmark is no longer an option, it’s the default
  • change: emphasis now defaults to *
  • change: bullet now defaults to *
  • move: gfm is no longer an option — moved to remark-gfm
  • move: tableCellPadding — moved to remark-gfm
  • move: tablePipeAlign — moved to remark-gfm
  • move: stringLength — moved to remark-gfm
  • remove: pedantic is no longer an option — this legacy and buggy flavor of markdown is no longer widely used
  • remove: entities is no longer an option — with CommonMark there is almost never a need to use character references, as character escapes are preferred
  • new: quote — you can now prefer single quotes (') over double quotes (") in titles
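
As a rough illustration of the list above, the following sketch sorts an old remark-stringify options object into what stays, what moves to remark-gfm, and what goes away. `migrateStringifyOptions` is a hypothetical helper, not an API of remark-stringify or remark-gfm, and it assumes every option not named in the list is kept unchanged.

```javascript
// Hypothetical helper (not part of remark-stringify or remark-gfm).
// The two lists mirror the "move" and "remove"/"change" entries above.
const movedToGfm = ['tableCellPadding', 'tablePipeAlign', 'stringLength']
const noLongerOptions = ['commonmark', 'gfm', 'pedantic', 'entities']

function migrateStringifyOptions(old) {
  const result = {stringify: {}, gfm: {}, dropped: []}

  for (const [key, value] of Object.entries(old)) {
    if (movedToGfm.includes(key)) {
      // These are now passed to remark-gfm.
      result.gfm[key] = value
    } else if (noLongerOptions.includes(key)) {
      // These no longer exist as options.
      result.dropped.push(key)
    } else {
      // Assumption: anything else is still a remark-stringify option.
      result.stringify[key] = value
    }
  }

  return result
}

const migrated = migrateStringifyOptions({
  gfm: true,
  entities: false,
  tablePipeAlign: false,
  bullet: '-',
  quote: "'"
})
console.log(migrated)
```

Note that a dropped `gfm: true` still means something: GFM support now comes from adding remark-gfm as a plugin. Also keep in mind that leaving out `emphasis` or `bullet` now yields `*`.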

Changes to output / the tree

All of these are for CommonMark compatibility. They’re all fixes. Most of them are inconsequential to most folks.

  • notable: references (as in, links [text][id] and images ![alt][id]) are no longer present as such in the syntax tree if they don’t have a corresponding definition ([id]: example.com). The reason for this is that CommonMark requires [text *emphasis start][undefined] emphasis end* to be emphasis.
  • notable: it is no longer possible to use two blank lines between two lists or a list and indented code. CommonMark prohibits it. For a solution, use an empty comment to end lists (<!---->).
  • inconsequential: whitespace at the start and end of lines in paragraphs is now ignored
  • inconsequential: autolinks such as <mailto:foobarbaz> are now correctly parsed, and the scheme is part of the tree
  • inconsequential: indented code can now follow a block quote w/o blank line
  • inconsequential: trailing indented blank lines after indented code are no longer part of that code
  • inconsequential: character references and escapes are no longer present as separate text nodes
  • inconsequential: character references which HTML allows but CommonMark doesn’t, such as &copy w/o the semicolon, are no longer recognized
  • inconsequential: the indent field is no longer available on position
  • fix: multiline setext headings
  • fix: lazy lists
  • fix: attention (emphasis, strong)
  • fix: tabs
  • fix: empty alt on images is now present as an empty string
  • …plus a ton of other minor previous differences from CommonMark
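
For the empty-comment workaround mentioned in the list above, here is a small sketch that joins two markdown lists with `<!---->` between them. `separateLists` is a hypothetical helper, not something remark provides, and it assumes both inputs are already valid markdown lists.

```javascript
// Hypothetical helper: joins two markdown lists so they stay separate
// lists, by putting an empty HTML comment between them.
function separateLists(first, second) {
  // Normalize the gap: trailing whitespace of the first list and
  // leading blank lines of the second list are replaced by the separator.
  return first.trimEnd() + '\n\n<!---->\n\n' + second.replace(/^\n+/, '')
}

const joined = separateLists('* a\n* b\n', '* c\n')
console.log(joined)
```

The same separator can be placed between a list and indented code.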

For now

  • get folks to use this and report problems!

Up next

  • make remark-gfm
  • start making next branches for plugins
  • get types into {from,to}-markdown and use them here

Closes

Closes GH-218.
Closes GH-306.
Closes GH-315.
Closes GH-324.
Closes GH-398.
Closes GH-402.
Closes GH-407.
Closes GH-439.
Closes GH-450.
Closes GH-459.
Closes GH-493.
Closes GH-494.
Closes GH-497.
Closes GH-504.
Closes GH-517.
Closes GH-521.
Closes GH-523.

Closes remarkjs/remark-lint#111.

Thanks

Thanks to Salesforce, Gatsby, Vercel, and Netlify, and our other backers for sponsoring the work on micromark!
To support our continued work, back us on OpenCollective!

@wooorm wooorm added 🐛 type/bug This is a problem 🦋 type/enhancement This is great to have 🧑 semver/major This is a change 🗄 area/interface This affects the public interface 🙆 yes/confirmed This is confirmed and ready to be worked on 📣 type/announcement This is meta 💬 type/discussion This is a request for comments labels Oct 1, 2020
@wooorm wooorm self-assigned this Oct 1, 2020

@BarryThePenguin BarryThePenguin left a comment

🚀

wooorm commented Oct 7, 2020

Update on the ecosystem

I checked with the community (see the referenced issues above). Most plugins are fine. I’m in contact with authors of stuff that isn’t.

Here is a breakdown of the stuff maintained in the remarkjs org.

Changes

These plugins have new versions which work with the new parser/compiler, but not with remark@prev.

  • remark-frontmatter
  • remark-footnotes
  • remark-gfm (new)
  • remarkjs/remark-heading-gap
  • remarkjs/remark-yaml-config
  • remarkjs/remark-comment-config
  • remark-github
  • remark-breaks
  • remarkjs/remark-gemoji
  • remarkjs/remark-math
  • remarkjs/remark-lint (it depends: 8% of the tests failed, so most is fine, but some subplugins received updates)

Tiny changes

These plugins received a tiny update to match commonmark, but otherwise work w/ remark@next and remark@prev the same.

  • remarkjs/remark-html
  • remarkjs/remark-rehype
  • remarkjs/remark-external-links
  • remarkjs/remark-inline-links
  • remarkjs/remark-strip-badges

No changes

These plugins did not need any update at all for remark@next.

  • rehypejs/rehype-remark
  • remarkjs/remark-slug
  • remarkjs/remark-squeeze-paragraphs
  • remarkjs/remark-retext
  • remarkjs/remark-validate-links
  • remarkjs/remark-autolink-headings
  • remarkjs/strip-markdown
  • remarkjs/remark-react
  • remarkjs/remark-message-control
  • remarkjs/remark-reference-links
  • remarkjs/remark-images
  • remarkjs/remark-highlight.js
  • remarkjs/remark-unwrap-images
  • remarkjs/remark-textr
  • remarkjs/remark-license
  • remarkjs/remark-normalize-headings
  • remarkjs/remark-unlink
  • remarkjs/remark-usage
  • remarkjs/remark-defsplit
  • remarkjs/remark-embed-images
  • remarkjs/remark-midas
  • remarkjs/remark-contributors
  • remarkjs/remark-git-contributors
  • remarkjs/remark-man
  • remarkjs/remark-vdom

Archived

Not used a lot, and updating would take too much time:

  • remarkjs/remark-bookmarks (archived)

@wooorm wooorm merged commit 48b1278 into main Oct 13, 2020
@wooorm wooorm deleted the next branch October 13, 2020 16:30
fisker added a commit to fisker/prettier that referenced this pull request Oct 14, 2020
@wooorm wooorm added ⛵️ status/released and removed 🙆 yes/confirmed This is confirmed and ready to be worked on labels Oct 14, 2020

wooorm commented Oct 14, 2020

This is now released in remark@13.0.0

Martii added a commit to Martii/OpenUserJS.org that referenced this pull request Oct 19, 2020
* Please read their CHANGELOGs
* *remark* , *remark-strip-html* , and *strip-markdown* are on hold since they are interdependent and needs in-depth retesting. See craftzdog/remark-strip-html#2 , remarkjs/remark#536 , and remarkjs/strip-markdown@0ceb371#diff-5a831ea67cf5cf8703b0de46901ab25bd191f56b320053be9332d9a3b0d01d15
* *sanitize-html* CHANGELOG at https://github.com/apostrophecms/sanitize-html/blob/main/CHANGELOG.md#200-2020-09-23 . We don't DOM insert , pro *node* is acceptable, and we override `allowedTags` to usually match GH.
* *spdx-license-ids* is going to take some time as a bunch of new ones have been added and need to be cross-checked/restricted. On hold.
* *moment* is in "maintenance mode" and deprecated. Will address this much later.
* Delete op retested
Martii added a commit to OpenUserJS/OpenUserJS.org that referenced this pull request Oct 19, 2020

@wooorm wooorm added the 💪 phase/solved Post is done label Aug 4, 2021
@wooorm wooorm mentioned this pull request Nov 18, 2021