Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Significant period isn't escaped on stringify #521

Closed
cl8n opened this issue Jul 25, 2020 · 3 comments · Fixed by #536
Closed

Significant period isn't escaped on stringify #521

cl8n opened this issue Jul 25, 2020 · 3 comments · Fixed by #536
Labels
🗄 area/interface This affects the public interface 💪 phase/solved Post is done remark-stringify 👶 semver/patch This is a backwards-compatible fix 🐛 type/bug This is a problem

Comments

@cl8n
Copy link

cl8n commented Jul 25, 2020

Subject of the issue

Text that appears like an ordered list item will be stringified as a list, even if it was escaped.

Steps to reproduce

  1. Parse this paragraph:
    1\. test
  2. The parse tree should look like this, no issues here:
    root[1] (1:1-1:9, 0-8)
    └─ paragraph[3] (1:1-1:9, 0-8)
       ├─ text: "1" (1:1-1:2, 0-1)
       ├─ text: "." (1:2-1:4, 1-3)
       └─ text: " test" (1:4-1:9, 3-8)
    
  3. Stringify

Expected behaviour

The output is identical to the markdown in step 1

Actual behaviour

The escape is removed, and now the markdown represents a list instead of a paragraph

1. test
@cl8n cl8n added 🐛 type/bug This is a problem 🙉 open/needs-info This needs some more info labels Jul 25, 2020
@wooorm
Copy link
Member

wooorm commented Jul 27, 2020

Nice catch! That’s indeed a bug.

We do have code for this, but it doesn’t seem to be working 🤔

@wooorm wooorm added remark-stringify 👶 semver/patch This is a backwards-compatible fix 🗄 area/interface This affects the public interface 🙆 yes/confirmed This is confirmed and ready to be worked on 🐛 type/bug This is a problem and removed 🐛 type/bug This is a problem 🙉 open/needs-info This needs some more info labels Jul 27, 2020
@wooorm wooorm added the help wanted 🙏 This could use your insight or help label Aug 22, 2020
wooorm added a commit that referenced this issue Oct 1, 2020
This is a giant change for remark.
It replaces the 5+ year old internals with a new low-level parser:
<https://github.com/micromark/micromark>
The old internals have served billions of users well over the years, but
markdown has changed over that time.
micromark comes with 100% CommonMark (and GFM as an extension) compliance,
and (WIP) docs on parsing rules for how to tokenize markdown with a state
machine: <https://github.com/micromark/common-markup-state-machine>.
micromark, and micromark in remark, is a good base for the future.

`remark-parse` now defers its work to [`micromark`][micromark] and
[`mdast-util-from-markdown`][from-markdown].
`micromark` is a new, small, complete, and CommonMark compliant low-level
markdown parser.
`from-markdown` turns its tokens into the previously (and still) used syntax
tree: [mdast][].
Extensions to `remark-parse` work differently: they’re a two-part act.
See for example [`micromark-extension-footnote`][micromark-footnote] and
[`mdast-util-footnote`][from-markdown-footnote].

* change: `commonmark` is no longer an option — it’s the default
* move: `gfm` is no longer an option — moved to `remark-gfm`
* remove: `pedantic` is no longer an option — this legacy and buggy flavor of
  markdown is no longer widely used
* remove: `blocks` is no longer an options — it’s no longer suggested to
  change the internal list of HTML “block” tag names

remark-stringify now defers its work to [`mdast-util-to-markdown`][to-markdown].
It’s a new and better serializer with powerful features to ensure serialized
markdown represents the syntax tree (mdast), no matter what plugins do.
Extensions to it work differently: see for example
[`mdast-util-footnote`][to-markdown-footnote].

* change: `commonmark` is no longer an option, it’s the default
* change: `emphasis` now defaults to `*`
* change: `bullet` now defaults to `*`
* move: `gfm` is no longer an option — moved to `remark-gfm`
* move: `tableCellPadding` — moved to `remark-gfm`
* move: `tablePipeAlign` — moved to `remark-gfm`
* move: `stringLength` — moved to `remark-gfm`
* remove: `pedantic` is no longer an option — this legacy and buggy flavor of
  markdown is no longer widely used
* remove: `entities` is no longer an option — with CommonMark there is almost
  never a need to use character references, as character escapes are preferred
* new: `quote` — you can now prefer single quotes (`'`) over double quotes
  (`"`) in titles

All of these are for CommonMark compatibility.
Most of them are inconsequential.

* **notable**: references (as in, links `[text][id]` and images `![alt][id]`)
  are no longer present as such in the syntax tree if they don’t have a
  corresponding definition (`[id]: example.com`).
  The reason for this is that CommonMark requires `[text *emphasis
  start][undefined] emphasis end*` to be emphasis.
* **notable**: it is no longer possible to use two blank lines between two
  lists or a list and indented code.
  CommonMark prohibits it.
  For a solution, use an empty comment to end lists (`<!---->`)
* inconsequential: whitespace at the start and end of lines in paragraphs is
  now ignored
* inconsequential: `<mailto:foobarbaz>` are now correctly parsed, and the
  scheme is part of the tree
* inconsequential: indented code can now follow a block quote w/o blank line
* inconsequential: trailing indented blank lines after indented code are no
  longer part of that code
* inconsequential: character references and escapes are no longer present as
  separate text nodes
* inconsequential: character references which HTML allows but CommonMark
  doesn’t, such as `&copy` w/o the semicolon, are no longer recognized
* inconsequential: the `indent` field is no longer available on `position`
* fix: multiline setext headings
* fix: lazy lists
* fix: attention (emphasis, strong)
* fix: tabs
* fix: empty alt on images is now present as an empty string
* …plus a ton of other minor previous differences from CommonMark

* get folks to use this and report problems!

* make `remark-gfm`
* start making next branches for plugins
* get types into {from,to}-markdown and use them here

Closes GH-218.
Closes GH-306.
Closes GH-315.
Closes GH-324.
Closes GH-398.
Closes GH-402.
Closes GH-407.
Closes GH-439.
Closes GH-450.
Closes GH-459.
Closes GH-493.
Closes GH-494.
Closes GH-497.
Closes GH-504.
Closes GH-517.
Closes GH-521.
Closes GH-523.

Closes remarkjs/remark-lint#111.

[micromark]: https://github.com/micromark/micromark

[from-markdown]: https://github.com/syntax-tree/mdast-util-from-markdown

[to-markdown]: https://github.com/syntax-tree/mdast-util-to-markdown

[micromark-footnote]: https://github.com/micromark/micromark-extension-footnote/blob/main/index.js

[to-markdown-footnote]: https://github.com/syntax-tree/mdast-util-footnote/blob/main/to-markdown.js

[from-markdown-footnote]: https://github.com/syntax-tree/mdast-util-footnote/blob/main/from-markdown.js

[mdast]: https://github.com/syntax-tree/mdast
@wooorm
Copy link
Member

wooorm commented Oct 1, 2020

Sorry for the wait! I just wanted to share that there’s now a PR that solves this issue: #536.

wooorm added a commit that referenced this issue Oct 13, 2020
This is a giant change for remark.
It replaces the 5+ year old internals with a new low-level parser:
<https://github.com/micromark/micromark>
The old internals have served billions of users well over the years, but
markdown has changed over that time.
micromark comes with 100% CommonMark (and GFM as an extension) compliance,
and (WIP) docs on parsing rules for how to tokenize markdown with a state
machine: <https://github.com/micromark/common-markup-state-machine>.
micromark, and micromark in remark, is a good base for the future.

`remark-parse` now defers its work to [`micromark`][micromark] and
[`mdast-util-from-markdown`][from-markdown].
`micromark` is a new, small, complete, and CommonMark compliant low-level
markdown parser.
`from-markdown` turns its tokens into the previously (and still) used syntax
tree: [mdast][].
Extensions to `remark-parse` work differently: they’re a two-part act.
See for example [`micromark-extension-footnote`][micromark-footnote] and
[`mdast-util-footnote`][from-markdown-footnote].

* change: `commonmark` is no longer an option — it’s the default
* move: `gfm` is no longer an option — moved to `remark-gfm`
* remove: `pedantic` is no longer an option — this legacy and buggy flavor of
  markdown is no longer widely used
* remove: `blocks` is no longer an options — it’s no longer suggested to
  change the internal list of HTML “block” tag names

remark-stringify now defers its work to [`mdast-util-to-markdown`][to-markdown].
It’s a new and better serializer with powerful features to ensure serialized
markdown represents the syntax tree (mdast), no matter what plugins do.
Extensions to it work differently: see for example
[`mdast-util-footnote`][to-markdown-footnote].

* change: `commonmark` is no longer an option, it’s the default
* change: `emphasis` now defaults to `*`
* change: `bullet` now defaults to `*`
* move: `gfm` is no longer an option — moved to `remark-gfm`
* move: `tableCellPadding` — moved to `remark-gfm`
* move: `tablePipeAlign` — moved to `remark-gfm`
* move: `stringLength` — moved to `remark-gfm`
* remove: `pedantic` is no longer an option — this legacy and buggy flavor of
  markdown is no longer widely used
* remove: `entities` is no longer an option — with CommonMark there is almost
  never a need to use character references, as character escapes are preferred
* new: `quote` — you can now prefer single quotes (`'`) over double quotes
  (`"`) in titles

All of these are for CommonMark compatibility.
Most of them are inconsequential.

* **notable**: references (as in, links `[text][id]` and images `![alt][id]`)
  are no longer present as such in the syntax tree if they don’t have a
  corresponding definition (`[id]: example.com`).
  The reason for this is that CommonMark requires `[text *emphasis
  start][undefined] emphasis end*` to be emphasis.
* **notable**: it is no longer possible to use two blank lines between two
  lists or a list and indented code.
  CommonMark prohibits it.
  For a solution, use an empty comment to end lists (`<!---->`)
* inconsequential: whitespace at the start and end of lines in paragraphs is
  now ignored
* inconsequential: `<mailto:foobarbaz>` are now correctly parsed, and the
  scheme is part of the tree
* inconsequential: indented code can now follow a block quote w/o blank line
* inconsequential: trailing indented blank lines after indented code are no
  longer part of that code
* inconsequential: character references and escapes are no longer present as
  separate text nodes
* inconsequential: character references which HTML allows but CommonMark
  doesn’t, such as `&copy` w/o the semicolon, are no longer recognized
* inconsequential: the `indent` field is no longer available on `position`
* fix: multiline setext headings
* fix: lazy lists
* fix: attention (emphasis, strong)
* fix: tabs
* fix: empty alt on images is now present as an empty string
* …plus a ton of other minor previous differences from CommonMark

* get folks to use this and report problems!

* make `remark-gfm`
* start making next branches for plugins
* get types into {from,to}-markdown and use them here

Closes GH-218.
Closes GH-306.
Closes GH-315.
Closes GH-324.
Closes GH-398.
Closes GH-402.
Closes GH-407.
Closes GH-439.
Closes GH-450.
Closes GH-459.
Closes GH-493.
Closes GH-494.
Closes GH-497.
Closes GH-504.
Closes GH-517.
Closes GH-521.
Closes GH-523.

Closes remarkjs/remark-lint#111.

[micromark]: https://github.com/micromark/micromark

[from-markdown]: https://github.com/syntax-tree/mdast-util-from-markdown

[to-markdown]: https://github.com/syntax-tree/mdast-util-to-markdown

[micromark-footnote]: https://github.com/micromark/micromark-extension-footnote/blob/main/index.js

[to-markdown-footnote]: https://github.com/syntax-tree/mdast-util-footnote/blob/main/to-markdown.js

[from-markdown-footnote]: https://github.com/syntax-tree/mdast-util-footnote/blob/main/from-markdown.js

[mdast]: https://github.com/syntax-tree/mdast
@wooorm wooorm added ⛵️ status/released and removed help wanted 🙏 This could use your insight or help 🙆 yes/confirmed This is confirmed and ready to be worked on labels Oct 14, 2020
@wooorm
Copy link
Member

wooorm commented Oct 14, 2020

This is now released in remark@13.0.0

@wooorm wooorm added the 💪 phase/solved Post is done label Aug 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🗄 area/interface This affects the public interface 💪 phase/solved Post is done remark-stringify 👶 semver/patch This is a backwards-compatible fix 🐛 type/bug This is a problem
Development

Successfully merging a pull request may close this issue.

2 participants