fix: Total rework of Emphasis/Strong #1864

calculuschild · 2020-12-07T04:19:38Z

Description

Fixes em and strong (***〜***) #1860, Fixes Asterisks are not properly escaped #1811
Also fixes CommonMark/GFM examples:
- (em & strong) 361, 387, 388, 407, 412, 415, 416, 424, 425, 442, 445, 446, 453, 455, 456, 457, 465, 466, 467, 470
- This puts us up to 100% compatibility with the CommonMark spec!
Noticeable speedup, especially on the GFM benchmark (~8.7 sec -> ~8.1 sec, pretty consistent over 5 runs on my laptop)

What was attempted

- Simplify the regex for em & strong, now combined into a single tokenizer
- When masking the src string in the Lexer, also mask out escaped \* and \_, which further simplifies much of the regex
- Track total opening delimiter characters vs. closing delimiter characters, and ensure they match
- More closely follow the CommonMark spec:
  - Favor `<em><strong>text</strong></em>` over `<strong><em>text</em></strong>`
  - Correct some of the "New" spec tests that had this ^ swapped incorrectly (and delete one that is now redundant)
  - Handle em/strong CommonMark rules 9-10: the lengths of the left and right delimiter runs cannot sum to a multiple of 3 unless each is a multiple of 3
  - Handle cases with lots of extra unmatched delimiters, e.g. `*text*********`
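
For readers unfamiliar with the "multiple of 3" condition in rules 9-10, here is a minimal sketch of the check. The function name and parameters are hypothetical and are not taken from marked's source; the sketch also assumes the caller already knows whether either delimiter run can both open and close emphasis.

```javascript
// Hypothetical helper for illustration only; not marked's actual code.
// openLen / closeLen: lengths of the opening and closing delimiter runs.
// eitherCanBoth: true when at least one of the two runs can both open
// and close emphasis (the only situation where the rule applies).
function blockedByRuleOfThree(openLen, closeLen, eitherCanBoth) {
  if (!eitherCanBoth) return false;
  const sumIsMultipleOf3 = (openLen + closeLen) % 3 === 0;
  const bothMultiplesOf3 = openLen % 3 === 0 && closeLen % 3 === 0;
  // The pair may not match when the sum is a multiple of 3,
  // unless each run length is itself a multiple of 3.
  return sumIsMultipleOf3 && !bothMultiplesOf3;
}

console.log(blockedByRuleOfThree(1, 2, true)); // true
console.log(blockedByRuleOfThree(3, 3, true)); // false
```

In `*foo**bar*`, for instance, the inner `**` can both open and close, and 1 + 2 = 3, so it may not close the leading `*`.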

Note this involves significant changes in the Lexer and Tokenizer APIs, which should be noted in the update.

The new Regex should be pretty benign compared to the earlier stuff. It literally checks for sequences of the pattern a***b, that is, runs of * or _ between a single character on each side.
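
As a rough illustration of the two ideas above (masking escaped delimiters, then scanning for runs of `*` or `_` with one character on each side), here is a small sketch. The helper names, the `'aa'` filler, and the shape of the returned objects are all assumptions made for this example; they are not the rules or tokenizer code from this PR.

```javascript
// Hypothetical helpers for illustration; not marked's actual implementation.

// Mask escaped \* and \_ with same-length filler so offsets into the
// original string stay valid. The filler 'aa' is an arbitrary assumption.
function maskEscapedDelimiters(src) {
  // Consume backslash pairs left to right so that "\\*" (an escaped
  // backslash followed by a real asterisk) is not treated as "\*".
  return src.replace(/\\([\s\S])/g, (pair, ch) =>
    ch === '*' || ch === '_' ? 'aa' : pair
  );
}

// Collect every run of * or _ with the single character on each side.
function findDelimiterRuns(masked) {
  const runs = [];
  const runPattern = /\*+|_+/g;
  let match;
  while ((match = runPattern.exec(masked)) !== null) {
    const end = match.index + match[0].length;
    runs.push({
      char: match[0][0],
      length: match[0].length,
      index: match.index,
      before: masked[match.index - 1] || '',
      after: masked[end] || ''
    });
  }
  return runs;
}

// The escaped asterisk is masked, leaving a single run of three.
const runs = findDelimiterRuns(maskEscapedDelimiters('\\*a***b'));
console.log(runs);
```

Keeping the masked string the same length as the source means a run's `index` can be used directly against the unmasked text when building tokens.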

Contributor

Test(s) exist to ensure functionality and minimize regression (if no tests added, list tests covering this PR); or,
no tests required for this PR.
If submitting new feature, it has been documented in the appropriate places.

Committer

In most cases, this should be a different person than the contributor.

CI is green (no forced merge required).
Squash and Merge PR following conventional commit guidelines.

vercel · 2020-12-07T04:19:44Z

This pull request is being automatically deployed with Vercel (learn more).
To see the status of your deployment, click below or on the icon next to each commit.

🔍 Inspect: https://vercel.com/markedjs/markedjs/7lc52k1xy
✅ Preview: https://markedjs-git-fork-calculuschild-emstrongrework.markedjs.vercel.app

These tests look like they existed solely to cover the CommonMark examples with Strong and Em together that Marked wasn't passing because it output them backwards: `<strong><em>` instead of `<em><strong>`. They are no longer necessary.

UziTech · 2020-12-07T05:21:36Z

src/Lexer.js

-      // em
-      if (token = this.tokenizer.em(src, maskedSrc, prevChar)) {
+      // em & strong
+      if (token = this.tokenizer.emStrong(src, maskedSrc, prevChar)) {


This would definitely be a breaking change since the tokenizers are part of the public API. Can we do this without combining them? Can we just switch the order of em and strong to get `<strong><em>` to switch to `<em><strong>`?

Unfortunately, they kind of need to be tackled together to get the right sequence of `<em><strong>` to work, which isn't just a stylistic thing. Even though it renders the same as `<strong><em>`, the processing to get to that point also clears up several other bugs, especially regarding uneven `**text*****` delimiters on both sides.

Edit for clarification: processing em/strong in this way allows following more of the CommonMark specs in a "natural" way that I think will be much easier to maintain (instead of a monstrous, fiddly regex). However, this also means you don't really know if the output is going to be an em or a strong until the very end of the process (see the very end of the Tokenizer).

This might be worth putting into a v2.

After researching quite a few dependants I think it should be fine to combine them since most dependants will change the renderer instead of the tokenizer. This will have to be a major bump to v2 though. I do want to get a few other breaking changes together before releasing v2 so it might be a while before I get to fully reviewing this PR.

That sounds appropriate. I need to review the other PRs you have waiting as well that should go out before this anyway...

There are some other changes I've seen in the issues list that I'd like to lump into a v2 bump as well.

UziTech · 2021-02-05T04:22:30Z

@styfle We should get this and #1926 released as v2 soon. This PR fixes the security issue in #1927

# [2.0.0](v1.2.9...v2.0.0) (2021-02-07)

### Bug Fixes

* Join adjacent inlineText tokens ([#1926](#1926)) ([f848e77](f848e77))
* Total rework of Emphasis/Strong ([#1864](#1864)) ([7293251](7293251))

### BREAKING CHANGES

* `em` and `strong` tokenizers have been merged into one `emStrong` tokenizer

github-actions · 2021-02-07T22:26:48Z

🎉 This PR is included in version 2.0.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

calculuschild added 3 commits December 6, 2020 22:06

Passing all tests

0293db5

console.log's removed & shouldfail: fixed on passing tests

eca1f23

Delete unused rules

d5471d2

vercel bot deployed to Preview December 7, 2020 04:19

Tidying up

3f0fbb9

vercel bot deployed to Preview December 7, 2020 04:59

Delete test already covered by CommonMark?

e9609ac

These tests look like they existed solely to cover the CommonMark examples with Strong and Em together that Marked wasn't passing because it output them backwards: `<strong><em>` instead of `<em><strong>`. They are no longer necessary.

vercel bot deployed to Preview December 7, 2020 05:03

calculuschild requested review from UziTech, joshbruce, davisjam and styfle December 7, 2020 05:06

UziTech reviewed Dec 7, 2020

UziTech added this to In Progress in vNext via automation Dec 7, 2020

styfle reviewed Dec 7, 2020

lib/marked.js (outdated, resolved)

styfle reviewed Dec 7, 2020

lib/marked.esm.js (outdated, resolved)

Remove Libs and min.js

75c641b

vercel bot deployed to Preview December 8, 2020 01:06

Handle non-english chars

5897ade

vercel bot deployed to Preview December 9, 2020 18:55

styfle reviewed Dec 9, 2020

src/Tokenizer.js (outdated, resolved)

Pass more tests involving unbalanced extra asterisks at the end.

c76d179

vercel bot deployed to Preview December 9, 2020 20:57

Lint

6cac4e4

vercel bot deployed to Preview December 9, 2020 20:59

Typo

710205a

Co-authored-by: Steven <steven@ceriously.com>

vercel bot deployed to Preview December 9, 2020 21:00

calculuschild added 2 commits December 9, 2020 16:07

Small Rules regex cleanup

0a62ab8

Merge branch 'EmStrongRework' of https://github.com/calculuschild/marked

b5accad

into EmStrongRework

UziTech approved these changes Feb 5, 2021

UziTech requested a review from styfle February 5, 2021 04:20

styfle approved these changes Feb 7, 2021

UziTech changed the title ~~Total rework of Emphasis/Strong~~ fix: Total rework of Emphasis/Strong Feb 7, 2021

UziTech merged commit 7293251 into markedjs:master Feb 7, 2021

vNext automation moved this from In Progress to Done Feb 7, 2021

github-actions bot added the released label Feb 7, 2021

alasdairhurst mentioned this pull request Feb 10, 2021

marked@2.0.0 throws SyntaxError: Invalid regular expression #1937

Closed
