
fix: speed up parsing long lists #2302

Merged
merged 10 commits into markedjs:master on Dec 2, 2021

Conversation

@calculuschild (Contributor) commented Nov 24, 2021

Marked version:

v3.0.0 +

Description

The List tokenizer was using a RegEx to capture the potential next list Item, and then split that captured text line-by-line to determine whether it had proper indentation, etc., and whether each line should be part of the current list Item.

Problem is, the captured text was literally the entire document, so for every potential list item, we were capturing the entire document and then splitting the document into lines. For longer documents, this meant spending the majority of time just splitting the document into lines over and over.
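
Since the remaining document is re-captured and re-split once per item, the total work grows roughly quadratically with the number of list items. A toy illustration of that pattern, written for this writeup rather than taken from marked's source:

// Simulate the old pattern: split the entire remaining document into lines
// once per list item, then consume a single item and repeat.
const doc = '- a\n'.repeat(10000);

console.time('split whole document per item');
let remaining = doc;
while (remaining.length > 0) {
  const lines = remaining.split('\n');               // O(remaining length) on every iteration
  remaining = remaining.slice(lines[0].length + 1);  // consume one one-line item
}
console.timeEnd('split whole document per item');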

Here's the offending Regex, and notice that (?:\\n[^\\n]*)* matches everything to the end of the document:

const itemRegex = new RegExp(`^( {0,3}${bull})((?: [^\\n]*| *)(?:\\n[^\\n]*)*(?:\\n|$))`);
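
For reference, a rough breakdown of that pattern (the annotations are mine, not from the PR; group 2 is everything after the bullet):

// ^( {0,3}${bull})    up to three leading spaces, then the bullet (group 1)
// (?: [^\\n]*| *)     the rest of the first line after the bullet
// (?:\\n[^\\n]*)*     zero or more following lines -- greedy, so it consumes
//                     everything to the end of the remaining source
// (?:\\n|$)           a final newline or end of input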

This PR changes that so we only capture the first line (with its bullet point), and once we verify that it is a candidate for starting a new list Item, we just traverse the src one line at a time. No more mass line-splitting when we really only need to look one line at a time anyway.

Offending line-splitting line where 90% of processing time was spent:

lines = cap[2].split('\n');
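
For illustration, here is a minimal sketch of the per-line scan described above, using my own simplified names and continuation rule; the real change in src/Tokenizer.js also handles indentation widths, blank lines, loose lists, and nested blocks:

// Simplified sketch, not the actual Tokenizer code.
const bull = '(?:[*+-]|\\d{1,9}[.)])';  // simplified bullet pattern
const firstLine = new RegExp(`^( {0,3}${bull})((?: [^\\n]*)?(?:\\n|$))`);

function captureListItem(src) {
  const cap = firstLine.exec(src);
  if (!cap) return null;             // not a list item at all

  let raw = cap[0];                  // the bullet line only
  let rest = src.slice(raw.length);

  while (rest.length > 0) {
    const nl = rest.indexOf('\n');
    const line = nl === -1 ? rest : rest.slice(0, nl + 1);
    // Simplified continuation rule: blank lines and lines indented by at
    // least two spaces stay inside the current item.
    if (line.trim() !== '' && !/^ {2,}/.test(line)) break;
    raw += line;
    rest = rest.slice(line.length);
  }
  return raw;  // one item's raw text, without ever splitting the whole document
}

For example, captureListItem('- first\n  still first\n- second\n') returns the first two lines as a single item.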

Needs a bit of cleanup, but it's passing tests.

Contributor

  • Test(s) exist to ensure functionality and minimize regression (if no tests added, list tests covering this PR); or,
  • no tests required for this PR.
  • If submitting new feature, it has been documented in the appropriate places.

Committer

In most cases, this should be a different person than the contributor.


vercel bot commented Nov 24, 2021

This pull request is being automatically deployed with Vercel.
To see the status of your deployment, click below or on the icon next to each commit.

🔍 Inspect: https://vercel.com/markedjs/markedjs/5cTrJbPLiAZvhvjqo23kEq9L2pBJ
✅ Preview: https://markedjs-git-fork-calculuschild-fixon2scalingforlists-markedjs.vercel.app

@@ -1 +0,0 @@
*foo __bar *baz bim__ bam*
Member

Why was this test removed? Is it failing with this PR?

Member

nvm I see this was an accidental addition.

Contributor Author

That was a test I added while playing around trying to debug my program, and I accidentally committed it to the PR.

@calculuschild changed the title from "Fixes applied. Passing all tests, but needs cleanup." to "Fix List Tokenizer O(n²) time on long lists." on Nov 24, 2021
Comment on lines +1 to +4
module.exports = {
markdown: '- a\n'.repeat(10000),
html: `<ul>${'<li>a</li>'.repeat(10000)}</ul>`
};
Member

I can confirm this speeds up parsing long lists.

This test takes about 2 seconds on my machine on master and about 100 ms with this PR.
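
For anyone who wants to reproduce a rough timing like this, a quick sketch of my own (not part of the test suite), assuming the package's named marked export:

const { marked } = require('marked');

const markdown = '- a\n'.repeat(10000);

console.time('10k-item list');
marked.parse(markdown);
console.timeEnd('10k-item list');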

Contributor Author

Hooray!

@calculuschild (Contributor Author)

I want to go over this code once more before it gets merged. I have a nagging sense that there's still some debris left behind that I should clean up, but exactly where is eluding me at the moment.

@calculuschild (Contributor Author) commented Nov 27, 2021

@UziTech Cleaned up the logic the way I wanted. It passes the specs but is failing the Snyk security check here, and I'm not sure why.

Otherwise, this is now ready to be merged.

Edit: Ah, there was a merge conflict somewhere setting it off. All fixed now!

@UziTech changed the title from "Fix List Tokenizer O(n²) time on long lists." to "fix: speed up parsing long lists" on Dec 2, 2021
@UziTech merged commit e0005d8 into markedjs:master on Dec 2, 2021
github-actions bot pushed a commit that referenced this pull request Dec 2, 2021
## [4.0.6](v4.0.5...v4.0.6) (2021-12-02)

### Bug Fixes

* speed up parsing long lists ([#2302](#2302)) ([e0005d8](e0005d8))

github-actions bot commented Dec 2, 2021

🎉 This PR is included in version 4.0.6 🎉

The release is available on:

Your semantic-release bot 📦🚀


Successfully merging this pull request may close these issues.

Performance of parse() is 15 times worse in releases after 2.1.3