Consider changing Parser's switch cases to methods, for extensibility #2033

mitranim · 2021-04-29T07:07:09Z

What pain point are you perceiving?

My use case seems to require special logic which cannot be done at the Renderer level, cannot be done by editing the AST via walkTokens (unless #2032 is implemented), but could be done at the Parser level if the API was easier to extend.

Describe the solution you'd like

In Parser, convert the bodies of switch cases to methods:

class Parser {
  parseToken(token, renderer) {
    switch (token.type) {
      case 'link': return this.link(token, renderer)
      case 'text': return this.text(token, renderer)

      // ...more cases for other known types; then:

      default: return this.default(token, renderer)
    }
  }

  link(token, renderer) {
    return renderer.link(token.href, token.title, this.parseInline(token.tokens, renderer))
  }

  text(token, renderer) {
    return renderer.text(token.text)
  }

  // ... more methods for other known types; then:

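  // Fallback for unknown token types; the exact message is a placeholder.
  default(token) {throw new Error(`unknown token type: ${token.type}`)}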
}

We could even drop the switches and look up methods dynamically:

class Parser {
  parseToken(token, renderer) {
    if (this[token.type]) return this[token.type](token, renderer)
    return this.default(token, renderer)
  }
}

This would also make it trivial to implement #2032 in user code, by overriding default to support new token types.
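A minimal, self-contained sketch of the dynamic lookup, using a toy Parser and renderer rather than marked's real classes. The `seq` token type, the `SeqParser` subclass, and the renderer shape are assumptions made up for illustration.

```javascript
// Toy stand-ins for illustration; not marked's actual Parser or Renderer.
class Parser {
  parse(tokens, renderer) {
    return tokens.map(token => this.parseToken(token, renderer)).join('')
  }

  // Look up a method named after the token type; fall back on default().
  parseToken(token, renderer) {
    const method = this[token.type]
    if (typeof method === 'function') return method.call(this, token, renderer)
    return this.default(token, renderer)
  }

  link(token, renderer) {
    return renderer.link(token.href, token.title, this.parse(token.tokens, renderer))
  }

  text(token, renderer) {
    return renderer.text(token.text)
  }

  default(token) {
    throw new Error(`unknown token type: ${token.type}`)
  }
}

// Hypothetical subclass: supports a "seq" token type by overriding default().
class SeqParser extends Parser {
  default(token, renderer) {
    if (token.type === 'seq') return this.parse(token.tokens, renderer)
    return super.default(token, renderer)
  }
}

const renderer = {
  link: (href, title, body) => `<a href="${href}">${body}</a>`,
  text: text => text,
}

const tokens = [
  {type: 'seq', tokens: [
    {type: 'text', text: 'see '},
    {type: 'link', href: '/docs', title: null, tokens: [{type: 'text', text: 'docs'}]},
  ]},
]

const html = new SeqParser().parse(tokens, renderer)
console.log(html)
```

Note that this sketch does not guard against token types that happen to share a name with a non-token method such as `parse` or `constructor`; a real implementation would need to handle that.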

Actual use case

To avoid an XY problem (proposing a poorly-chosen solution instead of stating the actual goal), here's what I actually need to do.

Depending on the source data:

Sometimes links are treated as their child .tokens, ignoring href.
Sometimes links are preserved as-is.
Sometimes a text token is converted to multiple tokens including 0 to N links, unless it's already inside a link.

With this proposal, this logic could be implemented by subclassing Parser and overriding just the right methods:

class MdParser extends md.Parser {
  link() {/* just the stuff */}
  text() {/* just the stuff */}
}
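To make the three cases above concrete, here is a rough sketch against a toy base class with per-type methods. Everything in it is an assumption for illustration: the base `Parser` is not marked's, and the `unwrapLinks` option, the `inLink` flag, and the URL regex are hypothetical.

```javascript
// Toy base class standing in for a Parser with per-type methods (not marked's real API).
class Parser {
  parse(tokens, renderer) {
    return tokens.map(token => this[token.type](token, renderer)).join('')
  }
  link(token, renderer) {
    return renderer.link(token.href, this.parse(token.tokens, renderer))
  }
  text(token, renderer) {
    return renderer.text(token.text)
  }
}

// Hypothetical subclass covering the three cases from the list above.
class MdParser extends Parser {
  constructor({unwrapLinks = false} = {}) {
    super()
    this.unwrapLinks = unwrapLinks
    this.inLink = false
  }

  link(token, renderer) {
    // Case 1: treat the link as its child tokens, ignoring href.
    if (this.unwrapLinks) return this.parse(token.tokens, renderer)
    // Case 2: preserve the link, remembering that we are inside one.
    const prev = this.inLink
    this.inLink = true
    try { return super.link(token, renderer) }
    finally { this.inLink = prev }
  }

  text(token, renderer) {
    // Case 3: outside links, turn bare URLs into links (0 to N per text token).
    if (this.inLink) return super.text(token, renderer)
    return token.text
      .split(/(https?:\/\/\S+)/) // odd indices hold the captured URLs
      .map((part, i) => i % 2 ? renderer.link(part, renderer.text(part)) : renderer.text(part))
      .join('')
  }
}

const renderer = {
  link: (href, body) => `<a href="${href}">${body}</a>`,
  text: text => text,
}

const tokens = [
  {type: 'text', text: 'visit https://example.com now '},
  {type: 'link', href: '/x', tokens: [{type: 'text', text: 'https://inner.example'}]},
]

const kept = new MdParser().parse(tokens, renderer)
const unwrapped = new MdParser({unwrapLinks: true}).parse(tokens, renderer)
console.log(kept)
console.log(unwrapped)
```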


mitranim · 2021-04-29T07:35:14Z

Addendum. For dynamic method lookup, we can avoid collisions with Object.prototype properties by extending null instead:

function Null() {}
Null.prototype = null

class Parser extends Null {}

console.log(Object.getOwnPropertyNames(Parser.prototype))
// ['constructor']

This gives us a squeaky-clean Parser.
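A small check of what the null prototype buys us, as a sketch. The `text` method here is only a placeholder; the point is that names like `toString` are no longer reachable through the instance.

```javascript
function Null() {}
Null.prototype = null

class Parser extends Null {
  // Placeholder token method, for illustration only.
  text(token) { return token.text }
}

const parser = new Parser()

// Names normally inherited from Object.prototype are not visible on the instance.
const inherited = ['toString', 'hasOwnProperty', 'valueOf'].filter(name => name in parser)

// Only what the class itself defines is reachable.
const own = ['text', 'constructor'].filter(name => name in parser)

console.log(inherited)
console.log(own)
console.log(Object.getPrototypeOf(Parser.prototype))
```

As the `getOwnPropertyNames` output shows, `constructor` is still present on the prototype, so a token whose type is literally `constructor` would still need a separate guard.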

UziTech · 2021-04-29T14:48:38Z

We have a proposal to add tokenizers and renderers to marked (#1872). Maybe that would solve this problem.

mitranim · 2021-05-02T05:54:45Z

Thanks for pointing it out, I missed it. I still feel like both proposals are somewhat orthogonal, and this one could be done right now, while the other one is incubating. It doesn't seem to have as many design issues or performance issues. Even if it ends up being somewhat redundant, it would be quite handy for use cases like mine until #1872 is done (at some unspecified point). I might be able to PR it; let me know if I should try.

UziTech · 2021-05-02T06:38:16Z

We are always open to PRs. If it doesn't slow marked down, it seems like something that would work.

You can test the speed changes by running npm run bench before and after the change.

Logic for different token types has been moved from switch cases to methods. The main parse loop now simply looks up the method by token type, falling back on `default()`.

Token types are now an open set, rather than a closed set. New token types can be supported by subclassing Parser and adding corresponding methods, or by overriding `default()`.

Consolidated the parsing logic for "top" and "inline" tokens. Removed `parser.parseInline`. The top-level function `parseInline` uses `lexer.lexInline` to generate an appropriate AST; additional restrictions at the parser level appear redundant.

More flexible approach to "contextual" parsing logic, such as using a different renderer or indicating "loose" mode. Now every parse method takes an additional parameter: a "context", which contains a renderer and possibly other settings. By default, such context objects are allocated lazily and no more than once. The method signatures make it possible to use context inheritance, where child contexts inherit from parent contexts, overriding only some of their properties. This is not used by default, but can be useful for advanced cases such as one described in markedjs#2033.

Benchmarks indicate no significant performance regressions or improvements. Differences appear to be within noise levels.

Known regressions that must be fixed before merging the MR:

* No support for coalescing adjacent `text` nodes.
* List "loose" mode, with paragraph wrapping, is now a placeholder with incorrect logic.
* Fewer tests pass, presumably as a result of ↑.

Addresses markedjs#2033.
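The context inheritance mentioned in this description could look roughly like the sketch below. It is not the code from the PR: the `ctx` shape, the `list` and `item` token types, and the renderer methods are assumptions chosen to keep the example small.

```javascript
// Hypothetical sketch; names and shapes are illustrative, not taken from the PR.
class Parser {
  parse(tokens, ctx) {
    return tokens.map(token => this[token.type](token, ctx)).join('')
  }

  list(token, ctx) {
    // The child context inherits `renderer` from `ctx` via the prototype chain
    // and overrides only `loose`; the parent context is left untouched.
    const child = Object.assign(Object.create(ctx), {loose: token.loose})
    return ctx.renderer.list(this.parse(token.items, child))
  }

  item(token, ctx) {
    const body = this.parse(token.tokens, ctx)
    // In "loose" mode, wrap the item body in a paragraph.
    return ctx.renderer.item(ctx.loose ? ctx.renderer.paragraph(body) : body)
  }

  text(token, ctx) {
    return ctx.renderer.text(token.text)
  }
}

const renderer = {
  list: body => `<ul>${body}</ul>`,
  item: body => `<li>${body}</li>`,
  paragraph: body => `<p>${body}</p>`,
  text: text => text,
}

const root = {renderer, loose: false}

const tokens = [
  {type: 'list', loose: true, items: [
    {type: 'item', tokens: [{type: 'text', text: 'a'}]},
  ]},
]

const out = new Parser().parse(tokens, root)
console.log(out)
console.log(root.loose)
```

Using `Object.create` means a child context costs one small object and never copies the parent's settings, which fits the "override only some properties" idea.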

UziTech · 2021-06-15T23:52:11Z

Custom tokenizers and renderers are available in v2.1.0.

See the Custom Extensions section in the docs.

mitranim mentioned this issue Apr 29, 2021

Consider supporting a simple "sequence of tokens" token type #2032

Closed

mitranim mentioned this issue May 3, 2021

[WIP] extensible Parser: separate methods for token types #2038

Closed

UziTech mentioned this issue May 3, 2021

Pass whole token to renderer to increase customization possibilities #2039

Closed

UziTech added the proposal label May 3, 2021

UziTech closed this as completed Jun 15, 2021
