Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider changing Parser's switch cases to methods, for extensibility #2033

Closed
mitranim opened this issue Apr 29, 2021 · 5 comments
Closed
Labels

Comments

@mitranim
Copy link

What pain point are you perceiving?

My use case seems to require special logic which cannot be done at the Renderer level, cannot be done by editing the AST via walkTokens (unless #2032 is implemented), but could be done at the Parser level if the API was easier to extend.

Describe the solution you'd like

In Parser, convert the bodies of switch cases to methods:

class Parser {
  parseToken(token, renderer) {
    switch (token.type) {
      case 'link': return this.link(token, renderer)
      case 'text': return this.text(token, renderer)

      // ...more cases for other known types; then:

      default: return this.default(token, renderer)
    }
  }

  link(token, renderer) {
    return renderer.link(token.href, token.title, this.parseInline(token.tokens, renderer))
  }

  text(token, renderer) {
    return renderer.text(token.text)
  }

  // ... more methods for other known types; then:

  default() {throw appropriate error}
}

We could even drop the switches and look up methods dynamically:

class Parser {
  parseToken(token, renderer) {
    if (this[token.type]) return this[token.type](token, renderer)
    return this.default(token, renderer)
  }
}

This would also make it trivial to implement #2032 in user code, by overriding default to support new token types.

Actual use case

To avoid an XY problem (proposing a poorly-chosen solution instead of stating the actual goal), here's what I actually need to do.

Depending on the source data:

  • Sometimes links are treated as their child .tokens, ignoring href.
  • Sometimes links are preserved as-is.
  • Sometimes a text token is converted to multiple tokens including 0 to N links, unless it's already inside a link.

With this proposal, this logic could be implemented by subclassing Parser and overriding just the right methods:

class MdParser extends md.Parser {
  link() {/* just the stuff */}
  text() {/* just the stuff */}
}
@mitranim
Copy link
Author

Addendum. For dynamic method lookup, we can avoid collisions with Object.prototype properties by extending null instead:

function Null() {}
Null.prototype = null

class Parser extends Null {}

console.log(Object.getOwnPropertyNames(Parser.prototype))
// ['constructor']

This gives us a squeaky-clean Parser.

@UziTech
Copy link
Member

UziTech commented Apr 29, 2021

We have a proposal to add tokenizers and renderers to marked here. Maybe that would solve this problem.

@mitranim
Copy link
Author

mitranim commented May 2, 2021

Thanks for pointing it out, I missed it. I still feel like both proposals are somewhat orthogonal, and this one could be done right now, while the other one is incubating. It doesn't seem to have as many design issues or performance issues. Even if it ends up being somewhat redundant, it would be quite handy for use cases like mine until #1872 is done (at some unspecified point). I might be able to PR it; let me know if I should try.

@UziTech
Copy link
Member

UziTech commented May 2, 2021

We are always open to PRs. If it doesn't slow marked down it seems like something that would work.

You can test the speed changes by running npm run bench before and after the change.

mitranim added a commit to mitranim/marked that referenced this issue May 3, 2021
Logic for different token types has been moved from switch cases to methods.
The main parse loop now simply looks up the method by token type, falling back
on `default()`.

Token types are now an open set, rather than a closed set. New token types can
be supported by subclassing Parser and adding corresponding methods, or by
overriding `default()`.

Consolidated the parsing logic for "top" and "inline" tokens. Removed
`parser.parseInline`. The top-level function `parseInline` uses
`lexer.lexInline` to generate an appropriate AST; additional restrictions at
the parser level appear redundant.

More flexible approach to "contextual" parsing logic, such as using a different
renderer or indicating "loose" mode. Now every parse method takes an additional
parameter: a "context", which contains a renderer and possibly other settings.
By default, such context objects are allocated lazily and no more than once.
The method signatures make it possible to use context inheritance, where child
contexts inherit from parent contexts, overriding only some of their properties.
This is not used by default, but can be useful for advanced cases such as one
described in markedjs#2033.

Benchmarks indicate no significant performance regressions or improvements.
Differences appear to be within noise levels.

Known regressions that must be fixed before merging the MR:

  * No support for coalescing adjacent `text` nodes.
  * List "loose" mode, with paragraph wrapping, is now a placeholder with
    incorrect logic.
  * Fewer tests pass, presumably as a result of ↑.

Addresses [markedjs#2033](markedjs#2033).
@UziTech
Copy link
Member

UziTech commented Jun 15, 2021

Custom tokenizers and renderer are available in v2.1.0

see Custom Extensions section in docs

@UziTech UziTech closed this as completed Jun 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants