Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Centralise indented code block checks #936

Open
chrisjsewell opened this issue Jun 5, 2023 · 10 comments
Open

Centralise indented code block checks #936

chrisjsewell opened this issue Jun 5, 2023 · 10 comments

Comments

@chrisjsewell
Copy link
Contributor

chrisjsewell commented Jun 5, 2023

In markdown-it-py (executablebooks/markdown-it-py@9251695) and for its plugins (executablebooks/mdit-py-plugins@85f7ed3 + executablebooks/mdit-py-plugins@9e5aff8)
I have centralised the check - present in all block rules - for a block indented by greater than 4 spaces, i.e. the check for an indented code block, and deactivate it when the code rule is disabled.

This allows for behaviour similar to e.g. https://github.com/jgm/djot and https://mdxjs.com/, whereby you can indent syntax blocks arbitrarily, e.g.

<div>
    <div>
        
        > arbitrarily indented syntax block

    </div>
</div>

at the moment, in the code here, IMO it's a bit unintuitive that most block rules will not work like this, even if you disable code blocks

@rlidwka
Copy link
Member

rlidwka commented Jun 6, 2023

Yeah, I had my doubts about this one as well for some time.

Why?

Not clear what the behavior should be if code rule is disabled.

I wrote markdown-it rules to the letter of the spec, which says "must be no more than 4 spaces" on every block rule. But spec is of course written with the assumption that code block is always there.

djot is a great argument in favor of what you're suggesting. By the way, are there any other useful changes or commonmark deviations in there?

How to check if code rule is present?

I don't know how you manage custom rules, but this looks a bit suspicious:

# pre-check if code blocks are enabled, to speed up is_code_block method
self._code_enabled = "code" in self.md["block"].ruler.get_active_rules()
  1. can user define their own block rule named code? what happens if they do?
  2. what if user wants to define another block similar to code, but named differently, and still have the restriction?
  3. and why is 4 spaces hardcoded?

Just some thoughts

Maybe block state should have something like max_starting_spaces as a parameter, which should be +inf by default, and decreased to 4 if code rule is present by the code rule itself.

Maybe block state should also have min_starting_spaces as a parameter to clamp it from the other side. Those checks also happen somewhere.

TL;DR: I like the idea, unsure about implementation details.

@chrisjsewell
Copy link
Contributor Author

By the way, are there any other useful changes or commonmark deviations in there?

I'd recommend just reading https://github.com/jgm/djot#rationale, this essentially details the commonmark deviations and their rationale, which also derives from https://johnmacfarlane.net/beyond-markdown.html

I actually learnt of djot after asking about one of the changes it makes; to remove non-locality of reference link parsing: commonmark/commonmark-spec#702

I don't know how you manage custom rules, but this looks a bit suspicious

yeh certainly open to other ideas

Maybe block state should have something like max_starting_spaces as a parameter, which should be +inf by default, and decreased to 4 if code rule is present by the code rule itself.

I like the idea, however, how could this work, given that individual rules don't have any "initialisation hook" themselves? (obviously, plugins can initialise things in MarkdownIt.use, but built-in rules have no such feature)

@rlidwka
Copy link
Member

rlidwka commented Jun 7, 2023

I like the idea, however, how could this work, given that individual rules don't have any "initialisation hook" themselves?

In theory, every markdown-it rule can be wrapped in its own plugin, which can configure itself (and then there would be a single large plugin for commonmark syntax, loading smaller plugins for each individual rule).

In practice, probably noone here would dedicate time and energy into changing the way it works now, and this library is too widely used to introduce breaking changes.

Maybe you can do so in python, if you don't mind diverging from source code here, and if python code patterns would align that way, and if you don't get performance issues.

@chrisjsewell
Copy link
Contributor Author

Maybe you can do so in python, if you don't mind diverging from source code here

Yeh ughh, I'd be very wary of doing this as obviously any divergences make it more difficult to port fixes/improvements you make here and ensure markdown-it-py is kept in-sync

@chrisjsewell
Copy link
Contributor Author

From markdown-it-rust/markdown-it#7 (comment), I see now why you suggest this 😄

So I guess this is definitely possible to add in markdown-it.rs

@rlidwka
Copy link
Member

rlidwka commented Jun 22, 2023

Yeh ughh, I'd be very wary of doing this as obviously any divergences make it more difficult to port fixes/improvements you make here and ensure markdown-it-py is kept in-sync

I see now why you suggest this 😄

I've suggested this, because markdown-it right now is in de-facto feature freeze.

The project it was created for is up and running, so we don't need to fix anything. There's no time/money to make any major changes. And even if we did have resources, markdown-it is now used by too many people to justify any breaking changes.

If you are porting it to python however:

  • you don't have backward compatibility issues
  • you don't have to sync to upstream (because as I said feature-freeze)
  • you may have some time to do it

If you were to do that, I have some ideas how to improve the parser. Most of these you can see in my Rust port, but not all (for example, I'm thinking this entire thing should work under ECS paradigm, which probably helps with performance and usability). That's actually why I wrote the that thing - I found out that in Rust I can refactor things a lot faster than in JS, and quickly test any ideas.

@chrisjsewell
Copy link
Contributor Author

you don't have backward compatibility issues

well not on the same scale as here, but markdown-it-py has a fair few users, so certainly not trivial to change too much 😅

But yes the Rust port is looking really cool thanks, so I would rather also invest my time there and seeing what I can do with https://github.com/chrisjsewell/markdown-it-pyrs

@rlidwka
Copy link
Member

rlidwka commented Jun 23, 2023

so I would rather also invest my time there and seeing what I can do with

Not a discussion for this repo, but...

"Parser plugins cannot currently be written in Python and added dynamically to the parser."

What exactly is missing for that to work? I was thinking about reflection API already, and that's a good use-case for it.

@chrisjsewell
Copy link
Contributor Author

I was thinking about reflection API already, and that's a good use-case for it.

Oo very interesting 👀

What exactly is missing for that to work?

Well I guess it is just trying to work where the program language barriers lie, and how to minimise that surface area

Obviously, the simplest interface is strings: Python string (MD) -> Rust -> Python string (HTML).
Here there is no configurability

Then I have a "wrapper" around markdown_it::MarkdownIt,
to allow for configurability of the parser, i.e. adding plugin
(https://github.com/chrisjsewell/markdown-it-pyrs/blob/c474ab3276f63844bac0a55351738d938c4a866f/src/lib.rs#L8)

I then have a converter from the Rust AST to a Python AST.
This feels like it could be more "elegant" though, because at present I have to hard-code all the conversions: https://github.com/chrisjsewell/markdown-it-pyrs/blob/c474ab3276f63844bac0a55351738d938c4a866f/src/nodes.rs#L164

To have proper parser plugins, on the Python side, though,
I think you would need at least "two-way" AST conversion, and you would need to interact with other aspects of the parser.

Also I'm not sure how you could "dynamically" define new Node types on the Python side,
since these are compiled on the Rust side

@rlidwka
Copy link
Member

rlidwka commented Nov 19, 2023

Please send a pull request for this library.

We might get this change in (if performance impact is negligible).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants