Custom extras #519

Crozzers · 2023-07-02T15:00:22Z

This PR closes #382 by adding support for custom extras.
@JeffAbrahamson, @Einenlum and @kjaymiller, you've all expressed interest in this feature, so I'd love to hear any thoughts from you all.

How it works

Extras

Extras must inherit from the new Extra class. See the Strike extra for a basic example of the implementation.
Each extra must define the following properties:

name
- A string name the user will be able to pass in the existing extras=[] param, like fenced-code-blocks.
order
- A list of integers controlling when the extra will be invoked. You can specify before or after a certain stage, before AND after different stages or even before/after another extra
run()
- The function that will be called with text for the extra to process
test()
- Checks whether the extra should be run against a block of text. For example, the fenced-code-blocks extra might check '```' in text

Extra options

The new extras support passing in extra parameters, like with existing extras. These are passed to the initialiser of the extra via the extras dict in the Markdown class (see _setup_extras)

Stages

Converting markdown to HTML happens in distinct stages, such as preprocessing, hashing HTML, processing lists, etc. These distinct stages are given numeric IDs in the Stage class which denote the order they run in.

The Markdown class methods that correspond to each stage get decorated with @Stage.mark. For example:

class Markdown:
    ...
    @Stage.mark(Stage.PREPROCESS)
    def preprocess(self, text):
        ...

Now, when preprocess is called, a wrapper function will find all registered extras that are in the same band as this stage and execute them.

Order of execution

As mentioned previously, each stage has its own numeric ID, each 100 apart from the last. Calling Stage.before/after(Stage.WHATEVER) will return that stage's ID minus/plus 5. Conescutive calls increment an internal counter and will continue subtracting/adding 5. For example:

Stage.PREPROCESS  # 50
Stage.after(Stage.PREPROCESS)  # [55]
Stage.after(Stage.PREPROCESS)  # [60]

This means that multiple extras can be registered for the same stage and will be executed in the order in which they were registered.

Known flaws

Calling Stage.before 10 times on the same stage will result in extras overflowing into the next stage
- So your Stage.after([some stage]) will actually mean Stage.before([next stage])
- This can be mitigated by calling Stage.after([stage], step=1) to increase the number of calls that can be made before this happens
Registering extras is a bit of a mess at the moment
- All subclasses of Extra are auto-detected and registered during _setup_extras. I intend to change so extras must explicitly register themselves
Not all extras have been converted to new format
- Notably the footnotes extra and a few others are tightly integrated and I haven't been able to untangle them as of yet
Not all extras have been completely seperated from the Markdown class so there are still shims in place
- EG: smarty-pants, markdown-in-html

The way this works is by creating subclasses of the `Extra` class. These subclasses will have an order and a name. The name is the same one specified in the `extras` list/dict given to the Markdown init function. The order will be at which point the function will be executed. This is done by attaching the extra to a "Stage", a distinct step in the markdown process (eg: forming paragraphs, processing links... etc). You can set the extra to run before or after the stage. At the moment, extras are automatically registered, activated and executed by the Markdown class. TODO: * More elegant way to register and init extras * Optimise `Stage.mark` * Convert more extras to new class format

Stay up to date with master

Also converted `mermaid` extra as part of this process. As a plus, you no longer need fenced-code-blocks activated to use mermaid. Also added the ability for extras to be triggered before or after another extra

All block extras have now been converted

Still TODO is things like `footnotes` extra. Also need to think of a system for extras to replace parts of the standard syntax

Merge master to pull in trentm#514 for conversion to new Extra format

Also fix tests not running on Windows

At time of commit, Python 3.5 is 2 years 9 months EOL.

Crozzers · 2023-07-02T16:40:14Z

Registering extras is a bit of a mess at the moment

All subclasses of Extra are auto-detected and registered during _setup_extras. I intend to change so extras must explicitly register themselves

Changed in 5478960 so that extras must manually registered by calling the Extra.register function.
During Markdown._setup_extras it will now iterate over all registered extras and instantiate any that are found in self.extras using the parameters from self.extras

I have also dropped Python 3.5 support since it was throwing errors in the test suite due to some type hints. Python 3.5 was EOL 2 years and 9 months ago so this shouldn't be too much of an issue

nicholasserra · 2023-07-23T17:33:22Z

Somehow this slipped through the cracks for me. I'll get it into my queue

…nto custom-extras Resolve merge conflicts for PR

Crozzers · 2023-07-23T21:21:49Z

No worries! Might also be worth pulling in some downstream project maintainers for feedback. Off the top of my head, @mhils (pdoc) and @rouilj (roundup) come to mind. Since both of these projects use the library, I imagine there will be some ideas for what the feature should look like.

rouilj · 2023-07-23T21:41:26Z

Thanks for the ping.

Markdown2 is just one of three supported markdown engines. Each is configured a little differently with different capabilities. The things I would use this for (recognizing autolinked phrases like issue1) etc are done by changing the text seen by markdown2 so it includes the proper markdown syntax for the links.

So I try to do all customization via markdown syntax. I do enable options (e.g. fenced blocks, line breaks on newline) that are supported on all the engines.

That said, I do have custom code for markdown and mistune that handle setting the rel
attributes on links or try to make embedded html safer.

Not sure if the customization methods of other engines may be useful feedback.
But you can see what other contributers have done at:

https://github.com/roundup-tracker/roundup/blob/master/roundup/cgi/templating.py#L122-L235

mhils · 2023-07-24T09:43:27Z

Thanks for the heads-up! This looks like a sensible refactor, no complaints really. :)

From an API standpoint I'm not sure if having a dedicated test method is particularly useful (extras could just do this check as the first part of run if they want to improve performance?`), but overall this all looks good.

Crozzers · 2023-08-14T17:55:58Z

Apologies for the slow reply, Thanks all for the feedback!

@rouilj I took a look at the examples you linked, and tried my hand at implementing a few of them. All worked out pretty well, although a bit more manual work was needed since this library lacks facilities like the TreeProcessor base class. Instead of iterating over a list of elements, I parsed the output text for <a href... and replaced the hashed links. Not nearly as elegant but definitely achievable.

The developer experience could be made even better by perhaps implementing a LinkPostProcessor base class that finds all <a> tags, parses the HREF and REL attrs and passes it to an internal sub method that can edit the text for final output.

class LinkPostProcessor(Extra, ABC):
  order = Stage.after(Stage.LINKS)
  def test(self, text):
    return True
  @abstractmethod
  def sub(self, match):
    ...
  def run(self, text):
    return re.sub(r'(<a.*?)(?P<href>href="md5-[a-z0-9]{32}")(.*>)', self.sub, text)

class MyExtra(LinkPostProcessor):
  name = 'my-extra'
  def sub(self, text):
    return f'[REMOVED A LINK WITH WITH HREF {match.group("href"}]'

I could see such convenience classes being quite useful for quickly creating extras. Would likely implement as and when needed, probably to simplify existing extras that function in similar ways.
It's not a direct replacement for TreeProcessor and the like but hopefully that sort of idea will do.

@mhils The separated test method was something that I saw in Markdown's extension API docs and replicated here. Personally I think separating this functionality from the run function is cleaner and the explicit method is a good reminder to think about performance.
That said, if one didn't care for performance, including a def text(*_): return True method for each extra would be tedious so perhaps having the base implementation always return True with the option for overriding it would be a good compromise?

Allows extras to check exactly when they've been triggered - before or after certain stages

Remove unneeded type hints

Ended up implementing a `ItalicsAndBoldProcessor` ABC extra class for both to piggy-back off of. Works by hashing text that we don't want processed and then doing nothing with the other text. Hashed text is then replaced after I&B has run

Resolve merge conflicts for PR

mhils · 2023-12-01T09:52:23Z

Thanks @Crozzers! 🍰

Personally I think separating this functionality from the run function is cleaner and the explicit method is a good reminder to think about performance.

This sounds reasonable to me - I think an explicit reminder may be a good idea. :) In that case it should probably not have a default implementation, otherwise that defeats that purpose. 😄

We'd have our first real use case for custom extras over in mitmproxy/pdoc#639, so while no hurries please I'm looking forward to this landing at some point. 😃

Merge master to resolve PR conflicts

Resolve PR conflicts

Resolve conflicts for PR

Crozzers · 2024-01-03T14:19:24Z

@nicholasserra would it be alright to revisit this PR? With all the recent changes to different extras it's getting a bit cumbersome to keep porting them back to this branch. There are also a couple of issues that would benefit from this:

Add typing to package #562: this PR drops Python 3.5 support which makes type hinting easier
Easy way to replace links to other markdown files to html files #528: a custom extra for converting *.md links to .html links would be easier with custom extra infrastructure

Many thanks

nicholasserra · 2024-01-03T22:26:36Z

Sure thing i'll get this on my priority list thank you

Merge recent changes to `breaks` functionality

nicholasserra · 2024-01-12T00:56:21Z

This is looking pretty fancy to me. I like it. Anythings better than the free for all we have right now. Still wrapping my head around the numerical aspect of the stages though. What's that getting us that something like an ordered list or dict isn't? I need to read through the code a bit longer to properly grasp what's happening.

But overall I think this should be great and we'll land it as-is or with minor tweaks. Just give me a little more time to think on it.

Crozzers · 2024-01-14T16:02:18Z

Still wrapping my head around the numerical aspect of the stages though. What's that getting us that something like an ordered list or dict isn't?

An ordered list would be simpler and easier and would dodge this flaw in the number system:

Calling Stage.before 10 times on the same stage will result in extras overflowing into the next stage

The only issue is: if an extra is tied to a particular stage and that stage gets skipped for some reason, do we still want to run that extra?

With the numbering system, we can say that anything where 100 <= order <= 200 is tied to HASH_HTML and we can skip them. With an ordered list, it would look like this:

[Stage.PREPROCESS, FencedCodeBlocks, Stage.HASH_HTML, FencedCodeBlocks, Stage.LINK_DEFS]

FencedCodeBlocks is registered as before LINK_DEFS and after PREPROCESS, but with an ordered list we can't tell and therefore can't skip it.

We could instead have ordered lists for each extra. For example:

# I've manually filled out the lists here but this would be filled by calling `Stage.before/after`
class Stage:
    # tuple containing the `before` and `after` lists respectively. Could be done as a namedtuple
    PREPROCESS = ([], [FencedCodeBlocks])
    HASH_HTML = ([], [])
    LINK_DEFS = ([FencedCodeBlocks], [])

This solves the issue of numbers overflowing whilst still allowing us to tie extras to particular stages. The only downside is if an extra sets its order as Stage.before(FencedCodeBlocks), it's a bit more tedious to insert it in all the right lists, but this only happens at startup so isn't a huge concern.

I'll play around with this and see what happens

… ordering Lists let us have a theoreticaly infinite number of extras registered, with no overflowing into the next stage

… to `Extra` ABC

Crozzers · 2024-01-28T13:45:00Z

@nicholasserra, I've refactored the ordering system for extras. I've moved most of the ordering logic into the Extra class and made Stage into a proper enum. The big change is that the numbering system is now gone.

The type of the Extra.order attribute is now a tuple of two iterables. This represents "before" and "after". EG:

class MyExtra(Extra):
  # run before preprocessing, after HASH_HTML and after the FencedCodeBlocks extra is run
  order = [Stage.PREPROCESS], [Stage.HASH_HTML, FencedCodeBlocks]

When you call MyExtra.register, it will calculate when this extra should be triggered and store this info in Extra._exec_order. When all the extras are registered, it looks something like this:

{<Stage.PREPROCESS: 1>: (
                         [],  # before
                         [<class 'markdown2.Mermaid'>,  # after
                          <class 'markdown2.Wavedrom'>,
                          <class 'markdown2.FencedCodeBlocks'>]),
 <Stage.HASH_HTML: 2>: (
                         [],  # before
                         [<class 'markdown2.MarkdownInHTML'>]),  # after
[...]
 <Stage.ITALIC_AND_BOLD: 14>: (
                         [<class 'markdown2.Underline'>,  # before
                          <class 'markdown2.Strike'>,
                          <class 'markdown2.MiddleWordEm'>,
                          <class 'markdown2.CodeFriendly'>],
                         [<class 'markdown2.Breaks'>,  # after
                          <class 'markdown2.MiddleWordEm'>,
                          <class 'markdown2.CodeFriendly'>,
                          <class 'markdown2.TelegramSpoiler'>])}

As mentioned previously, the Stage class is now an IntEnum containing the different stages. This was done so that extras could check the current stage of execution, and whether they were being called before/after a particular stage. This is useful for extras that need to be called before AND after a stage.

Calling Extra.deregister now also removes it from Extra._exec_order, allowing users to potentially change the order of an extra on the fly.

Finally, in the changelog I've put these changes under the most recent header, which is 2.4.13, but I suspect this change may warrant a bump to 2.5.0

nicholasserra · 2024-02-19T22:52:01Z

@Crozzers This is all looking amazing. I think i'm ready to merge this in. Thank you for all the work on this!

I see you have some other open PRs. How would you like to proceed here? Merge this in now, and then fix up the others (if needed), or should I merge those others in and this PR can be rebased? Your call.

I think i'll do a release of 2.4.13 before this lands. Then we can move these changes into 2.5.0 as you recommended. I'll just edit the changelog manually when it comes time.

Crozzers · 2024-02-21T17:29:04Z

None of the other PRs are particularly high priority and #568 requires some further investigation so I'd say merge this one first and I can update the others if needed. They're small patches anyway so shouldn't need much doing to them.

Thanks for taking the time to review this PR, it ended up as a bit of a monster

nicholasserra · 2024-02-22T00:59:27Z

Ok then i'm gonna do a release of the current changes, then merge this, then manually bump us to 2.5.0 after

Crozzers added 17 commits March 13, 2023 18:28

Convert wavedrom and numbering extras to new format

aad9e33

Merge branch 'master' into custom-extras

d3efe1e

Stay up to date with master

Convert femced-code-blocks extra to new Extra format.

212131b

Also converted `mermaid` extra as part of this process. As a plus, you no longer need fenced-code-blocks activated to use mermaid. Also added the ability for extras to be triggered before or after another extra

Initialise extras with instance of Markdown and convert pyshell extra

26ae9a5

Add some docstrings

f4b3d66

Convert tables and wiki-tables extras to new format.

283fcdf

All block extras have now been converted

Pass extra options to initialiser

cb1fd28

Allow link patterns to be specified via extras dict

e460b2e

Convert span extras to new format.

ad7a3ff

Still TODO is things like `footnotes` extra. Also need to think of a system for extras to replace parts of the standard syntax

Merge branch 'master' into custom-extras.

8a3fdcf

Merge master to pull in trentm#514 for conversion to new Extra format

Convert markdown-in-html to new extra format

a2d8baa

Fix failing Python 3.6 tests due to missing re.Match class.

262d7ff

Also fix tests not running on Windows

Drop Python 3.5 support.

cb05bdc

At time of commit, Python 3.5 is 2 years 9 months EOL.

Update changes.md

666ebef

Fix wavedrom tests

8eb13b4

Make Extra registration manual

5478960

Crozzers force-pushed the custom-extras branch from 1a47bc1 to 5478960 Compare July 2, 2023 16:38

Crozzers marked this pull request as ready for review July 8, 2023 11:07

Merge branch 'master' of https://github.com/trentm/python-markdown2 i…

1682d6f

…nto custom-extras Resolve merge conflicts for PR

Fix incorrect order being returned in Stage.__order

7751098

Crozzers added 3 commits August 19, 2023 20:04

Add order attribute to Markdown class.

6233ea1

Allows extras to check exactly when they've been triggered - before or after certain stages

Fix extra options being replaced if falsey.

a63343d

Remove unneeded type hints

Crozzers mentioned this pull request Sep 5, 2023

Easy way to replace links to other markdown files to html files #528

Open

Crozzers added 2 commits September 24, 2023 18:58

Merge branch 'master' into custom-extras

eaa4544

Resolve merge conflicts for PR

Merge branch 'master' into custom-extras

bf2bec3

Resolve merge conflicts for PR

Crozzers added 5 commits December 1, 2023 17:51

Merge branch 'master' into custom-extras

4353751

Merge master to resolve PR conflicts

Merge branch 'master' into custom-extras

fc33019

Resolve PR conflicts

Merge branch 'master' into custom-extras

4373eb3

Resolve conflicts for PR

Add Python 3.12 to tox config

c269ccf

Fix changelog after merge

cc34905

nicholasserra added Priority High priority tickets Feature Feature request labels Jan 3, 2024

nicholasserra self-assigned this Jan 3, 2024

Merge branch 'master' into custom-extras

a9fb886

Merge recent changes to `breaks` functionality

Crozzers added 3 commits January 27, 2024 18:33

Refactor extras ordering system to use lists rather than number based…

123b6a9

… ordering Lists let us have a theoreticaly infinite number of extras registered, with no overflowing into the next stage

Refactor the Stage class to be a proper enum, move exec order logic…

66d7e4f

… to `Extra` ABC

Merge branch 'master' into custom-extras

b41ca8f

nicholasserra merged commit 44e0277 into trentm:master Feb 25, 2024
18 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Custom extras #519

Custom extras #519

Crozzers commented Jul 2, 2023

Crozzers commented Jul 2, 2023 •

edited

nicholasserra commented Jul 23, 2023 •

edited

Crozzers commented Jul 23, 2023 •

edited

rouilj commented Jul 23, 2023

mhils commented Jul 24, 2023

Crozzers commented Aug 14, 2023

mhils commented Dec 1, 2023

Crozzers commented Jan 3, 2024 •

edited

nicholasserra commented Jan 3, 2024

nicholasserra commented Jan 12, 2024

Crozzers commented Jan 14, 2024

Crozzers commented Jan 28, 2024

nicholasserra commented Feb 19, 2024

Crozzers commented Feb 21, 2024

nicholasserra commented Feb 22, 2024

Custom extras #519

Custom extras #519

Conversation

Crozzers commented Jul 2, 2023

How it works

Extras

Extra options

Stages

Order of execution

Known flaws

Crozzers commented Jul 2, 2023 • edited

nicholasserra commented Jul 23, 2023 • edited

Crozzers commented Jul 23, 2023 • edited

rouilj commented Jul 23, 2023

mhils commented Jul 24, 2023

Crozzers commented Aug 14, 2023

mhils commented Dec 1, 2023

Crozzers commented Jan 3, 2024 • edited

nicholasserra commented Jan 3, 2024

nicholasserra commented Jan 12, 2024

Crozzers commented Jan 14, 2024

Crozzers commented Jan 28, 2024

nicholasserra commented Feb 19, 2024

Crozzers commented Feb 21, 2024

nicholasserra commented Feb 22, 2024

Crozzers commented Jul 2, 2023 •

edited

nicholasserra commented Jul 23, 2023 •

edited

Crozzers commented Jul 23, 2023 •

edited

Crozzers commented Jan 3, 2024 •

edited