-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle inline code spans with multiple backticks #56
Conversation
Before, `options.code_block_token` was used for inline code spans. Now we always use a backtick. As far as I can tell from the reference[1] and from experimentation[2], code spans should always use one or more backticks and never use something else such as tildes. [1]: https://spec.commonmark.org/0.30/#code-spans [2]: https://spec.commonmark.org/dingus/?text=%60foo%60%0A
We now insert the necessary number of backticks to quote any backticks found in the code span itself. Before, only the case with a single backtick was handled, now the code can contain arbitrarily many backticks. This was found using a fuzz test which tries to fuzz a single paragraph of Markdown text. that is, it only fuzzes text which matches [Event::Start(Tag::Paragraph), .., Event::End(Tag::Paragraph)]
One of the issues found by the fuzz test is pulldown-cmark/pulldown-cmark#655. The input "`\n`\n`" -> [
Start(Paragraph)
Code(Borrowed(""))
SoftBreak
Text(Borrowed("`"))
End(Paragraph)
] This is wrong since the code span should contain |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a million for your continued work - I feel that you will chip off some misbehaviour with each PR until this crate finally works correctly!
With that said, I have added some assertions for count_consecutive_backticks
just to validate that it is supposed to work that way - my intuition is that it should abort early and no test breaks if it does. Maybe there can be another test to pin down the 'non-breaking' behaviour.
Finally, there is a question about whitespace.
Please let me know what you think - even as is it can be merged as I am sure the fuzzer will force all actual issues out already so it's OK to make mistakes on the way and start with fast broad strokes without need for details like the ones I brought up.
I'm very happy to be able to help!
Thanks for adding the tests! I don't think it can abort early since it needs to look for backticks through the entire string. Well... there are two small things one could do:
|
This sounds like making these changing to counting backticks goes beyond broad strokes 😁, so let's keep the momentum and make round-tripping possible :). Thanks again, I can't wait to see more PRs just like this one. |
Cool, thanks for merging it! I found what looks like another small issue in pulldown-cmark, see pulldown-cmark/pulldown-cmark#657. |
Cool, thanks for merging it! I found what looks like another small issue in pulldown-cmark, see pulldown-cmark/pulldown-cmark#657. It seems that combining the two crates like this is a fruitful way to tease out inconsistencies 😄 |
Before,
options.code_block_token
was used for inline code spans. Now we always use a backtick.As far as I can tell from the reference and from experimentation, code spans should always use one or more backticks and never use something else such as tildes.
The PR also ensures that we insert the necessary number of backticks to quote any backticks found in the code span itself. Before, only the case with a single backtick was handled, now the code can contain arbitrarily many backticks.
This was found using a fuzz test like the one in #55. The difference is that I restricted the input to a single paragraph of Markdown text, meaning that it only fuzzes inputs which match
There seems to be some more corner cases that are not handled already, so I'll try to send more PRs to fix those.