New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add boolean to tell if it's an indented code block or not #416
Add boolean to tell if it's an indented code block or not #416
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a quickfix this seems fine. But if this feature is required long-term, I'd suggest to add a proper way to differenciate not only indented and fenced code-blocks, but also their fence. That includes the symbol used as a fence (~
or `
) as well as the number of fence-characters (3 for ```
, 4 for ~~~~
). The size of Tag
shouldn't be affected because the CodeBlock
variants suggested below are still smaller than the Link
variant.
This PR and the associated issue might also be related to / help with #271.
I could imagine something like this:
enum Tag<'a> {
CodeBlock(Fence, CowStr<'a>),
}
enum Fence {
Indented,
Fenced(FenceCharacter, usize), // enum FenceCharacter { Tilde, Grave }
}
// or
enum Fence<'a> {
Indented,
Fenced(CowStr<'a>), // e.g. "```" or `~~~~~`
}
src/parse.rs
Outdated
CodeBlock(CowStr<'a>), | ||
/// A code block. | ||
/// | ||
/// The boolean is `true` is this is an indented code block (not starting with "\`\`\`"). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// The boolean is `true` is this is an indented code block (not starting with "\`\`\`"). | |
/// The boolean is `true` if this is an indented code block (not starting with ```` ``` ```` or `~~~`). |
Rendered (with GitHub Markdown): /// The boolean is true
if this is an indented code block (not starting with ```
or ~~~
).
Normally I'd be okay with merging a small change like this, but since it is a compatibility break, we should be a bit more careful. First, could you elaborate on when the distinction between indented and delimited codeblocks is relevant? I see what you wrote here, but I don't quite get what rustdoc does there. Ideally, we'd only really expose semantic differences and leave users to dig into the source map if they care about constitutes a code block, even if that feels a bit hacky. Secondly, since this is a public interface, some care should be taken with respect to the design. A boolean works for sure, but @oberien's suggestion may be more future proof if we can find a purpose for the size of the delimiters. Wrapping the Finally, it's a bit uncomfortable to do a major release so soon after |
I saw a very funny case where an indented code block started with "```". Since we are parsing this to check if we're supposed to compile it and test it, we're in a problematic situation because we don't know if it's actually an indented code block. If we had this information, we wouldn't need to parse and just assume it's a code block. With the small hack I wrote to know this, it's an issue in case a code block is inside a list. So like @oberien suggested, I'll extend this PR a bit in order to also provide the string that makes this code block as a code block ("```" for example). And yes, it'll require a medium release. :) |
So rustdoc only compiles delimited codeblocks (starting with either |
No we compile both. But the difference is that you can give instructions on non-indented codeblocks. |
How are these instructions encoded? Are they in the language snippet of the codeblock? |
Yes, looks like this for example: "```compile_fail,edition2018,E0594" |
Cool, thanks. Are delimited codeblocks without instructions treated like indented codeblocks? Because you could just go off the language snippet then, right? |
The same way at least. If no additional data is provided, we just assume it's a rust codeblock. |
As marcus said, in your given example |
We do. The problem is that we still need to know if we have to. |
6a7cd69
to
1de6cb8
Compare
Ok, I updated the PR to include the fence string as well (using an enum). I also added some tests. I think it's ready? |
Any news? |
It's still not exactly clear to me what problem this solves. If you could provide a comprehensive summary that would be very useful. |
"```" should be consided content in a codeblock:
This one should be not indented code block:
In both cases, I need to know if I am in an indented codeblock or not to know if I need to check the "```" content. I found a trick by checking if whitespaces were present before the codeblock start but I realized it would fail in case it was in a list. To provide even more context: if the "syntax" is empty, we assume by default this is rust. We then check for this "```" to see if extra information has been passed down. This is the issue. |
I still don't understand. What do you mean by "```" content? And by this extra information, do you mean stuff like |
I'm bad at explaining. Please read the link to the rustdoc PR in the first comment to have full information. That might be better than me trying to resume the situation... |
I can try to clarify. When rustdoc fails to parse a code block, it has a suggestion to add the The problem is that it can be tricky to determine if something is a fenced code block or indented code block. You can't just check if the code block starts with ```, because someone could have an indented code block that starts with ```. Due to some changes in the 0.6 API, this is now even harder than before. I'm still not convinced my suggestion given in rust-lang/rust#65894 (comment) wouldn't work. That is, check if It would be cleaner if pulldown-cmark indicated which kind of code block it was, but I also think that semver-breaking changes are disruptive. I think a temporary workaround would be better, and then make this change (or something like it) when there are enough changes to justify a bump to 0.7. |
Thanks @ehuss ! I don't see the problem to make a medium release with such small amount of changes. It's just a release and they're pretty easy to do. Anyway, I'd love to finally be able to update this rustdoc's dependency. |
0.x changes are the same as major version changes. It's a problem for projects like mdbook, where plugins need to keep semver-compatible versions in sync. It also requires all projects to manually opt-in to the new version. |
If they want to use it, sure. But nothing forces them to. |
Thanks @ehuss, that did the trick!
I feel much the same way. Exposing the distinction between indented and delimited codeblocks seems to make sense for the next major version change, but I'm very hesitant to release straight away. Concerning the implementation: do we need to expose the delimiters themselves? Those could be recovered using the sourcemap, right? We could just wrap the syntax string in an option, where |
Well, we're forced to remain on an older buggy pulldown-cmark version on rustdoc so I personally see a lot of reasons to make an update with "just" this.
I like your suggestion but I find it a bit less clear from just reading the code. So as you prefer. Just tell me if I need to update the current PR. |
Any news? |
No new insights here. I believe the way forward is to wrap the syntax string in either an |
I prefer the enum way then: more explicit from my point of view. So I guess this PR is ready from my point of view. Waiting this next release very eagerly. |
There doesn't seem to be a need to add the fence. A codeblock could just contain a |
Ok, updated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea seems right, but needs a few more changes.
src/parse.rs
Outdated
@@ -204,7 +234,7 @@ enum ItemBody { | |||
|
|||
Rule, | |||
Heading(u32), // heading level | |||
FencedCodeBlock(CowIndex), | |||
FencedCodeBlock(CowIndex, CowIndex), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fence is still stored here, which is unnecessary and increases the size of a Node<Item>
by a word.
src/parse.rs
Outdated
} | ||
} | ||
|
||
pub fn get_fences(&self) -> Option<&CowStr<'a>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This returns the syntax string and not the fence, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the enum is so simple, I'd argue we could drop the impl
block altogether.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll keep the two other methods, it's pretty convenient. :)
1cd6631
to
2755bb0
Compare
Removed the unneeded parts. |
Ping? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the long delay. Apart from the re-exporting issue, this looks good to merge.
@@ -38,6 +38,30 @@ use crate::tree::{Tree, TreeIndex, TreePointer}; | |||
// https://spec.commonmark.org/0.29/#link-destination | |||
const LINK_MAX_NESTED_PARENS: usize = 5; | |||
|
|||
/// Codeblock kind. | |||
#[derive(Clone, Debug, PartialEq)] | |||
pub enum CodeBlockKind<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This definition should probably be re-exported here, as there is currently no way to extract the language string.
Done! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Any news about a new release? |
Hi, I'm wondering about the status of: @marcusklaas is there any work that I can help with that will move us towards a |
Sorry for the delay in replying here. We could do a release, but because of a very minor API change it would have to be a 'major' version bump. I see several options:
I'm not sure what's optimal. Thoughts? |
Why not making a major release anyway and another later? I don't have a problem with it. :) |
The 0.7.0 release is finally out! |
Awesome, thanks! |
Fixes #415.
Once merged, can we have another release as quickly as possible please? I'd really love to be able to merge rust-lang/rust#65894 (a few fixes depend on it as well).