Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add boolean to tell if it's an indented code block or not #416

Merged
merged 5 commits into from Dec 18, 2019

Conversation

GuillaumeGomez
Copy link
Contributor

Fixes #415.

Once merged, can we have another release as quickly as possible please? I'd really love to be able to merge rust-lang/rust#65894 (a few fixes depend on it as well).

Copy link
Contributor

@oberien oberien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a quickfix this seems fine. But if this feature is required long-term, I'd suggest to add a proper way to differenciate not only indented and fenced code-blocks, but also their fence. That includes the symbol used as a fence (~ or `) as well as the number of fence-characters (3 for ```, 4 for ~~~~). The size of Tag shouldn't be affected because the CodeBlock variants suggested below are still smaller than the Link variant.

This PR and the associated issue might also be related to / help with #271.

I could imagine something like this:

enum Tag<'a> {
    CodeBlock(Fence, CowStr<'a>),
}
enum Fence {
    Indented,
    Fenced(FenceCharacter, usize), // enum FenceCharacter { Tilde, Grave }
}
// or
enum Fence<'a> {
    Indented,
    Fenced(CowStr<'a>), // e.g. "```" or `~~~~~`
}

src/parse.rs Outdated
CodeBlock(CowStr<'a>),
/// A code block.
///
/// The boolean is `true` is this is an indented code block (not starting with "\`\`\`").
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// The boolean is `true` is this is an indented code block (not starting with "\`\`\`").
/// The boolean is `true` if this is an indented code block (not starting with ```` ``` ```` or `~~~`).

Rendered (with GitHub Markdown): /// The boolean is true if this is an indented code block (not starting with ``` or ~~~).

@marcusklaas
Copy link
Collaborator

Normally I'd be okay with merging a small change like this, but since it is a compatibility break, we should be a bit more careful.

First, could you elaborate on when the distinction between indented and delimited codeblocks is relevant? I see what you wrote here, but I don't quite get what rustdoc does there. Ideally, we'd only really expose semantic differences and leave users to dig into the source map if they care about constitutes a code block, even if that feels a bit hacky.

Secondly, since this is a public interface, some care should be taken with respect to the design. A boolean works for sure, but @oberien's suggestion may be more future proof if we can find a purpose for the size of the delimiters. Wrapping the CowStr<'a> in an option could also work.

Finally, it's a bit uncomfortable to do a major release so soon after 0.6 with only very minor changes. I'm not opposed to it, but we should at least clearly identify the reasoning for the bump. If we can do that, I'm sure we can settle on a design quickly (FLW, I know)

@GuillaumeGomez
Copy link
Contributor Author

I saw a very funny case where an indented code block started with "```". Since we are parsing this to check if we're supposed to compile it and test it, we're in a problematic situation because we don't know if it's actually an indented code block. If we had this information, we wouldn't need to parse and just assume it's a code block.

With the small hack I wrote to know this, it's an issue in case a code block is inside a list.

So like @oberien suggested, I'll extend this PR a bit in order to also provide the string that makes this code block as a code block ("```" for example). And yes, it'll require a medium release. :)

@marcusklaas
Copy link
Collaborator

So rustdoc only compiles delimited codeblocks (starting with either ``` or ~~~~) but not indented codeblocks, is that correct?

@GuillaumeGomez
Copy link
Contributor Author

No we compile both. But the difference is that you can give instructions on non-indented codeblocks.

@marcusklaas
Copy link
Collaborator

How are these instructions encoded? Are they in the language snippet of the codeblock?

@GuillaumeGomez
Copy link
Contributor Author

Yes, looks like this for example:

"```compile_fail,edition2018,E0594"

@marcusklaas
Copy link
Collaborator

Cool, thanks. Are delimited codeblocks without instructions treated like indented codeblocks? Because you could just go off the language snippet then, right?

@GuillaumeGomez
Copy link
Contributor Author

The same way at least. If no additional data is provided, we just assume it's a rust codeblock.

@oberien
Copy link
Contributor

oberien commented Nov 18, 2019

As marcus said, in your given example ```compile_fail,edition2018,E0594, you can parse the lang attribute of the code-block. That's also what I'm doing in heradoc, where I also add custom attributes you can use within the language definition of the code-block.

@GuillaumeGomez
Copy link
Contributor Author

We do. The problem is that we still need to know if we have to.

@GuillaumeGomez
Copy link
Contributor Author

Ok, I updated the PR to include the fence string as well (using an enum). I also added some tests. I think it's ready?

@GuillaumeGomez
Copy link
Contributor Author

Any news?

@marcusklaas
Copy link
Collaborator

It's still not exactly clear to me what problem this solves. If you could provide a comprehensive summary that would be very useful.

@GuillaumeGomez
Copy link
Contributor Author

"```" should be consided content in a codeblock:

    ```

This one should be not indented code block:

1. this is a list!
2. Yeay
    ```rust
    content
    ```

In both cases, I need to know if I am in an indented codeblock or not to know if I need to check the "```" content. I found a trick by checking if whitespaces were present before the codeblock start but I realized it would fail in case it was in a list.

To provide even more context: if the "syntax" is empty, we assume by default this is rust. We then check for this "```" to see if extra information has been passed down. This is the issue.

@marcusklaas
Copy link
Collaborator

I still don't understand. What do you mean by "```" content? And by this extra information, do you mean stuff like compile_fail,edition2018,E0594? That's in the syntax string right? It's not yet clear to me why you can't just check the syntax string and pass that along, or default to rust when it's not set.

@GuillaumeGomez
Copy link
Contributor Author

I'm bad at explaining. Please read the link to the rustdoc PR in the first comment to have full information. That might be better than me trying to resume the situation...

@ehuss
Copy link
Contributor

ehuss commented Nov 21, 2019

I can try to clarify. When rustdoc fails to parse a code block, it has a suggestion to add the text annotation. But it only gives that suggestion for fenced code blocks (with ```), since it wouldn't make sense for indented code blocks.

The problem is that it can be tricky to determine if something is a fenced code block or indented code block. You can't just check if the code block starts with ```, because someone could have an indented code block that starts with ```. Due to some changes in the 0.6 API, this is now even harder than before.

I'm still not convinced my suggestion given in rust-lang/rust#65894 (comment) wouldn't work. That is, check if Event::Start(Tag::CodeBlock(_)) and Event::Text(_) start at the same offset, then it should probably be an indented code block.

It would be cleaner if pulldown-cmark indicated which kind of code block it was, but I also think that semver-breaking changes are disruptive. I think a temporary workaround would be better, and then make this change (or something like it) when there are enough changes to justify a bump to 0.7.

@GuillaumeGomez
Copy link
Contributor Author

Thanks @ehuss !

I don't see the problem to make a medium release with such small amount of changes. It's just a release and they're pretty easy to do. Anyway, I'd love to finally be able to update this rustdoc's dependency.

@ehuss
Copy link
Contributor

ehuss commented Nov 22, 2019

I don't see the problem to make a medium release with such small amount of changes.

0.x changes are the same as major version changes. It's a problem for projects like mdbook, where plugins need to keep semver-compatible versions in sync. It also requires all projects to manually opt-in to the new version.

@GuillaumeGomez
Copy link
Contributor Author

If they want to use it, sure. But nothing forces them to.

@marcusklaas
Copy link
Collaborator

Thanks @ehuss, that did the trick!

It would be cleaner if pulldown-cmark indicated which kind of code block it was, but I also think that semver-breaking changes are disruptive. I think a temporary workaround would be better, and then make this change (or something like it) when there are enough changes to justify a bump to 0.7.

I feel much the same way. Exposing the distinction between indented and delimited codeblocks seems to make sense for the next major version change, but I'm very hesitant to release straight away.

Concerning the implementation: do we need to expose the delimiters themselves? Those could be recovered using the sourcemap, right? We could just wrap the syntax string in an option, where Some(syntax_string) would indicate a delimited codeblock and None an indented one.

@GuillaumeGomez
Copy link
Contributor Author

Well, we're forced to remain on an older buggy pulldown-cmark version on rustdoc so I personally see a lot of reasons to make an update with "just" this.

Concerning the implementation: do we need to expose the delimiters themselves? Those could be recovered using the sourcemap, right? We could just wrap the syntax string in an option, where Some(syntax_string) would indicate a delimited codeblock and None an indented one.

I like your suggestion but I find it a bit less clear from just reading the code. So as you prefer. Just tell me if I need to update the current PR.

@GuillaumeGomez
Copy link
Contributor Author

Any news?

@marcusklaas
Copy link
Collaborator

No new insights here. I believe the way forward is to wrap the syntax string in either an Option or a bespoke enum. We could then include this patch in the next major release. I understand there is an effort under way to add smart punctuation, which may be a good feature to do a release for.

@GuillaumeGomez
Copy link
Contributor Author

GuillaumeGomez commented Nov 27, 2019

I prefer the enum way then: more explicit from my point of view. So I guess this PR is ready from my point of view. Waiting this next release very eagerly.

@marcusklaas
Copy link
Collaborator

There doesn't seem to be a need to add the fence. A codeblock could just contain a CodeBlockKind, where the CowStr refers to the syntax string instead of the fence.

@GuillaumeGomez
Copy link
Contributor Author

Ok, updated.

Copy link
Collaborator

@marcusklaas marcusklaas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea seems right, but needs a few more changes.

src/parse.rs Outdated
@@ -204,7 +234,7 @@ enum ItemBody {

Rule,
Heading(u32), // heading level
FencedCodeBlock(CowIndex),
FencedCodeBlock(CowIndex, CowIndex),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fence is still stored here, which is unnecessary and increases the size of a Node<Item> by a word.

src/parse.rs Outdated
}
}

pub fn get_fences(&self) -> Option<&CowStr<'a>> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This returns the syntax string and not the fence, right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the enum is so simple, I'd argue we could drop the impl block altogether.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll keep the two other methods, it's pretty convenient. :)

@GuillaumeGomez
Copy link
Contributor Author

Removed the unneeded parts.

@GuillaumeGomez
Copy link
Contributor Author

Ping?

Copy link
Collaborator

@marcusklaas marcusklaas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the long delay. Apart from the re-exporting issue, this looks good to merge.

@@ -38,6 +38,30 @@ use crate::tree::{Tree, TreeIndex, TreePointer};
// https://spec.commonmark.org/0.29/#link-destination
const LINK_MAX_NESTED_PARENS: usize = 5;

/// Codeblock kind.
#[derive(Clone, Debug, PartialEq)]
pub enum CodeBlockKind<'a> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This definition should probably be re-exported here, as there is currently no way to extract the language string.

@GuillaumeGomez
Copy link
Contributor Author

Done!

Copy link
Collaborator

@marcusklaas marcusklaas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@marcusklaas marcusklaas merged commit e10b680 into pulldown-cmark:master Dec 18, 2019
@GuillaumeGomez GuillaumeGomez deleted the indented-or-not branch December 18, 2019 12:42
@GuillaumeGomez
Copy link
Contributor Author

Any news about a new release?

@Others
Copy link

Others commented Jan 31, 2020

Hi, I'm wondering about the status of:
rust-lang/rust#66035 (an error improvement)
which is blocked on:
rust-lang/rust#65894 (upgrading pulldown-cmark)
which is (in my understanding) blocked on a new release of pulldown-cmark

@marcusklaas is there any work that I can help with that will move us towards a pulldown-cmark release? It's now been three months since the last release of the crate, so I do not think another release would be premature :)

@marcusklaas
Copy link
Collaborator

Sorry for the delay in replying here. We could do a release, but because of a very minor API change it would have to be a 'major' version bump. I see several options:

  • simply bump a major version and do a release
  • wait until a new feature lands, like smart punctuation or custom header ids. This could take a while.
  • do a minor release of the master branch before the breaking change

I'm not sure what's optimal. Thoughts?

cc @GuillaumeGomez @raphlinus @ScottAbbey

@GuillaumeGomez
Copy link
Contributor Author

Why not making a major release anyway and another later? I don't have a problem with it. :)

@marcusklaas
Copy link
Collaborator

The 0.7.0 release is finally out!

@GuillaumeGomez
Copy link
Contributor Author

Awesome, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Include full codeblock start
5 participants