Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide Tag and Event source locations. #725

Open
Caellian opened this issue Oct 11, 2023 · 5 comments
Open

Provide Tag and Event source locations. #725

Caellian opened this issue Oct 11, 2023 · 5 comments

Comments

@Caellian
Copy link

Caellian commented Oct 11, 2023

I want to suggest that Events and Tags retain their source location and for it to be accessible when they are consumed.

Currently, if the CommonMark parses something as an element, there's no way of restoring it back to source form (without possibly losing spacing/symbols).

This would allow multi-line Events to capture raw contents which would make it straightforward to fix issues like #716.

It also addresses extensibility as non-standard parsers could detect markers/delimiters in Event::Text and reinterpret all intermediate events/tags as raw text for processing. That means that extensions could be implemented as structs implementing Iterator<Item = MyEvent> and wrapping pulldown-cmark::Parser which would make them very composable.

Also, bare CommonMark parser could be separated and extended as needed by end-users, while an extension layer that takes in current Options could provide currently supported non-standard functionality.

@Martin1887
Copy link
Collaborator

It seems a pretty big change, but it may be very useful and it will be considered. Thanks for your suggestion

@Caellian
Copy link
Author

Caellian commented Oct 12, 2023

For reference, I have implemented it in my fork here. I estimate ~1/3 of the additions is due to rustfmt.

Tags don't need to pass it as providing it with Events effectively does that (Tag wrapped in Event::Start and Event::End).

As I had a lot of code already working with events the way they are currently structured, I ended up:

  • Renaming Event to EventData.
  • Adding Event that contains EventData.
  • Implementing Deref on Event.

Using Deref meant that in my code that was doing match event {...} I just had to dereference the variable (match &*event {...}) and the rest of the code didn't have to change.

As all places that were constructing Event already had access to span this ended up being a pretty trivial change to implement, although it touches a lot and breaks semver.

Adding Span into Event seemed like a worse choice because of repetition.

If you want I can make a PR, but I'm not entirely happy with Span interface (though it's usable).

@ollpu
Copy link
Collaborator

ollpu commented Oct 17, 2023

FWIW, I don't see how this external API change would help fixing #716. The span of the inline code item is known when the content is collected, but the challenge is trimming indentation from subsequent lines.

The idea for a more composable extension system is certainly interesting. It becomes challenging when you want to parse something with higher priority than something else, after the fact. If, for example, strikethrough was implemented as an extension that runs after the fact (or before other inline passes), ~~*x~~*y* or *~~x*~~ couldn't be parsed the same as they are now.

@Caellian
Copy link
Author

Caellian commented Oct 17, 2023

FWIW, I don't see how this external API change would help fixing #716.

I misunderstood the issue. You're right. I wasn't familiar with the codebase yet and assumed too much.

It becomes challenging when you want to parse something with higher priority than something else

That is also true. From my experiments I've found that parts that are missing for proper composition support are:

  • being able to halt the inner parser when the outer one detects a start tag (to prevent double parsing)
  • being able to run the inner parser from a certain point forward
    • I think only forward seek is necessary as the first start tag should take precedence
      • Unless it's unclosed... uhm...

I'll do some more testing once I catch up with other stuff and create another issue/PR for composition if someone doesn't do it first. That being said... there's really not much use for Spans other than that - if the Parser position can't be moved back or forth.

@Martin1887
Copy link
Collaborator

Interesting idea but out of the scope for the 0.10 version. Maybe it is more for mid term than long term, however.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants