Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Syntax highlighting support #67

Open
zkat opened this issue Sep 21, 2021 · 13 comments
Open

Syntax highlighting support #67

zkat opened this issue Sep 21, 2021 · 13 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@zkat
Copy link
Owner

zkat commented Sep 21, 2021

Add support for syntax highlighting through syntect.

I figure this can be done by having the graphical handler take an optional SyntaxSet, and have an optional identifier token method as part of SpanContents, so the renderer can color code accordingly.

The biggest challenge here is designing things such that things are appropriately feature flagged and don't grow binaries too much?

@zkat zkat added enhancement New feature or request help wanted Extra attention is needed labels Sep 21, 2021
@zkat zkat mentioned this issue Sep 21, 2021
14 tasks
@fasterthanlime
Copy link

If you're looking for something a little less regexpy (TextMate grammars are basically large regexp soups if I recall correctly), you may want to look at https://lib.rs/crates/tree-sitter - I've had good experiences with it, it was just missing a couple grammars so I rolled my own crate to build them: https://github.com/fasterthanlime/tree-sitter-collection

(Hopefully that's useful input, if not feel free to collapse that comment!)

@zkat
Copy link
Owner Author

zkat commented Sep 23, 2021

Oh hey no this is actually really cool and relevant! I wonder what the story would be with people writing their own custom ones 🤔

@dswij
Copy link

dswij commented Sep 27, 2021

The biggest challenge here is designing things such that things are appropriately feature flagged and don't grow binaries too much?

I wonder if this is easy to be benchmarked. That might be really helpful

@erratic-pattern
Copy link
Contributor

erratic-pattern commented Apr 23, 2023

@zkat I made a very rough draft attempt at writing a custom SourceCode for integrating syntect, but I've been running into issues with getting the labels to render on the correct columns. I basically adapted the context_info function from miette and added some logic to ignore ANSI escapes to try to trick the graphical reporter into doing the right thing, but it may not be possible without a new custom reporter or I may be misunderstanding something about what the SpanContents is supposed to represent.

My current test example looks like this:

image

Code is here: https://github.com/kallisti-dev/miette-syntect/blob/40f1a63b398e9c9cbda964076cd1d11e7b538c56/src/highlighted_source.rs#L195

I'm likely just doing something wrong here, but I'm not sure what.

I might try this as a PR to miette instead (with everything behind a feature flag) since that would make it easier to ensure it's doing the right things with labels, and also makes it possible to only run the highlighter on the actual part that's being rendered. my version highlights the whole source because the 'a constraint on SpanContents data() makes it difficult to reference anything that's owned by the SpanContents itself.

As a side note, you'd probably also want a smarter renderer because the default syntect::util::as_24_bit_terminal_escaped inserts a lot of unnecessary escape codes. the rendering logic in bat is smarter and would be good to use as a reference.

@erratic-pattern
Copy link
Contributor

erratic-pattern commented Apr 23, 2023

Some more thoughts on this.

My initial idea was to try this as a custom SourceCode, since that gives the user the ability to customize which syntax highlighting they use per-file rather than having the reporter try to figure it out. It also doesn't require an entirely different ReportHandler beyond what ReportHandlerOpts provides, which is convenient. However it looks like there needs to be a new ReportHandler that's aware of ANSI escape sequences so that it can ignore them when handling label placement.

I suppose an idealized API would be that you set the SyntaxSet and Theme in the ReportHandler options and then you set the SyntaxReference on each individual SourceCode using wrappers over the find_syntax_* family of functions from syntect. But I'm not sure how you would pass the SyntaxSet and Theme to the SourceCode when it comes time to render without an overhaul of the current design. I think it requires too many changes to miettes design for something that should be an optional feature.

Maybe the simpler approach in my example could work, where the SourceCode type takes everything needed for highlighting, but we just augment the existing GraphicalReporter to simply be aware of and ignore ANSI escape sequences when rendering?

@erratic-pattern
Copy link
Contributor

erratic-pattern commented Apr 23, 2023

I tried several approaches to only highlight the SpanContents instead of the entire SourceCode, but couldn't get anything working under the constraints of the SourceCode and SpanContents traits as they currently exist, so I resigned myself to just ANSI escaping the whole source code all at once.

A change to get this working would be a slight modification to the SpanContents trait to allow something like fn data(&self) -> Cow<'a, [u8]> so that it's easier to allocate new data containing the ANSI escapes.

Alternatively, changing SourceCode::read_span to have a &mut self receiver could also work because then the SourceCode implementation can cache the syntect ParseState and HighlightState and the ANSIfied lines in a sparse map of the form IndexMap<usize, String>. But I don't necessarily think this is the best choice, because it doesn't really make sense for the API to use a mutable reference here, and interior mutability with a Mutex lets you workaround this...

...almost. The &'a [u8] constraint in SpanContents::data strikes again, making it impossible to generate any data via interior mutability that outlives the Mutex lock. I think Cow<'a, [u8]> would make all of this much simpler because I can simply return an owned Vec of the ANSI escaped data in the SpanContents rather than trying to figure out some way to create a slice with the same lifetime as the entire SourceCode

@zkat
Copy link
Owner Author

zkat commented Apr 23, 2023

My thoughts on implementing this:

  1. the SourceCode trait should be extended with a new (optional) method: fn syntax<'a>(&'a self) -> Option<&syntect::parsing::SyntaxReference>.
  2. Change the Line struct in graphical.rs to include a new rendered: Option<String> field, which will contain the post-highlighting version of the line.
  3. Use the output from highlight_line to generate initial syntect stuff, then have logic on our end to convert its Vec<(Style, &str)> output into Vec<(owo_colors::Style, &str)>, and then apply that directly to the line, saving it as rendered: Some(line_with_styles_applied). This'll probably mean that we hard-code a mapping of one particular syntect theme to a generic "color" (non-RGB) mapping that will adapt across terminals.
  4. Do all width/location calculation with Line.text, just like we do now.
  5. Actually render Line.rendered.unwrap_or_else(|| line.text) when we're rendering lines themselves.
  6. Keep the mutable HighlightLines state right int get_lines.
  7. (optional, more work?): allow configuration of scope highlights through the ThemeStyles struct. Probably just shove a pub scopes: HashMap<String, Scope> in there, where String is the scope name.

tl;dr I think all the actual highlighting logic should live entirely in GraphicalReportHandler, but we can allow external configuration through an optional method in SourceCode.

@erratic-pattern
Copy link
Contributor

Quick update on this. I have a rough draft of this working locally. Using a similar approach to what you outlined above and keeping all the mutable state in GraphicalReportHandler

image

I'll submit a draft PR once I clean it up a bit.

Next steps are to design the API and document. It would be nice if the configuration API could wrap everything in syntect so the user doesn't need to include a syntect dependency for SyntaxReference or Theme, but I'm not sure what would be a sensible way to do this.

Making the choice of highlighter customizable with a trait would be nice as well, for tree-sitter support or for a more specialized highlighter. For my own use case, I really only need Rust highlighting so I would opt for a specialized parser/highlighter if I had the option to use one.

@zkat
Copy link
Owner Author

zkat commented Oct 27, 2023

@kallisti-dev omg that looks amazing

@erratic-pattern
Copy link
Contributor

Draft PR at #313

It ended up being easier to create a new global for the highlighter than it was to cram it into GraphicalReportHandler, because doing so meant that it could no longer derive Debug and Clone, the latter of which it relies on in render_causes. Not sure if that's the approach we wanna go with, but it certainly made things easier.

This adds a new syntect feature flag, which pulls in fancy automatically when used. It also automatically configures the global highlighter to use syntect by default, though the user can override the default settings (to override default theme, for example) with set_highlighter.

There is also a generic Highlighter trait. The goal with this design is to make the choice of highlighter customizable, and eliminate any exposing of crate-specific types in the API surface. I was using tree_sitter as a reference for what another syntax highlighting API might look like.

To make language detection easier, I added a name() method to SourceCode. This makes it possible for syntect to detect language via file extensions if the name corresponds to a file name (This would also pair well with a future SourceCode implementation for files such as #297).

I am also considering adding a language() method to SourceCode, which would make the language choice explicit, similar to the syntax() method mentioned above but agnostic of any particular highlighting crate. This string gets passed directly into SyntaxSet::find_syntax_by_name so for example Rust, C, TOML, and TypeScript would all work. Right now this isn't exposed anywhere in the API for the user to configure, but I think it would make sense to add a with_language method to NamedSource to explicitly choose a language, as well as perhaps a source_code(language = "Rust") syntax to the derive attributes which would only worked with a NamedSource.

I think implementing language detection for tree-sitter might be a bit more of a challenge because it uses an opaque FFI type that you import from a language-specific crate (e.g. tree_sitter_rust) to configure its language choice. So if we want to do tree sitter support it might be best to just let the highlighter struct itself choose how to detect language, by having a config hook such as FnOnce<&dyn SourceCode> -> tree_sitter::Language and then just have the user supply that.

@erratic-pattern
Copy link
Contributor

Here's what dbg!(syntect::highlighting::ThemeSet::load_defaults().themes["base16-ocean.dark"]) looks like:

https://gist.github.com/kallisti-dev/d43f4feb726ee15d50f7a15e7fb8c9d4

Maybe we could apply settings from GraphicalTheme onto a custom syntect Theme so that the user can customize it. It would also make supporting nocolor easier. Unfortunately it's a rather complicated structure so I'm not sure where to start.

@erratic-pattern
Copy link
Contributor

https://gist.github.com/kallisti-dev/95ca7db90e6c635e0c3a67b858f39509

For anyone who wants to see what the default themes look like on their local setup, I've made a quick script to render a Rust hello world with all of them. You can run this locally on my branch to see how they render with your terminal settings.

Here's what it looks like on my screen, using Fira Code with VS Code terminal

image

image

I think a bolder color palette like base16-eighties.dark or Solarized would work well with the default miette theme.

@Benjamin-L
Copy link
Contributor

@kallisti-dev I used a script to take screenshots of the example program with some different light and dark background themes:

screenshots-syntect

IMO none of the syntect themes are readable across all the terminal themes I tested, with solarized probably being the closest.

I'm thinking the best thing to do might be to set our own rgb background color when using syntect. We'd also probably want to use a rgb theme for the rest of miette in this case, since the user's terminal theme isn't guaranteed to work with our background. It isn't great, but is probably the least-bad compromise until we get 16-color ANSI support with syntect.

theme testing script

This is very unlikely to work on somebody else's setup, but could probably be adapted to any linux wm without too much effort.

#!/usr/bin/env bash

set -e -u -o pipefail

out="$1"
themes=(
    ayu-light
    github-dark
    github-light
    gruvbox-dark
    gruvbox-light-medium
    ibm3270
    mellow-purple
    red-sands
    solarized-dark
    solarized-light
    argonaut
    ocean
)

mkdir -p $out
for theme in ${themes[@]}; do
    nix run nixpkgs#theme-sh "$theme"
    grim -g "$(swaymsg -t get_tree | jq -j '.. | select(.type?) | select(.focused).rect | "\(.x),\(.y) \(.width)x\(.height)"')" "$out/$theme.png"
done

montage "./$out/*" -mode Concatenate "$out.png"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

5 participants