New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not compress the outer parts of a SyntaxSet #398
Do not compress the outer parts of a SyntaxSet #398
Conversation
Since the lazy-loaded syntaxes already has their data compressed, it is only detrimental to performance to compress another time. So stop compressing the SyntaxSet and add APIs for clients too.
/// data that has been embedded in your own binary with the [`include_bytes!`] | ||
/// macro. | ||
#[cfg(any(feature = "dump-load", feature = "dump-load-rs"))] | ||
fn from_uncompressed_data<T: DeserializeOwned>(v: &[u8]) -> Result<T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the spirit of #98 I chose to return a Result<T>
here instead of T
as from_binary
. I also chose to call it _data
instead of _binary
since I think _data
is clearer. But I am of course willing to make any changes that my reviewer(s) request.
I'm not entirely comfortable to merge this PR without any review, so I'll leave this open for at least another month. After it is merged, it would be fantastic with a new syntect release, because we can then practically immediately make a bat release that makes use of lazy-loading of syntaxes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, this isn't that big or weird of a change so I would have been fine with you merging this on your own after a month, for reference.
I'm happy to do a release now! Do you know offhand whether any important public APIs changed in a breaking way? I remember one minor change to something public that I'm pretty sure nobody uses, that I'm fine doing in a non-major release, but was there anything other than that? If you don't remember I can look through myself.
Good to know, I'll take this into account when evaluating complexity/risk of PRs in the future 👍 Looking forward to the release! I quickly skimmed through
In terms of changes to the public API, I checked the diff using my home-cooked method: #!/usr/bin/env bash
set -o errexit -o nounset -o pipefail
RUSTDOCFLAGS='-Z unstable-options --output-format json' cargo +nightly doc --no-deps
jq '.index | .[] | select(.crate_id == 0) | select(.visibility == "public") | .name' target/doc/syntect.json | sort > public-api-symbols.txt Diffing the output of v4.6.0 with origin/master produces the following list of changes:
To be precise; the diff of adding I did all this in a bit of a rush since now I need to prepare for Christmas, but it should be a good baseline for you when you double-check the diff yourself :) Feel free to remove and add and do whatever changes you like to my draft! |
Thanks so much for figuring out the changelog for me, that made things much quicker! I just pushed the release and published Thanks again for all your great contributions and your patience with how spottily I've put time into maintenance. |
Here is the last PR in the lazy-loading series.
Since the lazy-loaded syntaxes already has their data compressed, it is only
detrimental to performance to compress another time. So stop compressing the
SyntaxSet
and add APIs for clients too.To summarize impact of lazy-loading work, here are criterion benchmark numbers with v4.6.0 as baseline (with some back-ported benches to make it possible to compare):
Much faster:
from_dump_file
takes 1.3263 ms instead of 42.257 ms, an improvement with -96.859%load_internal_dump
show similar numbers asfrom_dump_file
Significant improvements:
load_and_highlight/*
: Each test is ~41 ms faster compared to before (due to theload_internal_dump
improvement). For example,load_and_highlight/"parser.rs"
takes ~434 ms instead of ~467 ms, andload_and_highlight/"highlight_test.erb"
takes ~32 ms instead of ~62 ms.No practical difference in performance compared to v4.6.0:
parse/*
highlight/*
add_from_folder
load_internal_themes
stack_matching
highlight_html
Much slower:
link_syntaxes
takes ~200 ms instead of ~17 ms since it now involves serialization and compression. It's for the greater good though of course since linking is a one time thing while loading dumps happens over and over, and affects startup time in a much more fundamental way.Another example of something that has gotten much faster is
syncat
. The improvement numbers I gave in the prototype code is still applicable.Overview of the syntax lazy-loading work:
contexts
inSyntaxReference
tocontext_ids
#379contexts
fromSyntaxSet
into individualSyntaxReference
s #382(Note that MSRV CI will fail due to #397 not being merged)
(Note that projects that have their own code for dumps (like bat) is not affected by this PR, but for "normal" clients this PR is important.)