Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add from_path and from_string variants that don't buffer output #86

Closed
wants to merge 14 commits into from

Conversation

hsanzg
Copy link

@hsanzg hsanzg commented Oct 7, 2023

Makes the Serializer API take an instance of std::io::Write where the serialized contents are written, instead of placing them in an internal buffer that is then converted to a String.

Moves the ASCII-detection code to the visitor API, where we can check for non-ASCII characters only in the relevant places: strings, selectors and units. This also means that the finish method no longer performs a memory copy of the entire serialized output when a non-ASCII character is detected.

Other miscellaneous changes include removing some allocations in the visit_calculation, write_comment and visit_stmt (in particular, the code in charge of writing a KeyframesRuleSet) methods.

The current APIs create a buffer, pass it to a new serializer instance, and finally interpret the buffer's contents as a UTF-8 string. The new public compiler functions write_from_path and write_from_string work exactly in the same manner, but take an (ideally buffered) writer as their last argument.

This passes all tests run by executing cargo test on the main directory, but I'm not sure if I missed another test suite present in the repo. My computer is far too slow to run any benchmarks, but I suspect this changeset should speed up the serialization stage considerably.

A `Serializer` now receives an instance of `std::io::Write`. This means
that we can no longer write the UTF-8 BOM nor the `@charset "UTF-8"
directive when serialization is done and instead do the ASCII-content
check while visiting the AST.

All `inspect_*` methods now create a temporary buffer and pass it to a
`Serializer`. Then they (unsafely) interpret the buffer as a UTF-8
string. The goal is to avoid this extra set of allocations and always
write directly to the writer passed by the API user.

We also removed some extra allocations marked with todo comments.
This also adds a public API to write the output of a compiled set of
SASS files into a `Write`. We should add a recommendation to the
rustdoc about passing a buffered writer; otherwise performance will
probably be horrible.

The previous `String`-based APIs were also moved to this new set of APIs
by creating a temporary `Vec<u8>` buffer, passing it to the serializer,
and finally (unsafely) reinterpreting the buffer contents as a UTF-8
string.
@connorskees
Copy link
Owner

Thank you for this change and opening this PR. I'm realizing now that the serialization code could use some more work, and this is an interesting step towards fixing some of that up. I quite like the @keyframes change.

Do you have a particular use-case that relies on being able to write to an arbitrary dyn Write vs getting back a String? I assume this, alongside the is_ascii changes have been done for performance?

For the is_ascii change, my immediate assumption is that the final is_ascii check over the entire string is actually faster than doing many smaller checks, because the final is_ascii should get compiled down to a very fast SIMD loop over a large number of characters at once vs lots of function calls and tree walking in the visitor. Additionally, the is_ascii check (and subsequent memcpy in the case of non-ascii output) shouldn't be a large part of the total execution time. I would be surprised if it showed up in a profiler at all for any input, though I also don't think I've ever profiled Sass code that does have non-ascii output. At the very least, in the case of bootstrap and other large libraries, the serialization step barely shows up during profiling iirc.

For the dyn Write change, I'm wary of adding new functionality to the public API and some of the code complexity that comes with supporting this. Similar to the above, I'd be surprised if writing to an intermediate buffer String before writing to a file shows up in the profile at all for any input. If you have a particular use case or a profile that demonstrates the performance win, I'd be happy to add such functionality.

Even if your computer is slow, you might still be able to run benchmarks and profiles. For benchmarking, I usually use hyperfine to benchmark the CLI and often ad hoc usage of std::time::Instant to print out how long particular sections of code take to run.

If you find your computer is still too noisy to benchmark reliably, I do all of my benchmarking and profiling on dedicated virtual machines which I SSH into. The pricing is quite reasonable and they're very easy to set up and tear down.

For profiling I primarily use perf. I'm less familiar with good and free tools to use on Windows, but I think some people use DTrace. cargo flamegraph is also a great tool that builds on top of perf and can show you a different visualization from perf report.

@connorskees
Copy link
Owner

Hello,

Thank you again for your contribution. I am closing out this PR as it seems inactive, but please feel free to re-open at any time. If you're able to expand on your use case for accepting a dyn Writer, I'd be interested to hear more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants