
zlib: add zstd support #52100

Open · wants to merge 4 commits into main

@jkrems (Contributor) commented Mar 15, 2024

Adds ZstdCompress and ZstdDecompress to the zlib module, which can be used to compress/decompress with the Zstandard ("zstd") algorithm.

Notable omissions:

  • Providing dictionaries isn't implemented.
  • The docs in zlib.md don't call out any params beyond the basic compression level.

The code follows the same patterns as the PR that added Brotli support, except that it adds the equivalent zstd APIs instead of the Brotli ones. As with Brotli, this required separate compression and decompression context objects.
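A minimal sketch of the Brotli parallel described above. The Brotli functions are existing `node:zlib` APIs; the zstd names (`zstdCompressSync`, `zstdDecompressSync`) are taken from the author's `node -p` check in this thread and are feature-detected, so the sketch still runs on a Node.js build without this PR. `codecFor` and `roundTrip` are hypothetical helpers.

```javascript
const zlib = require('node:zlib');

// Hypothetical helper: pick the sync codec pair for an algorithm name.
// Brotli functions exist today; the zstd names are assumed to mirror them.
function codecFor(name) {
  if (name === 'brotli') {
    return { compress: zlib.brotliCompressSync, decompress: zlib.brotliDecompressSync };
  }
  if (name === 'zstd') {
    if (typeof zlib.zstdCompressSync !== 'function') {
      return null; // this Node.js build has no zstd support
    }
    return { compress: zlib.zstdCompressSync, decompress: zlib.zstdDecompressSync };
  }
  throw new Error(`unknown codec: ${name}`);
}

// Compress then decompress a string; returns null if the codec is unavailable.
function roundTrip(name, text) {
  const codec = codecFor(name);
  if (codec === null) return null;
  const compressed = codec.compress(Buffer.from(text));
  return codec.decompress(compressed).toString();
}

for (const name of ['brotli', 'zstd']) {
  const result = roundTrip(name, 'Hello World');
  console.log(name, result === null ? '(not available)' : result);
}
```

Because each algorithm owns its own compress/decompress pair, the calling code stays identical across codecs, which is the shape the separate context objects give the JS API.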

Zstd itself has been around and stable for multiple years, but this PR is early in terms of web support: zstd only just started shipping by default in Chrome 123. On the other hand, because it ships in Chrome, it will soon be supported quite widely on the web. Firefox has also signaled support (https://bugzilla.mozilla.org/show_bug.cgi?id=1301878#c65).

Official support in Node.js would allow passing additional fetch-related WPTs (nodejs/undici#2847).

@nodejs-github-bot (Collaborator)
Review requested:

  • @nodejs/security-wg

@nodejs-github-bot added the dependencies and needs-ci labels Mar 15, 2024
@jkrems force-pushed the zstd branch 4 times, most recently from ec19b6e to 28914c7, March 17, 2024 00:07
@jkrems (Contributor, Author) commented Mar 17, 2024

Alright, I got the basics (seemingly) working:

```console
$ out/Debug/node -p 'zlib.zstdDecompressSync(zlib.zstdCompressSync("Hello World")).toString()'
Hello World
```

@jkrems force-pushed the zstd branch 6 times, most recently from 383ff46 to 47c7ab8, March 17, 2024 20:59
@jkrems marked this pull request as ready for review March 17, 2024 21:00
jkrems added a commit to jkrems/node that referenced this pull request Mar 17, 2024
The notable-change label has been added by @richardlau.

Please suggest a text for the release notes if you'd like to include a more detailed summary, then proceed to update the PR description with the text or a link to the notable change suggested text comment. Otherwise, the commit will be placed in the Other Notable Changes section.

@jkrems (Contributor, Author) commented Mar 18, 2024

/cc @terrelln @Cyan4973 from the zstd side in case this PR is doing a bad job representing how zstd should be used.

@nodejs-github-bot (Collaborator)

@terrelln left a comment

I've only looked at the usage of the Zstd API. It looks about right; I've left some minor comments. I'm not familiar with the code, and it isn't clear to me exactly how DoThreadPoolWork() is called, so I can't be completely confident. But broadly it looks right.

One thing to consider is that some advanced parameters can directly cause Zstd to allocate a lot of memory (e.g. setting ZSTD_c_hashLog == ZSTD_c_chainLog == ZSTD_c_windowLog == 31 will cause Zstd to allocate at least 6 GB during compression). Additionally, the parameter ZSTD_c_nbWorkers will cause Zstd to spawn worker threads during compression.

This normally isn't an issue, because the author of the code is trusted. I assume that this is also the threat model for Node.js. But, because it is JS, I did want to raise the point.
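One way an application might act on this if it forwards user-controlled parameters: reject the memory-heavy log parameters above a cap and refuse worker threads. This is a hypothetical guard; `checkUntrustedZstdParams` and the cap `MAX_LOG = 27` are made up for illustration, and the keys are zstd's C parameter names rather than any Node.js constants.

```javascript
// Hypothetical guard (not part of this PR or of Node.js): validate
// user-controlled zstd parameters before they reach the compressor.
// Keys use zstd's C parameter names; MAX_LOG = 27 is an arbitrary example cap.
const MAX_LOG = 27;
const LOG_PARAMS = ['ZSTD_c_windowLog', 'ZSTD_c_hashLog', 'ZSTD_c_chainLog'];

function checkUntrustedZstdParams(params) {
  // Large log values translate into very large window and table allocations.
  for (const name of LOG_PARAMS) {
    if (name in params && params[name] > MAX_LOG) {
      throw new RangeError(`${name}=${params[name]} exceeds the cap of ${MAX_LOG}`);
    }
  }
  // Any nbWorkers value above 0 makes zstd spawn its own threads.
  if ((params.ZSTD_c_nbWorkers ?? 0) > 0) {
    throw new RangeError('ZSTD_c_nbWorkers must be 0 for untrusted input');
  }
  return params;
}

// Usage: an ordinary level change passes, the 31/31/31 case is rejected.
console.log(checkUntrustedZstdParams({ ZSTD_c_compressionLevel: 19 }));
try {
  checkUntrustedZstdParams({ ZSTD_c_windowLog: 31, ZSTD_c_hashLog: 31, ZSTD_c_chainLog: 31 });
} catch (err) {
  console.log(err.message); // ZSTD_c_windowLog=31 exceeds the cap of 27
}
```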

src/node_zlib.cc (outdated, resolved)

@terrelln
> There's no .params() support even though zstd should support changing (some) params at runtime.

There must be at least compression level support via ZSTD_c_compressionLevel. This is by far the most important parameter.
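A sketch of selecting the compression level, assuming zstd options follow the existing Brotli convention of a `params` object keyed by numeric constants (as in `{ [zlib.constants.BROTLI_PARAM_QUALITY]: 4 }`). The constant name `zlib.constants.ZSTD_c_compressionLevel` is an assumption modeled on the zstd parameter name; the code checks for it and for `zstdCompressSync` before using either.

```javascript
const zlib = require('node:zlib');

// Assumption: zstd options mirror Brotli's `params` object, keyed by a
// numeric constant zlib.constants.ZSTD_c_compressionLevel.
const hasZstd =
  typeof zlib.zstdCompressSync === 'function' &&
  zlib.constants.ZSTD_c_compressionLevel !== undefined;

// Returns the compressed buffer, or null when zstd is unavailable.
function compressAtLevel(input, level) {
  if (!hasZstd) return null;
  return zlib.zstdCompressSync(input, {
    params: { [zlib.constants.ZSTD_c_compressionLevel]: level },
  });
}

const input = Buffer.from('zstd '.repeat(1000));
for (const level of [1, 19]) {
  const out = compressAtLevel(input, level);
  console.log(`level ${level}:`, out === null ? '(zstd not available)' : `${out.length} bytes`);
}
```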

What do you mean at runtime? You can change any parameters you want before the (de)compression starts. After that, changing compression parameters generally isn't allowed.

> No input size hints for compression because it looked like that API in zstd wasn't stable yet.

If the input size is known in advance, then you can use ZSTD_CCtx_setPledgedSrcSize(); however, it is an error to pass an incorrect value. Additionally, if the first call to ZSTD_compressStream2() uses ZSTD_e_end, Zstd will infer the input size.

If the input size is only approximately known, but may be wrong, then Zstd also supports setting ZSTD_c_srcSizeHint. Zstd uses this parameter to tune its tradeoffs for that source size, but the hint is allowed to be wrong. But, as you say, this is a newer feature that we haven't yet stabilized. It works and is fully supported, but all new features go through our experimental API before being stabilized, so we have the freedom to tweak or remove them in the future.
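A JS-side sketch of the exact-size contract for a pledged size. `PledgedSizeCheck` is a hypothetical helper, not a Node.js or zstd API: it passes bytes through unchanged and fails the stream if the total differs from the pledge, which is the situation zstd reports as an error. A size hint, by contrast, would never fail.

```javascript
const { Transform } = require('node:stream');

// Hypothetical helper: forwards data unchanged, but errors if the total byte
// count does not exactly match the pledged size.
class PledgedSizeCheck extends Transform {
  constructor(pledgedSize) {
    super();
    this.pledgedSize = pledgedSize;
    this.seen = 0;
  }

  _transform(chunk, encoding, callback) {
    // Strings are decoded to Buffers by default, so length is a byte count.
    this.seen += chunk.length;
    if (this.seen > this.pledgedSize) {
      callback(new Error(`input exceeds pledged size of ${this.pledgedSize} bytes`));
      return;
    }
    callback(null, chunk);
  }

  _flush(callback) {
    // Too little input is just as wrong as too much.
    if (this.seen !== this.pledgedSize) {
      callback(new Error(`input ended at ${this.seen} bytes, pledged ${this.pledgedSize}`));
      return;
    }
    callback();
  }
}
```

In a pipeline it would sit directly in front of the compressor, for example `stream.pipeline(source, new PledgedSizeCheck(n), compressor, sink, callback)`, so that a wrong pledge surfaces as a stream error on the JS side.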

@jkrems (Contributor, Author) commented Apr 13, 2024

> What do you mean at runtime? You can change any parameters you want before the (de)compression starts. After that, changing compression parameters generally isn't allowed.

Thanks for confirming. The "at runtime" refers to the deflate-specific feature (deflateParams in zlib) that allows changing certain params within the same compression stream (at block boundaries IIUC). Node-level docs: https://nodejs.org/api/zlib.html#zlibparamslevel-strategy-callback
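For reference, this is the existing zlib-only feature in question. `Deflate#params()` flushes what has been written so far and then applies the new level and strategy to the rest of the same stream. The sketch uses only documented `node:zlib` APIs; the input strings and levels are arbitrary.

```javascript
const zlib = require('node:zlib');

// Start a single deflate stream at level 1.
const deflate = zlib.createDeflate({ level: 1 });
const chunks = [];
let restored = null;

deflate.on('data', (chunk) => chunks.push(chunk));
deflate.on('end', () => {
  // The whole stream, before and after the params change, inflates normally.
  restored = zlib.inflateSync(Buffer.concat(chunks)).toString();
  console.log(`restored ${restored.length} characters`);
});

deflate.write('a'.repeat(1000));
// Switch to level 9 mid-stream; the callback runs once the change is applied.
deflate.params(9, zlib.constants.Z_DEFAULT_STRATEGY, () => {
  deflate.end('b'.repeat(1000));
});
```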

Updated the PR description to correct this part.

> If the input size is known in advance, then you can use ZSTD_CCtx_setPledgedSrcSize(); however, it is an error to pass an incorrect value. Additionally, if the first call to ZSTD_compressStream2() uses ZSTD_e_end, Zstd will infer the input size.

Ah, thanks. Saw ZSTD_c_srcSizeHint but should've kept looking. Added pledgedSrcSize as an option.

@jkrems added the request-ci label Apr 13, 2024
github-actions bot removed the request-ci label Apr 13, 2024
@nodejs-github-bot (Collaborator)

@jkrems (Contributor, Author) commented Apr 13, 2024

Rebased after the fixes to ensure it still lands cleanly.

@jkrems added the zlib label Apr 13, 2024
@jkrems mentioned this pull request Apr 13, 2024
@richardlau added the request-ci label Apr 14, 2024
github-actions bot removed the request-ci label Apr 14, 2024
@nodejs-github-bot (Collaborator)

@nodejs-github-bot (Collaborator)

@jkrems (Contributor, Author) commented Apr 15, 2024

Looking at the test failures, they look unrelated at first glance (different tests on different OSes, not related to compression afaict). Resuming the build to see if they are flakes.

@nodejs-github-bot (Collaborator)

@jkrems added the c++ label May 8, 2024
@bricss commented May 12, 2024

Maybe it's possible to bump zstd to the latest version? 🤔

@jkrems (Contributor, Author) commented May 12, 2024

Happy to rerun the import. Right now I'm also looking for a signal that this is something other collaborators want in core. Otherwise, rebasing (potentially) and reimporting wouldn't really be worth it; it would just bitrot again.

@bricss commented May 12, 2024

I cannot speak on behalf of collaborators, but imho the W3C community would certainly like to see it in core 🧐, given that Zstd Content-Encoding will soon be supported by the majority of modern browsers 🧭

Labels: c++, dependencies, needs-ci, notable-change, semver-minor, zlib

5 participants