
proposal: zstd: add DecodedSize function #647

Closed · dsnet opened this issue Jul 27, 2022 · 6 comments · Fixed by #649

@dsnet (Contributor) commented Jul 27, 2022

I'm trying to implement proper rate limiting, and I'm finding it difficult to adequately rate-limit a server when dealing with untrusted []byte inputs, since I don't know how large the decompressed output will end up. I could wrap the []byte in an io.Reader and rate-limit based on the number of bytes read from the zstd.Decoder, but that seems somewhat unfortunate.
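For concreteness, a minimal sketch of that io.Reader workaround, assuming the klauspost/compress/zstd API; countingReader is an illustrative helper, not part of the package:

```go
package main

import (
	"bytes"
	"fmt"
	"io"

	"github.com/klauspost/compress/zstd"
)

// countingReader tallies decompressed bytes as they are read, so a rate
// limiter could account for output as it is produced rather than up front.
type countingReader struct {
	r io.Reader
	n int64 // decompressed bytes observed so far
}

func (c *countingReader) Read(p []byte) (int, error) {
	n, err := c.r.Read(p)
	c.n += int64(n)
	return n, err
}

func main() {
	var compressed []byte // untrusted zstd input from a client

	dec, err := zstd.NewReader(bytes.NewReader(compressed))
	if err != nil {
		panic(err)
	}
	defer dec.Close()

	cr := &countingReader{r: dec}
	if _, err := io.Copy(io.Discard, cr); err != nil {
		panic(err)
	}
	fmt.Println("decompressed bytes:", cr.n)
}
```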

I propose the addition of:

// DecodedLength reports the decompressed length of src in bytes.
func (*Decoder) DecodedLength(src []byte) (int64, error)

It may or may not make sense as a method on Decoder.

IIUC, this should only need to perform work proportional to len(src). The memory it occupies is primarily what's needed to initialize the ANS tables and to read each symbol. We would then count each literal symbol and the length of each LZ77 back-reference symbol.

This API would also help determine what size of buffer to allocate for the output.
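To illustrate the intent, a hypothetical sketch of the proposed method in a rate-limited handler. DecodedLength does not exist in the package (the proposal was not ultimately merged; see below), and the limiter from golang.org/x/time/rate is an assumption for illustration:

```go
package main

import (
	"fmt"
	"time"

	"github.com/klauspost/compress/zstd"
	"golang.org/x/time/rate"
)

// limiter budgets decompressed bytes per second (values are illustrative).
var limiter = rate.NewLimiter(rate.Limit(1<<20), 4<<20)

func handle(dec *zstd.Decoder, src []byte) ([]byte, error) {
	// Proposed method; hypothetical, never added to the package.
	n, err := dec.DecodedLength(src)
	if err != nil {
		return nil, err
	}
	if !limiter.AllowN(time.Now(), int(n)) {
		return nil, fmt.Errorf("rate limit exceeded for %d-byte message", n)
	}
	// Knowing n up front also lets the caller right-size the output buffer.
	return dec.DecodeAll(src, make([]byte, 0, n))
}
```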

@klauspost (Owner)

I would say that unless you have a huge number of blocks you want to discard, you are much better off just decoding.

Reconstructing the sequences is about 50% of the workload.

For streams, decoding is split: sequences are reconstructed separately from being executed, so that 50% is a fairly representative figure.

For blocks, both operations are performed at the same time, since a few blocks will not gain much benefit from this split and single blocks would be much slower.

So in overall terms, it seems you might as well decompress them.

@dsnet (Contributor · Author) commented Jul 27, 2022

Thanks for the thoughts. I trust your judgement on the cost of this functionality.

Unless I'm missing something, a challenge I'm facing is that there's no efficient way to change WithDecoderMaxMemory for a given Decoder without creating a new one.

For example, an untrusted client claims a message decompresses to N bytes, so I account for N bytes in the rate limiter. I then decompress the message and discover that it actually decompressed to something much larger than N, bypassing the rate limiter. In this situation N changes for every message, while the Decoder API expects N to be fairly static.
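A sketch of what per-message enforcement looks like without such an option, using the existing klauspost/compress/zstd API: a fresh Decoder per message, just to vary the limit.

```go
package zstdlimit

import (
	"github.com/klauspost/compress/zstd"
)

// decodeWithLimit enforces the client's claimed size claimedN, but since
// WithDecoderMaxMemory is fixed at construction, it must pay the cost of
// building a new Decoder on every message.
func decodeWithLimit(src []byte, claimedN uint64) ([]byte, error) {
	dec, err := zstd.NewReader(nil, zstd.WithDecoderMaxMemory(claimedN))
	if err != nil {
		return nil, err
	}
	defer dec.Close()
	// DecodeAll returns ErrDecoderSizeExceeded if the actual output would
	// exceed claimedN, closing the rate-limiter bypass described above.
	return dec.DecodeAll(src, nil)
}
```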

@klauspost (Owner)

@dsnet Correct, the max size is set per Decoder. It is mainly there for DoS protection, not as validation.

While that could be a solution, instead of adding more API we could add a WithDecodeAllCapLimit(b bool) option that changes DecodeAll to set the maximum decoded size to cap(dst)-len(dst) for each call.

How does that sound?

@dsnet (Contributor · Author) commented Jul 27, 2022

> It is mainly there for DoS protection

Technically, rate limiting exists as a form of DoS protection 😏 . I just happen to be implementing it at a finer granularity than what the API appears to have been designed for.

> instead we could add a WithDecodeAllCapLimit(b bool) that would ...

That would work for my use case. You could also call it WithDecodeAllNoAllocate or something to signal that it doesn't allocate any new buffers.

@klauspost (Owner)

> Technically, rate limiting exists as a form of DoS protection 😏

Not DDoS, but DoS by "zip bombing" and causing OOM.

> That would work for my use case. You could also call it WithDecodeAllNoAllocate or something to signal that it doesn't allocate any new buffers.

While that would likely be a side effect, I don't want to promise that.

klauspost added a commit that referenced this issue Jul 27, 2022
WithDecodeAllCapLimit will limit DecodeAll to decoding cap(dst)-len(dst) bytes, or any size set in WithDecoderMaxMemory.

This can be used to limit decoding to a specific maximum output size.

Disabled by default.

Fixes #647
@klauspost (Owner) commented Jul 27, 2022

@dsnet PTAL at #649

I don't have much time for testing, but added a basic test. Is it possible for you to try it out?

klauspost added a commit that referenced this issue Jul 29, 2022
* zstd: Add DecodeAllCapLimit

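For reference, a sketch of the merged option in use, assuming DecodeAll reports ErrDecoderSizeExceeded when the cap is hit: the capacity of dst becomes the per-call output budget, so one long-lived Decoder can serve messages with different limits.

```go
package zstdbudget

import (
	"errors"

	"github.com/klauspost/compress/zstd"
)

// newBudgetedDecoder constructs a reusable Decoder with the cap limit enabled.
func newBudgetedDecoder() (*zstd.Decoder, error) {
	return zstd.NewReader(nil, zstd.WithDecodeAllCapLimit(true))
}

// decodeBudgeted decodes src with at most budget bytes of output. With
// WithDecodeAllCapLimit(true), cap(dst)-len(dst) is the limit for each
// DecodeAll call, so the same Decoder handles varying per-message budgets.
func decodeBudgeted(dec *zstd.Decoder, src []byte, budget int) ([]byte, error) {
	out, err := dec.DecodeAll(src, make([]byte, 0, budget))
	if errors.Is(err, zstd.ErrDecoderSizeExceeded) {
		return nil, err // frame would decompress past the budget: reject it
	}
	return out, err
}
```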