
proposal: zstd: add DecodedSize function #647

Closed · dsnet opened this issue Jul 27, 2022 · 6 comments · Fixed by #649

@dsnet (Contributor) commented Jul 27, 2022

I'm trying to implement proper rate limiting, and I'm finding it difficult to adequately rate-limit a server when dealing with untrusted []byte inputs, since I don't know how large the decompressed output will end up. I could wrap the []byte in an io.Reader and rate-limit based on the number of bytes read from the zstd.Decoder, but that seems somewhat unfortunate.
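For concreteness, a minimal sketch of that io.Reader workaround, assuming the klauspost/compress/zstd API; countingReader is an illustrative helper, not part of the package:

```go
package main

import (
	"bytes"
	"fmt"
	"io"

	"github.com/klauspost/compress/zstd"
)

// countingReader tallies decompressed bytes as they are read, so a rate
// limiter could account for output as it is produced rather than up front.
type countingReader struct {
	r io.Reader
	n int64 // decompressed bytes observed so far
}

func (c *countingReader) Read(p []byte) (int, error) {
	n, err := c.r.Read(p)
	c.n += int64(n)
	return n, err
}

func main() {
	var compressed []byte // untrusted zstd input from a client

	dec, err := zstd.NewReader(bytes.NewReader(compressed))
	if err != nil {
		panic(err)
	}
	defer dec.Close()

	cr := &countingReader{r: dec}
	if _, err := io.Copy(io.Discard, cr); err != nil {
		panic(err)
	}
	fmt.Println("decompressed bytes:", cr.n)
}
```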

I propose the addition of:

// DecodedLength reports the decompressed length of src in bytes.
func (*Decoder) DecodedLength(src []byte) (int64, error)

It may or may not make sense as a method on Decoder.

IIUC, this should only need to perform work proportional to len(src). The memory it occupies is primarily what's needed to initialize the ANS tables and to read each symbol. We would then count each literal symbol and the length of each LZ77 back-reference symbol.

This API would also help determine what size of buffer to allocate for the output.
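To illustrate the intent, a hypothetical sketch of the proposed method in a rate-limited handler. DecodedLength does not exist in the package (the proposal was not ultimately merged; see below), and the limiter from golang.org/x/time/rate is an assumption for illustration:

```go
package main

import (
	"fmt"
	"time"

	"github.com/klauspost/compress/zstd"
	"golang.org/x/time/rate"
)

// limiter budgets decompressed bytes per second (values are illustrative).
var limiter = rate.NewLimiter(rate.Limit(1<<20), 4<<20)

func handle(dec *zstd.Decoder, src []byte) ([]byte, error) {
	// Proposed method; hypothetical, never added to the package.
	n, err := dec.DecodedLength(src)
	if err != nil {
		return nil, err
	}
	if !limiter.AllowN(time.Now(), int(n)) {
		return nil, fmt.Errorf("rate limit exceeded for %d-byte message", n)
	}
	// Knowing n up front also lets the caller right-size the output buffer.
	return dec.DecodeAll(src, make([]byte, 0, n))
}
```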

@klauspost (Owner)

I would say that unless you have a huge number of blocks you want to discard, you are much better off just decoding.

Reconstructing the sequences is about 50% of the workload.

For streams, decoding is split: sequences are reconstructed separately from being executed, so that 50% is a fairly representative figure.

For blocks, both operations are performed at the same time, since a few blocks will not gain much benefit from this split and single blocks would be much slower.

So in overall terms, it seems you might as well decompress them.

@dsnet (Contributor · Author) commented Jul 27, 2022

Thanks for the thoughts. I trust your judgement on the cost of this functionality.

Unless I'm missing something, a challenge I'm facing is that there's no efficient way to change WithDecoderMaxMemory for a given Decoder without creating a new one.

For example, an untrusted client claims a message decompresses to N bytes, so I account for N bytes in the rate limiter. I then decompress the message and discover that it actually decompressed to something much larger than N, bypassing the rate limiter. In this situation N changes for every message, while the Decoder API expects N to be fairly static.
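A sketch of what per-message enforcement looks like without such an option, using the existing klauspost/compress/zstd API: a fresh Decoder per message, just to vary the limit.

```go
package zstdlimit

import (
	"github.com/klauspost/compress/zstd"
)

// decodeWithLimit enforces the client's claimed size claimedN, but since
// WithDecoderMaxMemory is fixed at construction, it must pay the cost of
// building a new Decoder on every message.
func decodeWithLimit(src []byte, claimedN uint64) ([]byte, error) {
	dec, err := zstd.NewReader(nil, zstd.WithDecoderMaxMemory(claimedN))
	if err != nil {
		return nil, err
	}
	defer dec.Close()
	// DecodeAll returns ErrDecoderSizeExceeded if the actual output would
	// exceed claimedN, closing the rate-limiter bypass described above.
	return dec.DecodeAll(src, nil)
}
```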

@klauspost (Owner)

@dsnet Correct, the max size is set per Decoder. It is mainly there for DoS protection, not as validation.

While that could be a solution, instead of adding more API we could add a WithDecodeAllCapLimit(b bool) option that changes DecodeAll to set the maximum decoded size to cap(dst)-len(dst) for each call.

How does that sound?

@dsnet (Contributor · Author) commented Jul 27, 2022

> It is mainly there for DoS protection

Technically, rate limiting exists as a form of DoS protection 😏 . I just happen to be implementing it at a finer granularity than what the API appears to have been designed for.

> instead we could add a WithDecodeAllCapLimit(b bool) that would ...

That would work for my use case. You could also call it WithDecodeAllNoAllocate or something to signal that it doesn't allocate any new buffers.

@klauspost (Owner)

> Technically, rate limiting exists as a form of DoS protection 😏

Not DDoS, but DoS by "zip bombing" and causing OOM.

> That would work for my use case. You could also call it WithDecodeAllNoAllocate or something to signal that it doesn't allocate any new buffers.

While that would likely be a side effect, I don't want to promise that.

klauspost added a commit that referenced this issue Jul 27, 2022
WithDecodeAllCapLimit will limit DecodeAll to decoding cap(dst)-len(dst) bytes, or any size set in WithDecoderMaxMemory.

This can be used to limit decoding to a specific maximum output size.

Disabled by default.

Fixes #647
@klauspost (Owner) commented Jul 27, 2022

@dsnet PTAL at #649

I don't have much time for testing, but added a basic test. Is it possible for you to try it out?

klauspost added a commit that referenced this issue Jul 29, 2022
* zstd: Add DecodeAllCapLimit

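For reference, a sketch of the merged option in use, assuming DecodeAll reports ErrDecoderSizeExceeded when the cap is hit: the capacity of dst becomes the per-call output budget, so one long-lived Decoder can serve messages with different limits.

```go
package zstdbudget

import (
	"errors"

	"github.com/klauspost/compress/zstd"
)

// newBudgetedDecoder constructs a reusable Decoder with the cap limit enabled.
func newBudgetedDecoder() (*zstd.Decoder, error) {
	return zstd.NewReader(nil, zstd.WithDecodeAllCapLimit(true))
}

// decodeBudgeted decodes src with at most budget bytes of output. With
// WithDecodeAllCapLimit(true), cap(dst)-len(dst) is the limit for each
// DecodeAll call, so the same Decoder handles varying per-message budgets.
func decodeBudgeted(dec *zstd.Decoder, src []byte, budget int) ([]byte, error) {
	out, err := dec.DecodeAll(src, make([]byte, 0, budget))
	if errors.Is(err, zstd.ErrDecoderSizeExceeded) {
		return nil, err // frame would decompress past the budget: reject it
	}
	return out, err
}
```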