Performance: Require `BufRead` instead of just `Read` for inputs. #427

This commit makes a breaking API change - it changes the `R: Read` constraint (in `Decoder` and `Reader` structs) to the `R: BufRead` constraint. This helps performance by avoiding copying the input data into an additional, intermediate `BufReader` that used to be stored in the (internal) `ReadDecoder::reader` field (after these changes that field is `R` rather than `BufReader`). In particular, some input types (e.g. when decoding from a `&[u8]`) already implement `BufRead` and for such types it is wasteful to introduce additional buffering via `BufReader`. The impact of the change is significant, but relatively small - this means that it mostly shows up in `noncompressed...` benchmarks which magnify the cost of code paths that are not related to `fdeflate` nor `unfilter`. Impact on benchmark runtime looks as follows (run once, and then rerun after compiling before+after with a fresh nightly `rustc`): * kodim02.png: - No change in performance detected (p = 0.08 > 0.05) - [+1.3713% +1.7241% +2.0960%] (p = 0.00 < 0.05) * kodim07.png: - [-1.1466% -0.6693% -0.2705%] (p = 0.00 < 0.05) - No change in performance detected. (p = 0.35 > 0.05) * kodim17.png: - [-2.3062% -1.2878% +0.1746%] (p = 0.05 < 0.05) - [-2.7355% -1.9939% -0.7986%] (p = 0.00 < 0.05) * kodim23.png: - No change in performance detected. (p = 0.51 > 0.05) - [-1.4834% -1.0648% -0.6692%] (p = 0.00 < 0.05) * Lohengrin...png: - [-2.0606% -1.7935% -1.4756%] (p = 0.00 < 0.05) - [-4.2412% -3.6723% -3.0327%] (p = 0.00 < 0.05) * Transparency.png: - [+1.4991% +1.8812% +2.3429%] (p = 0.00 < 0.05) - [-0.7939% -0.5746% -0.3590%] (p = 0.00 < 0.05) * noncompressed-8x8.png: - [-2.2881% -1.4801% -0.4110%] (p = 0.00 < 0.05) - [-7.5687% -7.2013% -6.8838%] (p = 0.00 < 0.05) * noncompressed-128x128.png: - [-12.495% -12.132% -11.760%] (p = 0.00 < 0.05) - [-10.597% -10.230% -9.8399%] (p = 0.00 < 0.05)

Similarily to image-rs@1636b55 this commit tries to ensure that the working set fits into the L1 cache. Before this commit, the whole `ZlibStream::out_buffer` could be filled out and this buffer is potentially bigger than the typical 32kB of the L1 cache. After this commit, `MAX_INCREMENTAL_DECOMPRESSION_SIZE` limits how many bytes can be written to `ZlibStream::out_buffer` in a single call to `fdeflate::Decompressor::read`.

After this commit the size of the initial allocation of `ZlibStream::out_buffer` should be big enough to avoid having to grow the capacity of the buffer backing this vector of bytes. Removing this overhead is expected to have a positive impact on runtime performance.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance: Require `BufRead` instead of just `Read` for inputs. #427

Performance: Require `BufRead` instead of just `Read` for inputs. #427

Commits on Jan 10, 2024

Performance: Require BufRead instead of just Read for inputs. #427

Are you sure you want to change the base?

Performance: Require BufRead instead of just Read for inputs. #427

Commits on Jan 10, 2024

Performance: Require `BufRead` instead of just `Read` for inputs. #427

Performance: Require `BufRead` instead of just `Read` for inputs. #427