Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance: Require BufRead instead of just Read for inputs. #427

Draft
wants to merge 3 commits into
base: next
Choose a base branch
from

Commits on Jan 10, 2024

  1. Require BufRead instead of just Read for inputs.

    This commit makes a breaking API change - it changes the `R: Read`
    constraint (in `Decoder` and `Reader` structs) to the `R: BufRead`
    constraint.  This helps performance by avoiding copying the input data
    into an additional, intermediate `BufReader` that used to be stored in
    the (internal) `ReadDecoder::reader` field (after these changes that
    field is `R` rather than `BufReader`).  In particular, some input types
    (e.g. when decoding from a `&[u8]`) already implement `BufRead` and for
    such types it is wasteful to introduce additional buffering via
    `BufReader`.
    
    The impact of the change is significant, but relatively small - this
    means that it mostly shows up in `noncompressed...` benchmarks which
    magnify the cost of code paths that are not related to `fdeflate` nor
    `unfilter`.  Impact on benchmark runtime looks as follows (run once, and
    then rerun after compiling before+after with a fresh nightly `rustc`):
    
    * kodim02.png:
        - No change in performance detected (p = 0.08 > 0.05)
        - [+1.3713% +1.7241% +2.0960%] (p = 0.00 < 0.05)
    * kodim07.png:
        - [-1.1466% -0.6693% -0.2705%] (p = 0.00 < 0.05)
        - No change in performance detected. (p = 0.35 > 0.05)
    * kodim17.png:
        - [-2.3062% -1.2878% +0.1746%] (p = 0.05 < 0.05)
        - [-2.7355% -1.9939% -0.7986%] (p = 0.00 < 0.05)
    * kodim23.png:
        - No change in performance detected. (p = 0.51 > 0.05)
        - [-1.4834% -1.0648% -0.6692%] (p = 0.00 < 0.05)
    * Lohengrin...png:
        - [-2.0606% -1.7935% -1.4756%] (p = 0.00 < 0.05)
        - [-4.2412% -3.6723% -3.0327%] (p = 0.00 < 0.05)
    * Transparency.png:
        - [+1.4991% +1.8812% +2.3429%] (p = 0.00 < 0.05)
        - [-0.7939% -0.5746% -0.3590%] (p = 0.00 < 0.05)
    * noncompressed-8x8.png:
        - [-2.2881% -1.4801% -0.4110%] (p = 0.00 < 0.05)
        - [-7.5687% -7.2013% -6.8838%] (p = 0.00 < 0.05)
    * noncompressed-128x128.png:
        - [-12.495% -12.132% -11.760%] (p = 0.00 < 0.05)
        - [-10.597% -10.230% -9.8399%] (p = 0.00 < 0.05)
    anforowicz committed Jan 10, 2024
    Configuration menu
    Copy the full SHA
    ee896d1 View commit details
    Browse the repository at this point in the history
  2. Cap how much output a single decompression step can produce.

    Similarily to
    image-rs@1636b55
    this commit tries to ensure that the working set fits into the L1 cache.
    Before this commit, the whole `ZlibStream::out_buffer` could be filled
    out and this buffer is potentially bigger than the typical 32kB of the
    L1 cache.
    
    After this commit, `MAX_INCREMENTAL_DECOMPRESSION_SIZE` limits how many
    bytes can be written to `ZlibStream::out_buffer` in a single call to
    `fdeflate::Decompressor::read`.
    anforowicz committed Jan 10, 2024
    Configuration menu
    Copy the full SHA
    ac815e7 View commit details
    Browse the repository at this point in the history
  3. Avoid the cost of reallocating+copying ZlibStream::out_buffer

    After this commit the size of the initial allocation of
    `ZlibStream::out_buffer` should be big enough to avoid having to grow
    the capacity of the buffer backing this vector of bytes.  Removing this
    overhead is expected to have a positive impact on runtime performance.
    anforowicz committed Jan 10, 2024
    Configuration menu
    Copy the full SHA
    b2495a3 View commit details
    Browse the repository at this point in the history