urllib3.response.GzipDecoder is accidentally quadratic, which allows a malicious server to DoS urllib3 clients #1467
Here's a <100 KB file that the `gzip` module decompresses in ~200 ms. But if we use `urllib3.response.GzipDecoder` to decompress it, it burns CPU for >10 seconds. Since urllib3 attempts to decode gzipped responses by default, this means a malicious server can easily cause urllib3-based clients to waste large amounts of CPU time.
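A minimal sketch of a reproduction, assuming a file built from many tiny concatenated gzip members (the member count and exact timings here are illustrative, not taken from the original attachment):

```python
import gzip
import time

import urllib3.response

# Thousands of tiny gzip members concatenated into one stream; each
# compressed member is only a few dozen bytes.
payload = b"".join(gzip.compress(b"x") for _ in range(10_000))

start = time.perf_counter()
gzip.decompress(payload)  # the stdlib handles multi-member streams in linear time
print(f"stdlib gzip: {time.perf_counter() - start:.3f}s")

start = time.perf_counter()
urllib3.response.GzipDecoder().decompress(payload)
print(f"GzipDecoder: {time.perf_counter() - start:.3f}s")
```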
The problem is that this is a gzip file with lots and lots of members concatenated together. When urllib3 encounters such a file, it decodes each member in sequence and accumulates the result into a `bytes` object via repeated calls to `+=`. On a `bytes` object, each call to `+=` is O(n), so this loop is accidentally quadratic.
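For reference, the decode loop in question looks roughly like this (a simplified paraphrase of urllib3's `GzipDecoder`, not the exact source):

```python
import zlib

def decompress(data):
    obj = zlib.decompressobj(16 + zlib.MAX_WBITS)  # 16 + MAX_WBITS selects gzip framing
    ret = b""
    while True:
        # += on an immutable bytes object copies the whole accumulated
        # result each time, so decoding k members costs O(k * total_output)
        ret += obj.decompress(data)
        data = obj.unused_data  # compressed bytes left over after this member
        if not data:
            return ret
        # fresh decompressor for the next concatenated gzip member
        obj = zlib.decompressobj(16 + zlib.MAX_WBITS)
```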
If we make `ret` a `bytearray` instead, it fixes the problem: in this test, the only thing I changed is to replace the line `ret = b""` with `ret = bytearray()`. A real fix would probably want to avoid returning `bytearray` objects to the user, so I guess you'd either want to accumulate a list of `bytes` chunks and call `join` at the end, or else accumulate into a `bytearray` and then convert back to `bytes` at the end?
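A minimal sketch of the list-plus-`join` option (one possible shape for the fix, not an actual patch):

```python
import zlib

def decompress(data):
    obj = zlib.decompressobj(16 + zlib.MAX_WBITS)
    chunks = []  # list of bytes chunks; appending is amortized O(1)
    while True:
        chunks.append(obj.decompress(data))
        data = obj.unused_data
        if not data:
            # single O(total_output) concatenation, and the caller
            # still receives a plain bytes object
            return b"".join(chunks)
        obj = zlib.decompressobj(16 + zlib.MAX_WBITS)
```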
Even after this fix, I think there's technically still some quadratic behavior in the way we pass `.unused_data` from one decompression object to the next, but at least that's quadratic in the size of the compressed file rather than the uncompressed file? I'm not sure whether this is triggerable in practice. If we want to be extra careful, we could put an upper bound on how much data we feed into `self._obj.decompress` on each pass through the loop.
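A rough sketch of that extra-careful variant; the 64 KiB cap is an arbitrary illustrative choice:

```python
import zlib

CHUNK = 64 * 1024  # arbitrary cap on how much we hand to decompress() per call

def decompress(data):
    obj = zlib.decompressobj(16 + zlib.MAX_WBITS)
    chunks = []
    buf = memoryview(data)  # slicing a memoryview doesn't copy the tail
    pending = b""           # compressed leftover carried over via unused_data
    while pending or buf:
        if pending:
            feed, pending = pending, b""
        else:
            feed, buf = bytes(buf[:CHUNK]), buf[CHUNK:]
        chunks.append(obj.decompress(feed))
        if obj.eof:
            # this member ended partway through `feed`; keep the remainder,
            # which is now at most CHUNK bytes rather than the whole tail
            pending = obj.unused_data
            obj = zlib.decompressobj(16 + zlib.MAX_WBITS)
    return b"".join(chunks)
```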
I haven't hit this in the real world; I just noticed it by accident while looking at the code.
I don't think this is a particularly serious vulnerability – gzip decompression inherently allows some amount of DoS (e.g. by sending a file that expands by a factor of 1000 to use up lots of memory). But it is a real issue, and if someone wants to go get a CVE, I guess it probably qualifies.
Cool, do you wanna send a PR to fix this or just let me or @SethMichaelLarson pick it up?

If you could pick it up, that would be great.

I've created a PR to resolve the issue; take a look when you've got time.