
urllib3.response.GzipDecoder is accidentally quadratic, which allows a malicious server to DoS urllib3 clients #1467

Closed
njsmith opened this issue Nov 1, 2018 · 3 comments


@njsmith
Contributor

njsmith commented Nov 1, 2018

Here's a <100 KB file that the gzip module decompresses in ~200 ms. But if we use urllib3.response.GzipDecoder to decompress it, then it burns CPU for >10 seconds.

In [51]: evil = gzip.compress(b"\x00" * 1032 * 40) * 1350                                      

In [52]: len(evil)                                                                             
Out[52]: 99900

In [53]: %time x = gzip.decompress(evil)                                                       
CPU times: user 230 ms, sys: 11.9 ms, total: 242 ms
Wall time: 240 ms

In [54]: %time x = urllib3.response.GzipDecoder().decompress(evil)                             
CPU times: user 5.87 s, sys: 7.73 s, total: 13.6 s
Wall time: 13.6 s

Since urllib3 attempts to decode gzip files by default, this means a malicious server can easily cause urllib3-based clients to waste tons of CPU time.

The problem is that this is a gzip file with lots and lots of members concatenated together. When urllib3 encounters such a file, it decodes each member in sequence and accumulates the result into a bytes object via repeated calls to +=.

Since bytes is immutable, each += copies everything accumulated so far, which is O(n), so this loop is accidentally quadratic in the size of the uncompressed output.
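To see the quadratic pattern in isolation, here's a standalone sketch (not urllib3's actual loop; the chunk count and size are arbitrary numbers I picked):

import timeit

chunks = [b"\x00" * 1024] * 2000  # ~2 MB total, arbitrary sizes

def accumulate():
    ret = b""
    for chunk in chunks:
        # bytes is immutable, so += allocates a fresh object and copies
        # everything accumulated so far: O(n) per step, O(n**2) overall
        ret += chunk
    return ret

print(timeit.timeit(accumulate, number=1))  # grows quadratically with len(chunks)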

If we make ret a bytearray instead, it fixes the problem:

In [62]: %time x = MyGzipDecoder().decompress(evil)                                            
CPU times: user 167 ms, sys: 8.41 ms, total: 175 ms
Wall time: 174 ms

In this test, the only thing I changed is to replace the line ret = b"" with ret = bytearray(). A real fix would probably want to avoid returning bytearray objects to the user, so I guess you'd either want to accumulate a list-of-bytes and call join at the end, or else accumulate in a bytearray and then convert back to bytes at the end?
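For concreteness, here's a rough sketch of what the fixed loop could look like (based on my reading of GzipDecoder; I'm omitting the real class's state tracking and zlib.error handling):

import zlib

class MyGzipDecoder(object):
    # Simplified sketch of urllib3.response.GzipDecoder with the fix;
    # the error/state handling of the real class is omitted.
    def __init__(self):
        self._obj = zlib.decompressobj(16 + zlib.MAX_WBITS)

    def decompress(self, data):
        ret = bytearray()  # was: ret = b"" -- bytearray += is amortized O(1)
        if not data:
            return bytes(ret)
        while True:
            ret += self._obj.decompress(data)
            data = self._obj.unused_data
            if not data:
                # convert back so callers still get a plain bytes object
                return bytes(ret)
            # leftover bytes are the next gzip member: fresh decompressobj
            self._obj = zlib.decompressobj(16 + zlib.MAX_WBITS)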

Even after this fix I think there's technically still some quadratic behavior in the way we pass .unused_data from one decompression object to the next, but at least that's quadratic in the size of the compressed file, rather than the uncompressed file? I'm not sure if this is triggerable in practice. If we want to be extra careful, we could put an upper bound on how much data we feed into self._obj.decompress on each pass through the loop.
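Something along these lines, maybe (entirely hypothetical; MAX_FEED and the carry handling are my invention, not anything urllib3 does today):

import zlib

MAX_FEED = 64 * 1024  # arbitrary cap on bytes handed to zlib at once

class BoundedGzipDecoder(object):
    # Hypothetical sketch: by bounding each feed, unused_data can never
    # exceed MAX_FEED bytes, so re-feeding it is no longer quadratic.
    def __init__(self):
        self._obj = zlib.decompressobj(16 + zlib.MAX_WBITS)

    def decompress(self, data):
        ret = bytearray()
        pending = memoryview(data)  # slicing a memoryview doesn't copy
        carry = b""  # unused_data left over from the previous member
        while pending or carry:
            take = MAX_FEED - len(carry)
            feed = carry + bytes(pending[:take])
            pending = pending[take:]
            carry = b""
            ret += self._obj.decompress(feed)
            if self._obj.unused_data:
                # end of one gzip member: restart on the leftover bytes
                carry = self._obj.unused_data
                self._obj = zlib.decompressobj(16 + zlib.MAX_WBITS)
        return bytes(ret)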

I haven't hit this in the real world; I just noticed it by accident when looking at the code.

I don't think this is a particularly serious vulnerability – gzip decompression inherently allows some amount of DoS (e.g. by sending a file that expands by a factor of 1000 to use up lots of memory). But it is a real issue, and if someone wants to go get a CVE, I guess it probably qualifies.

@theacodes
Member

Cool, do you wanna send a PR to fix this or just let me or @SethMichaelLarson pick it up?

@njsmith
Contributor Author

njsmith commented Nov 1, 2018

If you could pick it up that would be great

@sethmlarson
Member

I've created a PR to resolve the issue; take a look when you've got time.
