
urllib3.response.GzipDecoder is accidentally quadratic, which allows a malicious server to DoS urllib3 clients #1467

Closed
njsmith opened this issue Nov 1, 2018 · 3 comments


@njsmith
Contributor

njsmith commented Nov 1, 2018

Here's a <100 KB file that the gzip module decompresses in ~200 ms. But if we use urllib3.response.GzipDecoder to decompress it, then it burns CPU for >10 seconds.

In [51]: evil = gzip.compress(b"\x00" * 1032 * 40) * 1350                                      

In [52]: len(evil)                                                                             
Out[52]: 99900

In [53]: %time x = gzip.decompress(evil)                                                       
CPU times: user 230 ms, sys: 11.9 ms, total: 242 ms
Wall time: 240 ms

In [54]: %time x = urllib3.response.GzipDecoder().decompress(evil)                             
CPU times: user 5.87 s, sys: 7.73 s, total: 13.6 s
Wall time: 13.6 s

Since urllib3 attempts to decode gzip files by default, this means a malicious server can easily cause urllib3-based clients to waste tons of CPU time.

The problem is that this is a gzip file with lots and lots of members concatenated together. When urllib3 encounters such a file, it decodes each member in sequence and accumulates the result into a bytes object via repeated calls to +=.

Since bytes is immutable, each += copies everything accumulated so far, which is O(n), so this loop is accidentally quadratic in the size of the uncompressed output.
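To see the quadratic pattern in isolation, here's a standalone sketch (not urllib3's actual loop; the chunk count and size are arbitrary numbers I picked):

import timeit

chunks = [b"\x00" * 1024] * 2000  # ~2 MB total, arbitrary sizes

def accumulate():
    ret = b""
    for chunk in chunks:
        # bytes is immutable, so += allocates a fresh object and copies
        # everything accumulated so far: O(n) per step, O(n**2) overall
        ret += chunk
    return ret

print(timeit.timeit(accumulate, number=1))  # grows quadratically with len(chunks)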

If we make ret a bytearray instead, it fixes the problem:

In [62]: %time x = MyGzipDecoder().decompress(evil)                                            
CPU times: user 167 ms, sys: 8.41 ms, total: 175 ms
Wall time: 174 ms

In this test, the only thing I changed is to replace the line ret = b"" with ret = bytearray(). A real fix would probably want to avoid returning bytearray objects to the user, so I guess you'd either want to accumulate a list-of-bytes and call join at the end, or else accumulate in a bytearray and then convert back to bytes at the end?
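For concreteness, here's a rough sketch of what the fixed loop could look like (based on my reading of GzipDecoder; I'm omitting the real class's state tracking and zlib.error handling):

import zlib

class MyGzipDecoder(object):
    # Simplified sketch of urllib3.response.GzipDecoder with the fix;
    # the error/state handling of the real class is omitted.
    def __init__(self):
        self._obj = zlib.decompressobj(16 + zlib.MAX_WBITS)

    def decompress(self, data):
        ret = bytearray()  # was: ret = b"" -- bytearray += is amortized O(1)
        if not data:
            return bytes(ret)
        while True:
            ret += self._obj.decompress(data)
            data = self._obj.unused_data
            if not data:
                # convert back so callers still get a plain bytes object
                return bytes(ret)
            # leftover bytes are the next gzip member: fresh decompressobj
            self._obj = zlib.decompressobj(16 + zlib.MAX_WBITS)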

Even after this fix I think there's technically still some quadratic behavior in the way we pass .unused_data from one decompression object to the next, but at least that's quadratic in the size of the compressed file, rather than the uncompressed file? I'm not sure if this is triggerable in practice. If we want to be extra careful, we could put an upper bound on how much data we feed into self._obj.decompress on each pass through the loop.
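Something along these lines, maybe (entirely hypothetical; MAX_FEED and the carry handling are my invention, not anything urllib3 does today):

import zlib

MAX_FEED = 64 * 1024  # arbitrary cap on bytes handed to zlib at once

class BoundedGzipDecoder(object):
    # Hypothetical sketch: by bounding each feed, unused_data can never
    # exceed MAX_FEED bytes, so re-feeding it is no longer quadratic.
    def __init__(self):
        self._obj = zlib.decompressobj(16 + zlib.MAX_WBITS)

    def decompress(self, data):
        ret = bytearray()
        pending = memoryview(data)  # slicing a memoryview doesn't copy
        carry = b""  # unused_data left over from the previous member
        while pending or carry:
            take = MAX_FEED - len(carry)
            feed = carry + bytes(pending[:take])
            pending = pending[take:]
            carry = b""
            ret += self._obj.decompress(feed)
            if self._obj.unused_data:
                # end of one gzip member: restart on the leftover bytes
                carry = self._obj.unused_data
                self._obj = zlib.decompressobj(16 + zlib.MAX_WBITS)
        return bytes(ret)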

I haven't hit this in the real world; I just noticed it by accident when looking at the code.

I don't think this is a particularly serious vulnerability – gzip decompression inherently allows some amount of DoS (e.g. by sending a file that expands by a factor of 1000 to use up lots of memory). But it is a real issue, and if someone wants to go get a CVE, I guess it probably qualifies.

@theacodes
Member

Cool, do you wanna send a PR to fix this or just let me or @SethMichaelLarson pick it up?

@njsmith
Contributor Author

njsmith commented Nov 1, 2018

If you could pick it up that would be great

@sethmlarson
Member

I've created a PR to resolve the issue; take a look when you've got time.
