Support Brotli algorithm encoding #4525

Closed
ThomasProctor opened this issue Feb 27, 2018 · 17 comments

Comments

@ThomasProctor

Right now, only gzip and deflate are supported. Brotli is pretty common, so it ought to be supported too.

Expected Result

Responses encoded with the Brotli algorithm should be decoded.

Actual Result

Brotli responses need to be manually decoded

Reproduction Steps

import requests


headers = {'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
 'Accept-Encoding': 'br',
 'Accept-Language': 'en-US,en;q=0.5',
 'Connection': 'keep-alive',
 'DNT': '1',
 'Upgrade-Insecure-Requests': '1',
 'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:53.0) Gecko/20100101 Firefox/53.0'}

print(requests.get('https://www.google.com', headers=headers).text)

prints an encoded page.

System Information

{
  "chardet": {
    "version": "3.0.4"
  },
  "cryptography": {
    "version": "2.0.3"
  },
  "idna": {
    "version": "2.6"
  },
  "implementation": {
    "name": "CPython",
    "version": "3.4.3"
  },
  "platform": {
    "release": "3.13.0-142-generic",
    "system": "Linux"
  },
  "pyOpenSSL": {
    "openssl_version": "1010006f",
    "version": "17.2.0"
  },
  "requests": {
    "version": "2.18.4"
  },
  "system_ssl": {
    "version": "1000106f"
  },
  "urllib3": {
    "version": "1.22"
  },
  "using_pyopenssl": true
}

@sigmavirus24
Contributor

This is dependent upon urllib3/urllib3#713 which has stalled. Until then, we cannot add it ourselves and the work must be tracked in urllib3.

@ihgazni2

Similar problem:

import json
import brotli
pobj(r.headers)
{
'Server': 'openresty',
'Date': 'Fri, 13 Apr 2018 03:44:15 GMT',
'Content-Type': 'application/json;charset=utf-8',
'Transfer-Encoding': 'chunked',
'Connection': 'close',
'Vary': 'Accept-Encoding',
'Cache-Control': 'no-cache,no-store',
'Pragma': 'no-cache',
'Last-Modified': 'Fri, 13 Apr 2018 03:37:14.577 GMT',
'Strict-Transport-Security': 'max-age=31536000',
'Content-Encoding': 'br'
}
r.headers['Content-Encoding']
'br'
r.content
b"\x15\xb8\x00\x00\xc4\xaa\xe9\x94_\x9d\x84+\xdf\x1d\x12\x8f\xa2\x7f\x15\x8aS\x0c\x8eq\xe0\xb4\xc9\xb5\xe0\xbe\xc34\xe0\xe0!+\xe53\x92J\xfc\xb0\xe9~Q\x14<,\x84\xe5\x1e\x16|\xa3\x87\xb9I\xf4\xb0\xd8z|\x98W_\x1f\xe6[\x95\x87\xc5\x90\xdc\xc3<'O\xd526\x18\xf3\xfcEA\xe9\xda\x9d\xdc\xcez%-\x1a\x0b\xbb\xc3\xc7x\xc3\x86pO\x8a\xb3\x82\xf52P\xe0\xeb \xf1\xb3\xd6\xeeI&\x91\xf6\x1a\xb5\xd2\xc0\x1f"
bytstrm = brotli.decompress(r.content)
bytstrm
b'{"content3":"\u66f4\u65b0\u4e00\u7bc7\u535a\u5ba2\u7684\u5185\u5bb9","createdAt":"2018-04-13T03:37:14.577Z","updatedAt":"2018-04-13T03:37:14.577Z","objectId":"5ad025eaac502e003ca73a0d"}'
r.encoding
'utf-8'
jstr = bytstrm.decode(r.encoding)
jstr
'{"content3":"\u66f4\u65b0\u4e00\u7bc7\u535a\u5ba2\u7684\u5185\u5bb9","createdAt":"2018-04-13T03:37:14.577Z","updatedAt":"2018-04-13T03:37:14.577Z","objectId":"5ad025eaac502e003ca73a0d"}'
js = json.loads(jstr)
pobj(js)
{
'content3': '更新一篇博客的内容',
'createdAt': '2018-04-13T03:37:14.577Z',
'updatedAt': '2018-04-13T03:37:14.577Z',
'objectId': '5ad025eaac502e003ca73a0d'
}

@immerrr

immerrr commented Jan 30, 2019

@sigmavirus24 now that brotli has landed in urllib3 (urllib3/urllib3#1532, urllib3/urllib3#1533, urllib3/urllib3#1534), should we reopen this issue or do you want me to open a new one?

@VeNoMouS

VeNoMouS commented May 2, 2019

While not ideal, this is the best workaround I have come up with for the time being, in case anyone actually needs it.

I am actually doing this in a super() call for one of my modules.

import brotli
import requests

session = requests.session()
resp = session.get('http://somehostwithbrotli/')
if resp.headers.get('Content-Encoding') == 'br':
    # Overwrite the cached body with the manually decompressed bytes.
    resp._content = brotli.decompress(resp.content)

# content is a property that returns the cached _content attribute.
print(resp.content)
# etc... etc...
print(resp.text)
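The same idea can be written as a standalone helper that picks a decompressor from the Content-Encoding value. This is only a sketch under stated assumptions: `decode_body` is a hypothetical name that is not part of Requests, the body passed in must be the raw, still-encoded bytes, and "deflate" is assumed to be zlib-wrapped (some servers send raw deflate instead).

```python
import gzip
import zlib
from typing import Optional

try:
    import brotli  # optional dependency, only needed for 'br' bodies
except ImportError:
    brotli = None


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Decode a raw body according to its Content-Encoding value.

    Hypothetical helper for illustration; it is not part of Requests.
    """
    encoding = (content_encoding or "").strip().lower()
    if encoding in ("", "identity"):
        return body
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "deflate":
        # Assumption: the payload is zlib-wrapped, not raw deflate.
        return zlib.decompress(body)
    if encoding == "br":
        if brotli is None:
            raise RuntimeError("no brotli module installed; cannot decode 'br'")
        return brotli.decompress(body)
    raise ValueError("unsupported Content-Encoding: %r" % (content_encoding,))
```

Keeping the brotli import optional means the helper still handles gzip and deflate bodies on systems where no brotli package is installed.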

@northtree

northtree commented May 20, 2019

Since urllib3 does not install Brotli by default, you have to install a brotli package yourself if you want Brotli encoding support in the latest Requests (v2.22.0).

$ pip install brotlipy

or

$ pip install brotli

After that, just specify br in the Accept-Encoding request header.

>>> import requests
>>> requests.__version__
'2.22.0'
>>> headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux i686; rv:64.0) Gecko/20100101 Firefox/64.0', 'Accept-Encoding': 'br, gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
>>> r = requests.get('https://www.google.com/', headers=headers)
>>> r.headers['Content-Encoding']
'br'
>>> r.text[:10]
'<!doctype '

@VeNoMouS

VeNoMouS commented May 20, 2019

Not sure why they're using brotlipy when the pypi brotli package is maintained by google and brotli was initially written by... google

@northtree

@VeNoMouS I believe the reason is this note on the Google brotli package:

Only Python 2.7+ is supported.

@VeNoMouS

VeNoMouS commented May 20, 2019

yea you might be right i just saw that myself , that said... 2.7+ would mean py3 ... and frankly wtf uses below 2.7?

@northtree

@VeNoMouS You are right. I also read the BrotliDecoder in urllib3. Both are supported.

Supports both 'brotlipy' and 'Brotli' packages since they share an import name.

https://github.com/urllib3/urllib3/blob/64e413f1b2fef86a150ae747f00aab0e2be8e59c/src/urllib3/response.py#L100
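Because both packages install a module named `brotli`, a single optional import is enough to detect either one. The snippet below is a minimal sketch of that kind of check, not a copy of the urllib3 code; `load_brotli_module` is a hypothetical name used only for illustration.

```python
def load_brotli_module():
    """Return the importable `brotli` module, or None if it is missing.

    Hypothetical helper: it does not care which distribution
    (Brotli or brotlipy) provides the module.
    """
    try:
        import brotli
    except ImportError:
        return None
    return brotli


brotli_module = load_brotli_module()
if brotli_module is None:
    print("no brotli module found")
else:
    print("brotli module found:", brotli_module.__name__)
```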

@VeNoMouS

ah, nice spotting there :)

@gdubicki
Contributor

Can we please reopen this, @nateprewitt or @sethmlarson? Brotli support has been in urllib3 since 1.25.1, released on 2019-04-24.

Would you accept a PR for this feature?

@sethmlarson
Member

@gdubicki I'd accept a PR adding Brotli support if it's detected.

@sethmlarson sethmlarson reopened this Jul 27, 2020
gdubicki pushed a commit to gdubicki/requests that referenced this issue Jul 30, 2020
gdubicki pushed a commit to gdubicki/requests that referenced this issue Aug 9, 2020
by the urllib3 version (>= 1.23.1) and brotli package being
present
@gdubicki
Contributor

gdubicki commented Aug 9, 2020

Can you please check out my draft (but almost complete, I think) PR #5554, @sethmlarson ?

gdubicki pushed a commit to gdubicki/requests that referenced this issue Aug 10, 2020
by the urllib3 version (>= 1.25.1) and brotli package being
present
dgtlmoon added a commit to dgtlmoon/changedetection.io that referenced this issue Feb 2, 2021
…ts, be sure that users cant accidently use this content type encoding in the headers
@dilyanpalauzov
Contributor

I think fixing this is a matter of using urllib3.util.make_headers(keep_alive=True, accept_encoding=True, user_agent='...') in requests.utils.default_headers(). This will automatically add ,br to Accept-Encoding if urllib3 can decode brotli.

Urllib3 can decode brotli if https://pypi.org/project/Brotli/ or https://pypi.org/project/brotlicffi/ is installed. When one of them is installed, ',br' is included in the Accept-Encoding value returned by make_headers().

I propose changing the patch to use urllib3.util.make_headers() in requests.utils.default_headers() and stating in the Requests documentation when brotli decoding is used.
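A rough sketch of what that proposal amounts to is shown below. It is an illustration, not the patch itself: `default_accept_encoding` is a hypothetical name, and the fallback string used when urllib3 cannot be imported is an assumption added so the snippet runs anywhere.

```python
def default_accept_encoding() -> str:
    """Return the Accept-Encoding value that urllib3 would advertise.

    Hypothetical helper for illustration; not part of Requests.
    """
    try:
        from urllib3.util import make_headers
    except ImportError:
        # Assumption: without urllib3, fall back to the two codings
        # the standard library can always decode.
        return "gzip,deflate"
    # accept_encoding=True asks urllib3 for its own default list, which
    # includes 'br' only when urllib3 found a brotli decoder.
    return make_headers(accept_encoding=True)["accept-encoding"]


print(default_accept_encoding())
```

Delegating the value to urllib3 keeps the advertised encodings in step with what the underlying library can actually decode.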

@dilyanpalauzov
Contributor

While everybody talks about Brotli encoding support, I guess what is meant is Brotli decoding. When the HTTP agent sends "Accept-Encoding: br", the server encodes with Brotli and the HTTP agent decodes. Is this issue about decoding Brotli-compressed content in the HTTP agent (the requester), or about adding Brotli encoding for servers?

@dilyanpalauzov
Contributor

#5783 proposes a patch so that Requests transparently requests and decodes brotli whenever the brotli or brotlicffi package is installed.

@nateprewitt
Member

This was resolved in #5783.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 5, 2021