-
Notifications
You must be signed in to change notification settings - Fork 10.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
br handling #5038
br handling #5038
Conversation
Why does these checks failed when I used simple print? Logging a warning passed all of these |
Can you please check why the test failed? I think the first time I used to log warning and this time I simply printed the statement which can be the error |
The errors seem unrelated to your changes, and caused by internal issues in the GitHub infrastructure. I’ll re-run them, let’s hope the GitHub issues have been fixed. |
Codecov Report
@@ Coverage Diff @@
## master #5038 +/- ##
==========================================
- Coverage 88.01% 87.81% -0.21%
==========================================
Files 158 158
Lines 9736 9740 +4
Branches 1435 1436 +1
==========================================
- Hits 8569 8553 -16
- Misses 911 930 +19
- Partials 256 257 +1
|
flag=0 | ||
import brotli | ||
ACCEPTED_ENCODINGS.append(b'br') | ||
except ImportError: | ||
pass | ||
else: | ||
flag=1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t think you need a new variable here, you could just use b'br' in ACCEPTED_ENCODINGS
to the same effect, which would be more readable.
@@ -73,7 +76,10 @@ def _decode(self, body, encoding): | |||
# http://www.gzip.org/zlib/zlib_faq.html#faq38 | |||
body = zlib.decompress(body, -15) | |||
if encoding == b'br' and b'br' in ACCEPTED_ENCODINGS: | |||
body = brotli.decompress(body) | |||
if (flag): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The b'br' in ACCEPTED_ENCODINGS
in the if
statement above already implies that flag
is 0
here. So flag
is unnecessary here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the condition here needs to be
if encoding == b'br':
if b'br' in ACCEPTED_ENCODINGS:
body = brotli.decompress(body)
else:
# warning or error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I believe we'll need a test for this, to ensure we get the message when required
if (flag): | ||
body = brotli.decompress(body) | ||
else: | ||
print("brotli not installed") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be a warning here, not a simple print
.
To some extent, it could even be an error as it will fail when processing the response, as it wasn't decoded properly.
if (flag): | ||
body = brotli.decompress(body) | ||
else: | ||
print("brotli not installed") | ||
if encoding == b'zstd' and b'zstd' in ACCEPTED_ENCODINGS: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should apply the same condition as above here, as it falls in the same case/issue (not having the dependency)
@@ -10,10 +10,13 @@ | |||
ACCEPTED_ENCODINGS = [b'gzip', b'deflate'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can change this to a set
as we're performing many in
(pertinence checks) and this operation is faster on set
than on a list
@@ -73,7 +76,10 @@ def _decode(self, body, encoding): | |||
# http://www.gzip.org/zlib/zlib_faq.html#faq38 | |||
body = zlib.decompress(body, -15) | |||
if encoding == b'br' and b'br' in ACCEPTED_ENCODINGS: | |||
body = brotli.decompress(body) | |||
if (flag): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the condition here needs to be
if encoding == b'br':
if b'br' in ACCEPTED_ENCODINGS:
body = brotli.decompress(body)
else:
# warning or error
@@ -73,7 +76,10 @@ def _decode(self, body, encoding): | |||
# http://www.gzip.org/zlib/zlib_faq.html#faq38 | |||
body = zlib.decompress(body, -15) | |||
if encoding == b'br' and b'br' in ACCEPTED_ENCODINGS: | |||
body = brotli.decompress(body) | |||
if (flag): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I believe we'll need a test for this, to ensure we get the message when required
import brotli | ||
ACCEPTED_ENCODINGS.append(b'br') | ||
except ImportError: | ||
pass | ||
else: | ||
flag=1 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just taking the chance here.
You could change encoding = content_encoding.pop()
to encoding = content_encoding[0]
in process_response
, as this removes the header when it arrives in the spider and can make our life a bit hard to detect the issue.
Not sure if it follows here or as a fix at all.
cc @Gallaecio
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean, line 48, and yes, I think you are right.
I would not require this change in this pull request, but if not done here, we should probably open a ticket for it.
Resolves #4697 .
Sorry, being the new contributor after having some issues i panic and deleted my fork to start afresh. Really sorry for the trouble. Last pr was #5037 .