Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

br handling #5038

Closed
wants to merge 0 commits into from
Closed

br handling #5038

wants to merge 0 commits into from

Conversation

Anupam-USP
Copy link

@Anupam-USP Anupam-USP commented Mar 12, 2021

Resolves #4697 .

Sorry, being the new contributor after having some issues i panic and deleted my fork to start afresh. Really sorry for the trouble. Last pr was #5037 .

@Anupam-USP
Copy link
Author

Why does these checks failed when I used simple print? Logging a warning passed all of these

@Anupam-USP
Copy link
Author

Can you please check why the test failed? I think the first time I used to log warning and this time I simply printed the statement which can be the error

@Gallaecio
Copy link
Member

The errors seem unrelated to your changes, and caused by internal issues in the GitHub infrastructure. I’ll re-run them, let’s hope the GitHub issues have been fixed.

@codecov
Copy link

codecov bot commented Mar 17, 2021

Codecov Report

Merging #5038 (96400a4) into master (ab037ab) will decrease coverage by 0.20%.
The diff coverage is 60.00%.

❗ Current head 96400a4 differs from pull request most recent head 35850f1. Consider uploading reports for the commit 35850f1 to get more accurate results

@@            Coverage Diff             @@
##           master    #5038      +/-   ##
==========================================
- Coverage   88.01%   87.81%   -0.21%     
==========================================
  Files         158      158              
  Lines        9736     9740       +4     
  Branches     1435     1436       +1     
==========================================
- Hits         8569     8553      -16     
- Misses        911      930      +19     
- Partials      256      257       +1     
Impacted Files Coverage Δ
scrapy/downloadermiddlewares/httpcompression.py 84.74% <60.00%> (-2.53%) ⬇️
scrapy/robotstxt.py 75.30% <0.00%> (-22.23%) ⬇️

Comment on lines 13 to 19
flag=0
import brotli
ACCEPTED_ENCODINGS.append(b'br')
except ImportError:
pass
else:
flag=1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t think you need a new variable here, you could just use b'br' in ACCEPTED_ENCODINGS to the same effect, which would be more readable.

@@ -73,7 +76,10 @@ def _decode(self, body, encoding):
# http://www.gzip.org/zlib/zlib_faq.html#faq38
body = zlib.decompress(body, -15)
if encoding == b'br' and b'br' in ACCEPTED_ENCODINGS:
body = brotli.decompress(body)
if (flag):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The b'br' in ACCEPTED_ENCODINGS in the if statement above already implies that flag is 0 here. So flag is unnecessary here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the condition here needs to be

if encoding == b'br':
    if b'br' in ACCEPTED_ENCODINGS:
        body = brotli.decompress(body)
    else:
        # warning or error

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I believe we'll need a test for this, to ensure we get the message when required

if (flag):
body = brotli.decompress(body)
else:
print("brotli not installed")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be a warning here, not a simple print.
To some extent, it could even be an error as it will fail when processing the response, as it wasn't decoded properly.

if (flag):
body = brotli.decompress(body)
else:
print("brotli not installed")
if encoding == b'zstd' and b'zstd' in ACCEPTED_ENCODINGS:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should apply the same condition as above here, as it falls in the same case/issue (not having the dependency)

@@ -10,10 +10,13 @@
ACCEPTED_ENCODINGS = [b'gzip', b'deflate']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can change this to a set as we're performing many in (pertinence checks) and this operation is faster on set than on a list

@@ -73,7 +76,10 @@ def _decode(self, body, encoding):
# http://www.gzip.org/zlib/zlib_faq.html#faq38
body = zlib.decompress(body, -15)
if encoding == b'br' and b'br' in ACCEPTED_ENCODINGS:
body = brotli.decompress(body)
if (flag):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the condition here needs to be

if encoding == b'br':
    if b'br' in ACCEPTED_ENCODINGS:
        body = brotli.decompress(body)
    else:
        # warning or error

@@ -73,7 +76,10 @@ def _decode(self, body, encoding):
# http://www.gzip.org/zlib/zlib_faq.html#faq38
body = zlib.decompress(body, -15)
if encoding == b'br' and b'br' in ACCEPTED_ENCODINGS:
body = brotli.decompress(body)
if (flag):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I believe we'll need a test for this, to ensure we get the message when required

import brotli
ACCEPTED_ENCODINGS.append(b'br')
except ImportError:
pass
else:
flag=1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just taking the chance here.
You could change encoding = content_encoding.pop() to encoding = content_encoding[0] in process_response, as this removes the header when it arrives in the spider and can make our life a bit hard to detect the issue.

Not sure if it follows here or as a fix at all.
cc @Gallaecio

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean, line 48, and yes, I think you are right.

I would not require this change in this pull request, but if not done here, we should probably open a ticket for it.

@Anupam-USP Anupam-USP closed this Apr 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Warn about br handling if brotlipy is not installed
3 participants