Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Place Content-encoding header into response's flags is desired #1988 #4025

Open
wants to merge 27 commits into
base: master
Choose a base branch
from

Conversation

VorBoto
Copy link

@VorBoto VorBoto commented Sep 18, 2019

Fixes #1988

Threw up some trash not completely knowing what I was doing but got help from some other contributors on this project and hope this will be acceptable.

I added the ability to keep the 'Content-Encoding' header value in a response as a member of the 'flag' attribute of a response.

@codecov
Copy link

codecov bot commented Sep 19, 2019

Codecov Report

Merging #4025 (0a03b45) into master (534de73) will increase coverage by 0.25%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #4025      +/-   ##
==========================================
+ Coverage   85.43%   85.69%   +0.25%     
==========================================
  Files         167      165       -2     
  Lines        9732     9739       +7     
  Branches     1456     1464       +8     
==========================================
+ Hits         8315     8346      +31     
+ Misses       1159     1136      -23     
+ Partials      258      257       -1     
Impacted Files Coverage Δ
scrapy/downloadermiddlewares/httpcompression.py 89.79% <100.00%> (+0.90%) ⬆️
scrapy/settings/default_settings.py 98.70% <100.00%> (+<0.01%) ⬆️
scrapy/commands/parse.py 20.21% <0.00%> (-0.12%) ⬇️
scrapy/logformatter.py 90.47% <0.00%> (ø)
scrapy/xlib/tx.py
scrapy/xlib/pydispatch.py
scrapy/core/engine.py 92.79% <0.00%> (+0.03%) ⬆️
scrapy/item.py 98.66% <0.00%> (+0.11%) ⬆️
scrapy/spiders/crawl.py 82.14% <0.00%> (+0.43%) ⬆️
scrapy/pipelines/files.py 66.16% <0.00%> (+0.78%) ⬆️
... and 3 more

@VorBoto
Copy link
Author

VorBoto commented Sep 21, 2019

@Gallaecio I did some of the changes you recommended but wasn't sure which tests should get what parameters so this will fail the tests but wanted to see if I pulled the right parts out for a separate function.

Copy link
Member

@Gallaecio Gallaecio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create_spider_mw looks fine to me. Maybe you could set default values for the two parameters, the values that are most common in the tests, so that you can write less code later.

scrapy/downloadermiddlewares/httpcompression.py Outdated Show resolved Hide resolved
@VorBoto
Copy link
Author

VorBoto commented Sep 24, 2019

Got defaults now.
Okay yeah, from not fully understanding flags I didn't think of them being checked outside of this. So I will treat that as possibly existing instance of a list. Second the first item it gets is a list and not a singular string?

The header values are in bytes it looks like from the b",".join(...) in process_request right above it. Also _decode right below also uses bytes. I'm going to assume I should make sure the flags' values are in bytes too. Also on account of the data received by response being in bytes too?

@VorBoto
Copy link
Author

VorBoto commented Sep 24, 2019

I have the if statement for putting in decoded, should I put it in the if ssubclass(respcls,TextResponse): body or on the same level, does it matter much?

@Gallaecio
Copy link
Member

I have the if statement for putting in decoded, should I put it in the if ssubclass(respcls,TextResponse): body or on the same level, does it matter much?

I think it’s on the same level, but I am not 100%, I’m not familiar with this part of the code base yet. In any case, tests (existing and new) should tell whether or not you made the right choice.

@VorBoto
Copy link
Author

VorBoto commented Sep 25, 2019

Not sure where to remove or change things to test with different values testing with or without headers removed.

@Gallaecio
Copy link
Member

Gallaecio commented Sep 30, 2019

Not sure where to remove or change things to test with different values testing with or without headers removed.

I think the way to go now is to remove the setUp method from the test class, and have each test method call your new text function at the beginning:

spider, mw = self.create_spider_mw()

Then you can add new tests covering your changes that also call that function, but use parameter values different from the defaults when needed for your test cases.

@VorBoto
Copy link
Author

VorBoto commented Oct 1, 2019

Okay I'll have to work on that a little bit at a time through out the week.
Have an opportunity at work I have to do some additional research and work that at the moment has a little higher priority.
Just wanted to get that out there so if I'm not active for a day or three its not like I just ghosted this just getting something else done before coming back.

@VorBoto
Copy link
Author

VorBoto commented Oct 2, 2019

Isn't setup function an inherited function from unittest that needs to be filled out for it to work?
I'm not to familiar with unittest I'm just looking it up and seeing what is going on.
Is there possibly someone who is more versed in testing phase of the project we could ask for more insight?(Not saying you're not capable or don't know. I am just guessing there might be someone who while it may not be their primary responsibility knows the testing functionality and base better than the rest. Or am I assuming to much?)

@Gallaecio
Copy link
Member

The setup method only needs to be defined to to set up stuff that is common to all tests in the class.

This was the right way to define the spider and middleware, since the same spider and middleware was used in every test.

Now that we want to add tests that use a different spider and middleware, you can remove that function and instead create a spider and a middleware in each test method.

Alternatively, and to minimize changes, you could leave existing tests as they are and leave the setup method as is, and simply use a different spider and middleware in your new tests that require one. It will actually make for a cleaner patch, so it is probably a good idea.

Copy link
Member

@Gallaecio Gallaecio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor comments, but the key one is the last, about how you use create_spider_mw in your new tests.

scrapy/downloadermiddlewares/httpcompression.py Outdated Show resolved Hide resolved
scrapy/downloadermiddlewares/httpcompression.py Outdated Show resolved Hide resolved
tests/test_downloadermiddleware_httpcompression.py Outdated Show resolved Hide resolved
tests/test_downloadermiddleware_httpcompression.py Outdated Show resolved Hide resolved
tests/test_downloadermiddleware_httpcompression.py Outdated Show resolved Hide resolved
tests/test_downloadermiddleware_httpcompression.py Outdated Show resolved Hide resolved
@Gallaecio
Copy link
Member

It has been a while, and I need to review my review 😅

This needs also a documentation change for the new setting, as well as taking into account backward compatibility for users subclassing the middleware.

@T0shik
Copy link

T0shik commented Oct 23, 2021

created #5290

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

"Content-Encoding" header gets stripped from response headers
4 participants