Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make HTTPResponse.stream use read1 when amt=None #3216

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

smason
Copy link
Contributor

@smason smason commented Nov 27, 2023

Following up on #3186 (and suggested in #2125) this uses the new read1 method when streaming a non-chunked response and amt=None.

This would mean that psf/requests#5536 could also be closed.

@smason
Copy link
Contributor Author

smason commented Nov 27, 2023

@illia-v A hopefully simple followup to my last PR. This makes the actual change to the behavior of stream(None) — hopefully in a way that people expect

Not sure why all those tests timed out! I rebased to main branch before making this, I see there's been a bit of a rework of the test infrastructure recently could that be related?

@illia-v
Copy link
Member

illia-v commented Nov 29, 2023

@saschpe thanks! The test failures look to be a result of this PR.

The case is pretty similar to hanging you described in python/cpython#112064.
I guess this while loop iterates many times needlessly because self._fp is not closed when all data is read:

while not is_fp_closed(self._fp) or len(self._decoded_buffer) > 0:

@smason
Copy link
Contributor Author

smason commented Nov 30, 2023

@illia-v I was only running the few tests I cared about locally as the whole suite takes a long time and missed that unintended breakage, should be fixed now.

I tried to rework stream to look more like a conventional read loop, but the behavior of read auto-closing the stream in a caller observable manner seems to go against this. AFAIK the modern idiomatic way would look like:

while data := fp.read(amt):
    yield data

but this raises on the final read due to the response having been implicitly closed in the read before, which seems awkward.

Am wondering whether this is worth trying to clean up closed handling separately?

src/urllib3/response.py Outdated Show resolved Hide resolved
Comment on lines 1029 to 1032
if not data:
break
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the loop is broken here after read1 was used, self._fp remains not closed even though all data has been read. Closing it may be a better way to break the loop. What do you think about checking self.length_remaining after calling read1 and, if it is 0 and not self.closed, calling self._fp.close()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the loop is broken here after read1 was used, self._fp remains not closed even though all data has been read.

Hadn't realised the semantics around what gets closed and when, have done a bit more reading now. Have had a bit of a refactor of the change.

Closing it may be a better way to break the loop. What do you think about checking self.length_remaining after calling read1 and, if it is 0 and not self.closed, calling self._fp.close()?

It feels as though it could raise an IncompleteRead if it finishes and length_remaining indicates there's missing data (as _raw_read does), any preferences on this / other changes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels as though it could raise an IncompleteRead if it finishes and length_remaining indicates there's missing data (as _raw_read does), any preferences on this / other changes

Good point!
Looking at CPython's code, it's possible to get in the situation at least if self._fp is closed between read1 calls. However, you have to respect self.enforce_content_length as _raw_read does.

@smason
Copy link
Contributor Author

smason commented Dec 6, 2023

rebased due to all the bumping in dependencies and test changes

src/urllib3/response.py Show resolved Hide resolved
src/urllib3/response.py Show resolved Hide resolved
Comment on lines 1029 to 1032
if not data:
break
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels as though it could raise an IncompleteRead if it finishes and length_remaining indicates there's missing data (as _raw_read does), any preferences on this / other changes

Good point!
Looking at CPython's code, it's possible to get in the situation at least if self._fp is closed between read1 calls. However, you have to respect self.enforce_content_length as _raw_read does.

@illia-v
Copy link
Member

illia-v commented Dec 15, 2023

Since we discovered some problems with http.client.HTTPResponse.read1 while working on this PR, more confidence needs to be gained to make the change to HTTPResponse.stream.

In the meanwhile, I created #3235 and #3236 to fix two blocking issues.

@illia-v illia-v mentioned this pull request Jan 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants