make HTTPResponse.stream use read1 when amt=None #3216

smason · 2023-11-27T17:11:32Z

Following up on #3186 (and as suggested in #2125), this uses the new read1 method when streaming a non-chunked response with amt=None.

This would mean that psf/requests#5536 could also be closed.
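For context, a minimal sketch of the difference this relies on, using only the standard library rather than urllib3 itself. ChunkedSource and stream_via_read1 are hypothetical names made up for illustration; the point is that read1 returns whatever a single underlying read produced instead of waiting for the whole body.

```python
import io


class ChunkedSource(io.RawIOBase):
    """Hypothetical raw stream that hands out one pre-defined chunk per raw read."""

    def __init__(self, chunks):
        self._chunks = list(chunks)

    def readable(self):
        return True

    def readinto(self, buf):
        if not self._chunks:
            return 0  # signal EOF
        chunk = self._chunks.pop(0)
        n = min(len(buf), len(chunk))
        buf[:n] = chunk[:n]
        if n < len(chunk):
            # Keep the part that did not fit for the next raw read.
            self._chunks.insert(0, chunk[n:])
        return n


def stream_via_read1(fp):
    """Yield data as it becomes available, at most one raw read per item."""
    while data := fp.read1():
        yield data


reader = io.BufferedReader(ChunkedSource([b"hello ", b"world"]))
pieces = list(stream_via_read1(reader))
```

Joining the pieces gives the same bytes a plain read() would return; only the granularity of delivery differs.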

smason · 2023-11-27T19:39:04Z

@illia-v A hopefully simple follow-up to my last PR. This makes the actual change to the behavior of stream(None), hopefully in a way that people expect.

Not sure why all those tests timed out! I rebased onto the main branch before making this. I see there's been a bit of a rework of the test infrastructure recently; could that be related?

illia-v · 2023-11-29T19:43:48Z

@saschpe thanks! The test failures look to be a result of this PR.

The case is pretty similar to the hang you described in python/cpython#112064.
I guess this while loop iterates many times needlessly because self._fp is not closed when all data is read:

urllib3/src/urllib3/response.py

Line 1022 in 4ae4b71

while not is_fp_closed(self._fp) or len(self._decoded_buffer) > 0:
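A rough illustration of the suspected spin, assuming a file object whose read1 returns b"" at end of data without ever closing itself. count_spins and its max_iterations guard are hypothetical and exist only so the demonstration terminates.

```python
import io


def count_spins(fp, max_iterations=1000):
    """Run a `while not closed` read loop and report how often it iterated."""
    iterations = 0
    chunks = []
    # Without the guard this loop would never end: fp stays open after EOF.
    while not fp.closed and iterations < max_iterations:
        iterations += 1
        data = fp.read1()
        if data:
            chunks.append(data)
    return iterations, b"".join(chunks)


iterations, body = count_spins(io.BytesIO(b"abc"), max_iterations=50)
```

All data arrives on the first iteration; every later iteration is an empty read that only stops because of the guard.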

smason · 2023-11-30T13:45:14Z

@illia-v I was only running the few tests I cared about locally, as the whole suite takes a long time, and missed that unintended breakage. It should be fixed now.

I tried to rework stream to look more like a conventional read loop, but the behavior of read auto-closing the stream in a caller-observable manner seems to go against this. AFAIK the modern idiomatic way would look like:

while data := fp.read(amt):
    yield data

but this raises on the final read due to the response having been implicitly closed in the read before, which seems awkward.

I'm wondering whether it is worth cleaning up the closed handling separately?
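A toy reproduction of that awkwardness, not urllib3's actual code. AutoClosingReader is a hypothetical class that closes itself once its payload is exhausted and, like standard io objects, raises ValueError when read after closing; the exception type in urllib3 may well differ.

```python
import io


class AutoClosingReader:
    """Hypothetical reader that implicitly closes once all data is consumed."""

    def __init__(self, payload):
        self._fp = io.BytesIO(payload)
        self._remaining = len(payload)

    @property
    def closed(self):
        return self._fp.closed

    def read(self, amt):
        data = self._fp.read(amt)  # raises ValueError if already closed
        self._remaining -= len(data)
        if self._remaining == 0:
            self._fp.close()  # the caller-observable side effect
        return data


def stream_walrus(fp, amt):
    """Conventional read loop; needs one extra read to observe EOF."""
    while data := fp.read(amt):
        yield data


def stream_guarded(fp, amt):
    """Loop keyed on the closed flag, so no read happens after closing."""
    while not fp.closed:
        data = fp.read(amt)
        if data:
            yield data
```

With this toy, stream_guarded finishes cleanly, while stream_walrus fails on the extra read it needs to observe the end of the stream.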

illia-v · 2023-12-03T16:31:39Z

src/urllib3/response.py

+                if not data:
+                    break


When the loop is broken here after read1 was used, self._fp remains not closed even though all data has been read. Closing it may be a better way to break the loop. What do you think about checking self.length_remaining after calling read1 and, if it is 0 and not self.closed, calling self._fp.close()?
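The suggestion could be sketched roughly as follows. ToyResponse is a hypothetical stand-in, not urllib3's HTTPResponse; it only mirrors the attribute names mentioned above (length_remaining, closed, _fp) and assumes the body length is known up front.

```python
import io


class ToyResponse:
    """Hypothetical stand-in for a response with a known Content-Length."""

    def __init__(self, body):
        self._fp = io.BytesIO(body)
        self.length_remaining = len(body)

    @property
    def closed(self):
        return self._fp.closed

    def read1(self, amt=-1):
        data = self._fp.read1(amt)
        self.length_remaining -= len(data)
        # Suggested check: once everything has arrived, close the file
        # so that loops keyed on `closed` terminate on their own.
        if self.length_remaining == 0 and not self.closed:
            self._fp.close()
        return data

    def stream(self, amt=4):
        while not self.closed:
            data = self.read1(amt)
            if data:
                yield data


chunks = list(ToyResponse(b"abcdefghij").stream(amt=4))
```

Here the loop ends because the file is closed, so no explicit break on empty data is needed.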

smason · 2023-12-04

> When the loop is broken here after read1 was used, self._fp remains not closed even though all data has been read.

I hadn't realised the semantics around what gets closed and when; I have done a bit more reading now and refactored the change.

> Closing it may be a better way to break the loop. What do you think about checking self.length_remaining after calling read1 and, if it is 0 and not self.closed, calling self._fp.close()?

It feels as though it could raise an IncompleteRead if it finishes and length_remaining indicates there's missing data (as _raw_read does). Any preferences on this or other changes?

smason · 2023-12-06T20:00:31Z

rebased due to all the bumping in dependencies and test changes

illia-v · 2023-12-07T15:22:26Z

src/urllib3/response.py

+                if not data:
+                    break


> It feels as though it could raise an IncompleteRead if it finishes and length_remaining indicates there's missing data (as _raw_read does), any preferences on this / other changes

Good point!
Looking at CPython's code, it's possible to get into that situation, at least if self._fp is closed between read1 calls. However, you have to respect self.enforce_content_length as _raw_read does.
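One way the check could look, as a sketch under stated assumptions. read1_checked is a hypothetical helper, and http.client.IncompleteRead is used here only as a stand-in for urllib3's own exception so the example stays self-contained.

```python
import io
from http.client import IncompleteRead


def read1_checked(fp, length_remaining, enforce_content_length=True):
    """Read once; raise if the stream ended while data was still expected.

    Returns (data, updated length_remaining).
    """
    data = fp.read1()
    length_remaining -= len(data)
    # An empty read means the stream is finished. Only complain about
    # missing bytes when the caller asked for the length to be enforced.
    if not data and length_remaining > 0 and enforce_content_length:
        raise IncompleteRead(b"", length_remaining)
    return data, length_remaining


fp = io.BytesIO(b"abc")  # pretend the advertised length was 5
data, remaining = read1_checked(fp, 5)
```

A second call on the same fp would raise IncompleteRead with 2 bytes still expected, unless enforce_content_length=False is passed.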

illia-v · 2023-12-15T22:06:27Z

Since we discovered some problems with http.client.HTTPResponse.read1 while working on this PR, we need more confidence before making the change to HTTPResponse.stream.

In the meantime, I created #3235 and #3236 to fix two blocking issues.

smason force-pushed the stream-via-read1 branch from f6fa97b to 043edc8 Compare November 30, 2023 12:21

illia-v reviewed Dec 3, 2023

smason force-pushed the stream-via-read1 branch from 043edc8 to 24fb044 Compare December 4, 2023 23:05

smason added 3 commits December 6, 2023 19:57

make HTTPResponse.stream use read1 when amt=None

a06307e

add changelog entry

ab4619c

Allow HTTPResponse.stream to use read1 when amt=None

83e4ab8

smason force-pushed the stream-via-read1 branch from 24fb044 to 83e4ab8 Compare December 6, 2023 19:58

illia-v reviewed Dec 7, 2023

Test IncompleteRead is raised and add comments

4ce6a9b

smason force-pushed the stream-via-read1 branch from 9417e22 to 4ce6a9b Compare December 7, 2023 17:20

illia-v mentioned this pull request Dec 15, 2023

Make HTTPResponse.read1 close response when all data is read #3235

Merged

illia-v mentioned this pull request Jan 3, 2024

void #3250

Closed
