add `.chunk()` associated function to `blocking::Response`, add `.json_chunk()` method to `Response` and `blocking::Response` #2000

LoganDark · 2023-10-16T22:49:18Z

`error::decode` method is private, so it's impossible to implement this outside of the crate.

Used for https://github.com/huggingface/text-generation-inference

seanmonstar · 2023-10-19T15:15:54Z

src/async_impl/response.rs

+    #[cfg_attr(docsrs, doc(cfg(feature = "json")))]
+    pub async fn json_chunk<T: DeserializeOwned>(&mut self) -> crate::Result<Option<T>> {
+        if let Some(full) = self.chunk().await? {
+            Ok(Some(serde_json::from_slice(&full).map_err(crate::error::decode)?))


So, this is assuming that every "chunk", which is basically a read() call on the socket, is a full JSON message? It's a fragile assumption, since the data could be combined into a single read, or it could be too large and be broken up into multiple reads...

> So, this is assuming that every "chunk" [...] is a full JSON message?

Yes

> It's a fragile assumption, since the data could be combined into a single read, or it could be too large and be broken up into multiple reads...

The assumption is that `chunk()` corresponds to one chunk from `Transfer-Encoding: chunked`, not to whatever the OS decides a `read()` returns.

Since the chunk size is dictated by the server, each chunk should be a complete JSON message when you are using an endpoint that returns one complete JSON message per chunk.

If this assumption is incorrect, then reqwest probably needs to be extended to support `Transfer-Encoding: chunked`, because otherwise I can't consume token streams from text-generation-inference.

AFAICT, reqwest has tests that expect `.body_mut().next()` to be an entire chunk (from `Transfer-Encoding: chunked`), and the implementation of `chunk()` defers to that exact same method call, so if that correspondence does not hold then I don't know what is even going on.

reqwest (and hyper) does support chunked transfer-encoding. But it doesn't buffer up every "chunk" that way. The decoder in hyper will do 1 OS read, and then pass on either up to the chunked delimiter, or the full thing if the delimiter is not yet reached.

> reqwest (and hyper) does support chunked transfer-encoding. But it doesn't buffer up every "chunk" that way. The decoder in hyper will do 1 OS read, and then pass on either up to the chunked delimiter, or the full thing if the delimiter is not yet reached.

How do you suggest I should tell when a delimiter is reached, then? (Is there a method for this?)

Chunked delimiters have no semantic significance, so they can also be changed as they go through gateways/proxies. I'm not familiar with the project you linked. Most often, JSON streaming is done by delimiting JSON objects with newlines. You then read and buffer until you get a newline, at which point you can decode the object.
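For illustration, the read-and-buffer approach for newline-delimited JSON could be sketched roughly as follows. `LineBuffer` is a hypothetical helper, not part of reqwest, and the sketch assumes the caller feeds it whatever bytes each read happens to return, however they were split on the wire.

```rust
/// Hypothetical helper (not a reqwest API): accumulates raw body bytes and
/// hands back complete newline-terminated lines, regardless of how the
/// bytes were split across reads.
#[derive(Default)]
pub struct LineBuffer {
    pending: Vec<u8>,
}

impl LineBuffer {
    /// Appends one read's worth of bytes and returns every line completed so far.
    pub fn push(&mut self, chunk: &[u8]) -> Vec<Vec<u8>> {
        self.pending.extend_from_slice(chunk);
        let mut lines = Vec::new();
        // Each iteration removes one line, including its '\n', from the front.
        while let Some(pos) = self.pending.iter().position(|&b| b == b'\n') {
            let mut line: Vec<u8> = self.pending.drain(..=pos).collect();
            line.pop(); // drop the '\n'
            if line.last() == Some(&b'\r') {
                line.pop(); // tolerate CRLF line endings
            }
            // Blank lines carry no JSON object, so they are skipped.
            if !line.is_empty() {
                lines.push(line);
            }
        }
        lines
    }
}
```

Each returned line would then be a complete object that can be handed to `serde_json::from_slice`, while a partial object simply stays in `pending` until the next read completes it.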

> Chunked delimiters have no semantic significance, so they can also be changed as they go through gateways/proxies.

This is a server-side issue, isn't it? If the server hosts an endpoint that doesn't use the encoding properly, that's a bug on their end.

If reqwest needs to be resilient to this sort of issue, I would still want to support endpoints that actually work properly - so we'd end up having two families of functions, one that uses chunks and one that uses newlines.

In that case, I'll still need a way to tell the end of a chunk apart from an arbitrary read boundary.

Is there any way to test for this?

No, there isn't a way to test for it. The properties of the transfer are something that hyper handles; they are not considered part of the content (unlike `Content-Encoding`).

It's sort of similar to how calling `read()` on an OS `TcpStream` won't tell you which bytes were in each segment, since the OS will combine segments into a single buffer if it receives several between calls to `read()`.

I'm going to perform a couple tests and report back.

OK, it looks like this server actually uses Server-Sent Events rather than just sending raw chunks, so each message is delimited by two newlines. That's a bit better.

In that case, this will probably need a better name. What do you say to this: in addition to the `chunk` family of functions, there's an `event` function that strips the `data:` prefix and reads until `\n\n`, and then a `json_event` that decodes from there?

There wouldn't be any need for `json_chunk` at all then, or I could just make it read until a single newline like you suggested, just in case there is an API out there that doesn't use SSE.
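As a rough sketch of the "strip the `data:` prefix" step, assuming the caller has already isolated one complete event block (the text between two blank lines): `event_data` is a hypothetical name, not an existing reqwest function, and this simplified version ignores the `event:`, `id:`, and `retry:` fields as well as comment lines.

```rust
/// Hypothetical helper: extracts the `data` payload from one complete SSE
/// event block. Returns `None` if the block contains no `data:` lines.
pub fn event_data(block: &str) -> Option<String> {
    let mut parts: Vec<&str> = Vec::new();
    for line in block.lines() {
        if let Some(rest) = line.strip_prefix("data:") {
            // The SSE format allows one optional space after the colon.
            parts.push(rest.strip_prefix(' ').unwrap_or(rest));
        }
        // Simplification: other fields and `:` comment lines are ignored.
    }
    if parts.is_empty() {
        None
    } else {
        // Multiple `data:` lines in one event are joined with newlines.
        Some(parts.join("\n"))
    }
}
```

A `json_event`-style function could then pass the returned string to `serde_json::from_str`.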

Here's my plan: let's remove the new `json_chunk` methods from this PR and narrow the scope to just adding that blocking `chunk` method, and then I'll work on some Server-Sent Events support in a separate PR.

It looks like reqwest's `Decoder` doesn't support it: we can read chunks until `\n\n`, but we can't just give portions of the chunk back to the `Decoder` if the server happens to send multiple events in a single chunk. If chunking is an implementation detail and we have to be agnostic, then we have to support that scenario.

Maybe we can provide an iterator over each event in an SSE stream? Then the iterator can keep track of whether the last event ended in the middle of some chunk.
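One possible shape for such an iterator, as a sketch only: `SseEvents` is a hypothetical adapter over any iterator of body chunks, not a proposed reqwest type. It assumes events are separated by `\n\n` only (the SSE format also permits CR and CRLF line endings) and drops a trailing partial event when the body ends.

```rust
/// Hypothetical adapter: wraps an iterator of body chunks and yields one raw
/// SSE event block (without the trailing blank line) at a time.
pub struct SseEvents<I> {
    chunks: I,
    /// Bytes left over after the last event, possibly a partial next event.
    pending: Vec<u8>,
}

impl<I: Iterator<Item = Vec<u8>>> SseEvents<I> {
    pub fn new(chunks: I) -> Self {
        SseEvents { chunks, pending: Vec::new() }
    }
}

impl<I: Iterator<Item = Vec<u8>>> Iterator for SseEvents<I> {
    type Item = Vec<u8>;

    fn next(&mut self) -> Option<Vec<u8>> {
        loop {
            // Serve an already-buffered event first; one chunk may hold several.
            let found = self
                .pending
                .windows(2)
                .position(|w| w[0] == b'\n' && w[1] == b'\n');
            if let Some(pos) = found {
                let mut event: Vec<u8> = self.pending.drain(..pos + 2).collect();
                event.truncate(pos); // drop the "\n\n" delimiter
                return Some(event);
            }
            // Otherwise pull more bytes. When the body ends, any partial
            // event still in `pending` is discarded (an assumption here).
            let chunk = self.chunks.next()?;
            self.pending.extend_from_slice(&chunk);
        }
    }
}
```

Because the leftover bytes live in the iterator rather than being handed back to the decoder, it does not matter whether an event ends in the middle of a chunk or spans several of them.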

LoganDark added 2 commits October 16, 2023 15:44

add .chunk() associated function to blocking::Response

a825864

add .json_chunk() method to Response and blocking::Response

ead4ac7


seanmonstar reviewed Oct 19, 2023


LoganDark commented Oct 16, 2023

seanmonstar Oct 19, 2023

LoganDark Oct 19, 2023 (edited)

seanmonstar Oct 19, 2023

LoganDark Oct 19, 2023 (edited)

seanmonstar Oct 19, 2023

LoganDark Oct 19, 2023

seanmonstar Oct 19, 2023

LoganDark Oct 19, 2023

LoganDark Oct 19, 2023 (edited)

LoganDark Nov 9, 2023
