Add support for streaming a large JSON array #526
Conversation
Thanks. Can you come up with some other possible ways to expose this? I am concerned that ArrayDeserializer is not well suited to arrays with mixed element types. StreamDeserializer is designed as a convenience for the common case of a uniform element type, but mixed streams are already supported without it:
let mut de = serde_json::Deserializer::from_reader(...);
let t1 = T1::deserialize(&mut de)?;
let t2 = T2::deserialize(&mut de)?;
I would like to have some story for supporting non-uniform streaming arrays before accepting something like this.
Thanks for the feedback! I hadn't considered that use case. Unfortunately, that may complicate the API a little bit. Here's what I have in mind:
How does this sound?
This mimics the StreamDeserializer API and implements issue serde-rs#404. Unlike StreamDeserializer, the ArrayDeserializer struct itself does not keep track of the type of the array's elements; instead, the next() method itself is generic, to support deserializing arrays whose values have different types. Unfortunately, this means we can't implement the Iterator trait.
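Concretely, the usage that description implies might look like the following sketch. This is hypothetical: the entry point and method names are inferred from the proposal's shape, not taken from any published serde_json API, and this PR was ultimately closed, so none of it exists in the crate:

```rust
// Hypothetical sketch of the proposed ArrayDeserializer usage (not real API).
let mut de = serde_json::Deserializer::from_str(r#"[1, "two"]"#);
let mut array = de.array()?;             // hypothetical entry point
let n: u64 = array.next()?.unwrap();     // next() is generic per call,
let s: String = array.next()?.unwrap();  // so element types may differ;
                                         // hence no Iterator impl is possible.
```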
Just pushed a new patch which implements points (1) and (2) of my proposal. I've not implemented the separate Iterator object (3): this API is not likely to be used very often, so keeping the overall API small and simple may be preferable to the bit of extra convenience.
Really interested in this PR! Is there anything preventing it from being merged at the moment?
I wanted to brainstorm some alternatives but haven't been able to make time yet. Mainly I wonder how this API would be different if we wanted to support streaming nested arrays inside of arrays, arrays inside of objects, objects inside of objects, etc. Almost like a deserialization version of serde_json::ser::Formatter. Maybe this could be experimented with in its own crate outside of serde_json, then considered for a PR?
I've considered that use case, and I think my proposed API can be extended quite naturally to support it. Fortunately, JSON has only two types that nest (arrays and objects), so it ought to be pretty simple (albeit a bit crude):
To make this work, … But I won't have time to work this out in the foreseeable future.
This seems like the best approach. In our ingestion pipeline we handle JSON like the kind discussed here, but we also have cases involving a single big JSON document. The big JSON array proposed here:
An alternative: a big JSON object used to transfer multiple records in the same file:
JSON Lines, which can be parsed easily by reading lines into a byte buffer and feeding them to the library.
Currently the library always expects the data to fit in memory, or it forces the programmer to build a custom deserializer and pull the data out of the nested elements using a visitor pattern.
Is it possible to implement the code in this PR outside of …? (Sorry if there's an easy solution to this. It's only my second day using Rust.)
Closing because it doesn't look like there is still design work ongoing and I don't plan to accept the current design. I would recommend pursuing a design that takes into account #526 (review) and #526 (comment), whether in an external crate that builds on serde_json or in a fork. The examples in #526 (comment) would be a good test case to work toward getting working.
My recent excursion into EOF error reporting made me realize that my hack (which happens to rely on is_eof()) to incrementally parse a large JSON array isn't the most robust approach, so here's my initial attempt at implementing a proper streaming array parser. This implements issue #404 by adding an ArrayDeserializer API that mimics StreamDeserializer. I did investigate extending the StreamDeserializer API itself, but a new struct seemed more natural.