JSON object stream loading support #520
Comments
That doesn't sound too hard. I'd definitely want this to be a separate function. When you say stream, though, that seems to hint to me that this multi-JSON might be coming in chunks, and that those chunks may start or stop midway through an object. Is that likely to happen? It would certainly be a lot messier.
Thank you for the reply, @bwoodsend
FWIW @eugene-bright: I ended up using this for a similar situation: https://github.com/rickardp/splitstream
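For reference, a minimal usage sketch of that library, assuming the splitfile helper its README describes (the exact call signature and the file name here are assumptions):

```python
import ujson
from splitstream import splitfile  # https://github.com/rickardp/splitstream

# Split a file of concatenated JSON documents into one string per document,
# then decode each document individually to avoid the "Trailing data" error.
with open("firehose.log", "rb") as f:        # file name is illustrative
    for doc in splitfile(f, format="json"):  # yields one JSON document at a time
        record = ujson.loads(doc)
        print(record)
```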
Thanks for sharing, @kibiz0r
It does make me wonder why on earth AWS doesn't just write them as [{"msg": "first message"}, {"msg": "second message"}]
With the proper Firehose configuration it could be possible, I believe. But...
@bwoodsend The downside is that with such a notation, you will always need to load the entire log into memory for decoding (without special trickery) rather than looping over the entries. Same reason why JSONL exists. Although a separator (such as LF in JSONL, or rarely RS for record-separator-delimited JSON) being omitted makes it a pain again, in my opinion. Concatenated JSON has its advantages as well though; in particular, you can pretty-print JSON and it'll still work with concatenation, which isn't the case with JSONL, for example.
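To illustrate the "looping over the entries" point, the standard library's json.JSONDecoder.raw_decode can already walk a string of concatenated documents one object at a time (plain json rather than ujson; a minimal sketch):

```python
import json

def iter_concatenated(text):
    """Yield each top-level value from a string of concatenated JSON documents."""
    decoder = json.JSONDecoder()
    pos, end = 0, len(text)
    while pos < end:
        # Skip any whitespace between documents (also covers JSONL-style newlines).
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

for entry in iter_concatenated('{"msg": "first message"}{"msg": "second message"}'):
    print(entry)
```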
What did you do?
Parsing AWS S3 bucket content that contains an aggregated stream of log objects.
The JSON objects are written continuously, without any delimiters, e.g.:
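```
{"msg": "first message"}{"msg": "second message"}
```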
What did you expect to happen?
I know that it's not part of the JSON spec, but I expect something like this:
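Something along these lines, where the multi-document loader is purely hypothetical and named here only for illustration:

```python
>>> # Hypothetical helper, not part of ujson's current API.
>>> ujson.load_objects('{"msg": "first message"}{"msg": "second message"}')
[{'msg': 'first message'}, {'msg': 'second message'}]
```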
What actually happened?
When I parse such a stream, the following error arises:
```python
>>> ujson.loads("{}{}")
ValueError: Trailing data
```
What versions are you using?