Handling very large json array #403

Closed
PSeitz opened this issue Jan 12, 2018 · 6 comments

Comments

@PSeitz

PSeitz commented Jan 12, 2018

I have a fairly large 17 GB JSON file, which is an array of JSON objects: [ {a:2} ... {b:2} ]
When I try to deserialize the whole thing, my machine with 96 GB of RAM runs out of memory (OOM), which is a little odd. Memory consumption for serde_json::from_str(&s) seems to be quite high.

I saw there is a StreamDeserializer, although it seems to handle only data of the form {a:2} {b:2} ...
https://docs.serde.rs/serde_json/de/struct.StreamDeserializer.html

@oli-obk
Member

oli-obk commented Jan 12, 2018

Can you post a backtrace with gdb and the type you're deserializing into?

@PSeitz
Author

PSeitz commented Jan 12, 2018

I don't know the structure upfront, so serde_json::Value.

Does RUST_BACKTRACE=full produce a gdb backtrace?

@oli-obk
Member

oli-obk commented Jan 12, 2018

@PSeitz no, you need to use gdb in the case of an OOM or an overflow.

serde_json::Value is very wasteful; you might end up using several times the size the data had on disk, although a factor of 5 increase seems excessive.

@dtolnay
Member

dtolnay commented Jan 12, 2018

This example code shows one way to process an array of values without having them all in memory at the same time.

@PSeitz
Author

PSeitz commented Jan 12, 2018

The program gets killed by a SIGKILL; is it possible to get a stack trace in that case? An attached gdb didn't help.

@dtolnay
Thanks, I saw that before, but the example makes some assumptions about the structure (id and values), and it's not clear to me how to remove them for my case.

@PSeitz
Author

PSeitz commented Jan 25, 2018

I'll close this in favor of #404

PSeitz closed this as completed Jan 25, 2018
dtolnay added the support label Mar 3, 2018