
When dumping to IO, dump directly #538

Draft
wants to merge 1 commit into master
Conversation

headius
Contributor

@headius headius commented Aug 15, 2023

Note: this PR is a proof of concept of direct IO dumping in the json library. It works, but it is not as fast as it could be, and there is no CRuby implementation yet.

Json.dump allows you to pass an IO to which the dump output will be sent, but it still buffers the entire output in memory before sending it to the given IO. This leads to issues on JRuby like jruby/jruby#6265 when it tries to create a byte[] that exceeds the maximum size of a signed int (JVM's array size limit).
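For reference, this is the existing API shape being discussed: `JSON.dump` already accepts an IO-like object as its second argument and sends the serialized output to it (the hash contents below are purely illustrative).

```ruby
require "json"
require "stringio"

# JSON.dump writes its output to the given IO-like object.
# Today the whole document is built in memory first, then written;
# this PR makes generation write to the IO directly instead.
io = StringIO.new
JSON.dump({ "name" => "example", "values" => [1, 2, 3] }, io)
io.string  # the serialized JSON document
```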

This commit plumbs the IO all the way through the generation logic so that it can be written to directly without first filling a temporary in-memory buffer. This allows JRuby to dump object graphs that would normally produce more content than the JVM can hold in a single array, providing a workaround for jruby/jruby#6265.

It is unfortunately a bit slow to dump directly to IO, due to the many small writes that each acquire locks and pass through the IO encoding subsystem. A more direct path that skips some of these pieces could be more competitive with the in-memory version, but functionally it expands the size of graphs that can be dumped when using JRuby.

See #524

@headius headius marked this pull request as draft August 15, 2023 00:31
@headius
Contributor Author

headius commented Aug 15, 2023

Note that prior to this patch, the script provided in jruby/jruby#6265 required 2-3GB of memory to run. Afterwards it varies between 200-600MB and completes successfully.

@segiddins

For performance, it might help to have a buffered proxy to the underlying IO? That way, you only incur the locking/encoding overhead every buffer block size, vs every substring that gets written?
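A minimal sketch of that buffered-proxy idea (the class name and block size are hypothetical, not part of the json gem): accumulate small writes in a plain String and forward to the real IO only when a block-size threshold is reached, so the locking/encoding cost is paid once per block rather than once per fragment.

```ruby
require "stringio"

# Hypothetical buffered proxy: coalesces many small writes into
# block-sized writes against the underlying IO.
class BufferedIOProxy
  def initialize(io, block_size: 16 * 1024)
    @io = io
    @block_size = block_size
    @buffer = String.new(capacity: block_size)
  end

  # Append to the in-memory buffer; only touch the real IO
  # (and pay its lock/encoding overhead) once per full block.
  def write(str)
    @buffer << str
    flush if @buffer.bytesize >= @block_size
    str.bytesize
  end

  # Push any remaining buffered bytes to the underlying IO.
  def flush
    unless @buffer.empty?
      @io.write(@buffer)
      @buffer.clear
    end
    self
  end
end
```

Callers would need to `flush` the proxy once dumping finishes, since a partial final block stays in the buffer otherwise.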

@headius
Contributor Author

headius commented Aug 19, 2023

@segiddins Yeah, I hoped that IO's internal buffering would be sufficient for this, but that buffer may simply be too small, or there's enough overhead in the character-encoding checks that we lose the benefit. I'd like to play with some other buffering strategies, and some of the larger methods that do many tiny writes could do those writes into a temporary string that gets dumped in one go.

@headius
Contributor Author

headius commented Aug 19, 2023

Oh, I did run a profile of this on JRuby, and the other cost we incur when using IO as the buffer is all the locking required to keep it thread-safe. So if you're writing individual characters, that's a lot of lock acquisition and release. Batching those writes into coarser chunks would make a big difference.
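To illustrate the per-call cost being described (a hypothetical toy, not code from this PR): every `IO#write` call acquires the stream's lock, so emitting a document character by character pays that cost once per character, while building the same fragment in a String and writing it once pays it a single time.

```ruby
require "stringio"

io = StringIO.new

# Many tiny writes: one IO#write call (and one lock acquisition
# on a real IO) per character.
"{\"a\":1}".each_char { |c| io.write(c) }

# Batched: assemble the fragment in a plain String, then write once.
chunk = String.new
chunk << "{" << "\"b\"" << ":" << "2" << "}"
io.write(chunk)
```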
