Performance improvements when encoding very large hashes with symbol keys #163

Open. MatmaRex wants to merge 3 commits into base: master

MatmaRex

I'm currently working on a project that involves processing a huge hash (a couple of gigabytes in memory) and occasionally dumping it as JSON (about 1 GB of it). (The project is MatmaRex/commons-media-views; please don't ask why I didn't do something saner. It seemed like a good idea at the time, and now it's an interesting mental exercise.) I switched from the built-in JSON parser/encoder to YAJL for streamed encoding, but the performance was not as good as I expected. I did some digging, and here are the results.

This set of patches should improve encoding performance across the board, but particularly when encoding hashes, more so when they have symbol keys, and most of all when they are really large. Most of the gain appears to come from a reduced number of object allocations, and thus fewer GC pauses while encoding. The only potential drawback is that some monkey-patched methods on built-in classes (for example, a redefined Symbol#to_s) that were previously respected no longer will be. I don't think that's something you're aiming to support.
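To make the allocation argument concrete, here is a minimal pure-Ruby sketch that counts objects allocated when symbol keys are stringified with `to_s`. It does not touch the YAJL extension at all; the `allocations` helper and the key names are made up for illustration, and the exact counts will vary between Ruby versions.

```ruby
# Count how many objects Ruby allocates while the given block runs.
# GC.stat(:total_allocated_objects) is a counter that only ever increases.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

keys   = %i[id title views]  # hypothetical hash keys
rounds = 10_000

# Generic path: each Symbol#to_s call builds a fresh String object.
with_to_s = allocations do
  rounds.times { keys.each { |k| k.to_s } }
end

# Baseline: the same loops without any conversion.
baseline = allocations do
  rounds.times { keys.each { |k| k } }
end

puts "with to_s:    #{with_to_s} objects allocated"
puts "without to_s: #{baseline} objects allocated"
```

On a hash with millions of symbol keys, each of those temporary strings is extra work for the garbage collector, which is the cost the patches try to avoid.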

Testing with this large file: https://dl.dropboxusercontent.com/u/10983006/tmp/big.json (~110 MB), parsed with Yajl::Parser.new(symbolize_keys: true), I get a roughly 2x performance improvement when encoding the parsed data back into JSON.

Previously, symbols would fall into the 'default' case and be stringified
through rb_funcall(obj, intern_to_s, 0). That has the added overhead of a
function call and a String object allocation.

rb_funcall has the added overhead of a function call and, for Symbols,
also a String object allocation. String and Symbol keys are the common
case here.

A Fixnum cannot be NaN, Infinity or -Infinity. Also, use rb_fix2str
rather than rb_funcall(obj, intern_to_s, 0) to avoid function call
overhead.
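
The commit messages above describe a type dispatch with fast paths for the common cases. The following is only a rough Ruby model of that idea, not the patch itself: the real change lives in the C extension, and `generic_path`, `key_string`, `number_string` and the `ArgumentError` are hypothetical stand-ins chosen for this sketch.

```ruby
# Stand-in for rb_funcall(obj, intern_to_s, 0): a dynamic method call that
# honours whatever to_s the object currently responds to.
def generic_path(obj)
  obj.public_send(:to_s)
end

# Hash keys: String and Symbol are the common case, so handle them first.
def key_string(obj)
  case obj
  when String then obj          # already a string, nothing to convert
  when Symbol then obj.id2name  # stand-in for reading the symbol's name in C
  else generic_path(obj)        # everything else keeps the generic path
  end
end

# Numbers: an Integer can never be NaN or infinite, so only Floats are checked.
def number_string(obj)
  case obj
  when Integer
    obj.to_s                    # stand-in for rb_fix2str in the C code
  when Float
    # ArgumentError is an assumption made for this sketch only.
    raise ArgumentError, "cannot encode #{obj}" if obj.nan? || obj.infinite?
    obj.to_s
  else
    generic_path(obj)
  end
end

puts key_string(:views)
puts key_string("title")
puts number_string(42)
```

Because the fast paths never go through a dynamic `to_s` call, a redefined `to_s` on those built-in classes is bypassed, which is the drawback mentioned in the description.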
MatmaRex (Author)

Some benchmarks:

a) Without these patches:

benchmark/encode.rb benchmark/subjects/ohai.json 10000

                                         user     system      total        real
Yajl::Encoder#encode (to an IO)      5.881000   0.406000   6.287000 (  6.795389)
Yajl::Encoder#encode (to a String)   5.148000   0.047000   5.195000 (  5.460313)

benchmark/encode.rb benchmark/subjects/big.json 5

                                         user     system      total        real
Yajl::Encoder#encode (to an IO)     17.129000   0.406000  17.535000 ( 18.351049)
Yajl::Encoder#encode (to a String)  16.084000   0.140000  16.224000 ( 16.613950)

benchmark/encode.rb benchmark/subjects/big.json 5 (with symbolize_keys: true)

                                         user     system      total        real
Yajl::Encoder#encode (to an IO)    113.179000   0.297000 113.476000 (117.212704)
Yajl::Encoder#encode (to a String) 114.801000   0.171000 114.972000 (118.737791)

b) With these patches:

benchmark/encode.rb benchmark/subjects/ohai.json 10000

                                         user     system      total        real
Yajl::Encoder#encode (to an IO)      5.367000   0.234000   5.601000 (  5.910338)
Yajl::Encoder#encode (to a String)   4.555000   0.015000   4.570000 (  4.727270)

benchmark/encode.rb benchmark/subjects/big.json 5

                                         user     system      total        real
Yajl::Encoder#encode (to an IO)     13.775000   0.140000  13.915000 ( 14.898852)
Yajl::Encoder#encode (to a String)  12.308000   0.031000  12.339000 ( 13.032745)

benchmark/encode.rb benchmark/subjects/big.json 5 (with symbolize_keys: true)

                                         user     system      total        real
Yajl::Encoder#encode (to an IO)     60.856000   0.328000  61.184000 ( 62.784591)
Yajl::Encoder#encode (to a String)  59.717000   0.202000  59.919000 ( 61.520518)
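
For a quick read of the tables, the speedup in each case is the "real" time without the patches divided by the "real" time with them. The short script below does that arithmetic using the numbers copied from the benchmark output above; the labels are informal names chosen here.

```ruby
# Wall-clock ("real") seconds from the tables above: [without patches, with patches].
results = {
  "ohai.json, to an IO"                     => [6.795389, 5.910338],
  "ohai.json, to a String"                  => [5.460313, 4.727270],
  "big.json, to an IO"                      => [18.351049, 14.898852],
  "big.json, to a String"                   => [16.613950, 13.032745],
  "big.json (symbolize_keys), to an IO"     => [117.212704, 62.784591],
  "big.json (symbolize_keys), to a String"  => [118.737791, 61.520518],
}

# Speedup factor = time before / time after.
speedups = results.transform_values { |(before, after)| before / after }

speedups.each do |label, ratio|
  puts format("%-42s %.2fx", label, ratio)
end
```

The ratios come out at roughly 1.15x for ohai.json, about 1.25x for big.json with string keys, and about 1.9x for big.json with symbol keys, in line with the roughly 2x figure quoted in the description.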
