
The strings are not optimized on nodejs #85

Open
nikils opened this issue Jan 16, 2015 · 3 comments

Comments


nikils commented Jan 16, 2015

Hi,

The strings in json5 are formed using character-by-character concatenation, which results in a large amount of memory being needed to store each one in Node.js. V8 uses 'first' and 'second' pointers to store concatenated strings (cons strings). Since the parser concatenates character by character, V8 ends up storing a large tree instead of a single flat string. One workaround is to call string.charAt(0) on the resulting string, which makes V8 flatten the tree into a single string.

Here is the workaround we are using.

var json5 = require('json5');

// Parse JSON5 text and flatten all strings in the result;
// fall back to `def` (or {}) if parsing fails.
function parse_json5(text, def) {
    var json;
    try {
        json = json5.parse(text);
        v8optimize(json);
    } catch (ex) {
        json = def === undefined ? {} : def;
    }

    return json;
}

// Walk the parsed value and touch every string so V8 flattens
// the cons-string tree into a single flat string.
function v8optimize(jsonObj) {
    var tempStr, key;

    if (!jsonObj) {
        return;
    }

    switch (typeof jsonObj) {
        case 'string':
            if (jsonObj.length > 0) {
                // Accessing a character forces V8 to flatten the string.
                tempStr = jsonObj.charAt(0);
            }
            break;
        case 'object':
            if (Array.isArray(jsonObj)) {
                for (key = 0; key < jsonObj.length; key++) {
                    v8optimize(jsonObj[key]);
                }
            } else {
                for (key in jsonObj) {
                    if (jsonObj.hasOwnProperty(key)) {
                        v8optimize(key);
                        v8optimize(jsonObj[key]);
                    }
                }
            }
            break;
    }
}

[screenshot: Chrome dev tools heap snapshot of the parsed json5 string]

The screenshot shows how a small string (25 bytes) ends up using 1284 bytes instead.
You can use node-heapdump (https://github.com/bnoordhuis/node-heapdump) to take a heap snapshot in Node.js and analyze it in the Chrome dev tools.
Here is the code to generate the heap snapshot:

var json5 = require('json5');
var heapdump = require('heapdump');

global.test = json5.parse('{ foo: "hello world"}');
heapdump.writeSnapshot('/tmp/' + Date.now() + '.heapsnapshot', function (err, filename) {
    console.log(err);
    console.log('saved');
    console.log(global.test);
});

aseemk (Member) commented Jan 17, 2015

Fascinating! Thank you.

Does anyone else have any thoughts on this? Should we simply do this on all objects we return? Or only if, say, the input string is over a certain size?

Or perhaps we could do this inline as part of the parse process, e.g. on every closing quote.
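
For illustration, here is a minimal sketch of what flattening at the closing quote could look like (the readString helper and its source/pos/quote arguments are hypothetical, not JSON5's actual internals, and escape handling is omitted):

// Hypothetical lexer helper: builds a string token char by char,
// then flattens the resulting cons string before returning it.
function readString(source, pos, quote) {
    var result = '';
    pos++; // skip the opening quote
    while (pos < source.length && source.charAt(pos) !== quote) {
        result += source.charAt(pos); // char-by-char += builds a cons-string tree
        pos++;
    }
    // Touching a character forces V8 to flatten the tree into one flat string.
    if (result.length > 0) {
        result.charAt(0);
    }
    return { value: result, nextPos: pos + 1 };
}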

rlidwka (Contributor) commented Jan 19, 2015

My thoughts on this are pretty skeptical, and the posted workaround isn't a very good one.

However, char-by-char concatenation in general isn't efficient. Maybe instead of var str = ''; str += 'x'; return str we could do var arr = []; arr.push('x'); return arr.join('') or something.

Also, any benchmarks?
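
For reference, a minimal sketch of that buffer-and-join approach (hypothetical code, not JSON5's actual parser):

// Collect characters into an array and join once at the end,
// instead of growing a string with += on every character.
function buildString(chars) {
    var buffer = [];
    for (var i = 0; i < chars.length; i++) {
        buffer.push(chars[i]);
    }
    return buffer.join(''); // produces a single flat string in one step
}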

bnjmnt4n (Contributor) commented:

Array#join is faster in IE6/7, but slower in modern browsers. Using += causes modern browsers to store references to the strings being concatenated; the whole string is only joined when it is actually used, unlike Array#join, which joins the strings immediately. That said, char-by-char concatenation is extremely slow, and there should be a better way to do it.
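
For anyone who wants to measure, here is a rough micro-benchmark sketch; the numbers will vary by engine and string length, so treat it only as a starting point:

// Rough micro-benchmark: char-by-char += vs. Array#join.
var ITERATIONS = 1000;
var LENGTH = 10000;

function viaConcat() {
    var s = '';
    for (var i = 0; i < LENGTH; i++) { s += 'x'; }
    return s;
}

function viaJoin() {
    var arr = [];
    for (var i = 0; i < LENGTH; i++) { arr.push('x'); }
    return arr.join('');
}

function time(label, fn) {
    var start = Date.now();
    for (var i = 0; i < ITERATIONS; i++) { fn(); }
    console.log(label + ': ' + (Date.now() - start) + ' ms');
}

time('+= concat', viaConcat);
time('Array#join', viaJoin);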

@jordanbtucker jordanbtucker self-assigned this Sep 20, 2017
@jordanbtucker jordanbtucker added this to the v3.0.0 milestone Feb 23, 2022