
speedup array work by using push instead of concat #185

Closed · wants to merge 4 commits

Conversation

@elidoran (Contributor)

The current implementation creates "throw away" (intermediary) arrays via two concat() calls on an empty initial array.

It could instead be obj[key] = [].concat(obj[key], val);, which passes both values to a single concat() call. That is only marginally faster.

The fastest way is to use push() when the existing value is already an array, and to create a new array containing both values otherwise. When testing with 10 million iterations, this approach is an order of magnitude faster than the two methods above, and it avoids creating a lot of "throw away" arrays.

The Travis CI failure is due to new linting flagging unnecessary escapes; I fixed those in a separate PR, #184.
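For illustration, the three approaches described above can be sketched like this (a standalone sketch, not the actual PR diff; the `obj` name just mirrors the real call site):

```javascript
// Approach 1 (current): two concat() calls, each allocating an intermediate array.
var obj = { a: 1 };
obj.a = [].concat(obj.a).concat(2);   // obj.a is now [1, 2]

// Approach 2: a single concat() call with both arguments.
obj = { a: 1 };
obj.a = [].concat(obj.a, 2);          // obj.a is now [1, 2]

// Approach 3 (this PR): push() onto an existing array, or build a new pair.
obj = { a: 1 };
if (Array.isArray(obj.a)) {
    obj.a.push(2);                    // mutate in place, no throw-away arrays
} else {
    obj.a = [obj.a, 2];               // first collision: create the array once
}
```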

@ljharb (Owner) commented Dec 17, 2016


Which benchmarks are you using to determine "faster"? Because of the JIT in JS, if it's "intuition" or "theory", it's invariably a wrong assumption.

Separately, since nobody has 10 million query string values, that's an irrelevant comparison. I'd need to see proof that it was actually faster on practical, real-world test cases before wanting to use mutation merely for performance.

- obj[key] = [].concat(obj[key]).concat(val);
+ var cur = obj[key];
+ if (Array.isArray(cur)) {
+     cur.push(val);
@ljharb (Owner)

it's also worth noting that this behaves differently when val is an array or not, because of the concat behavior. Since the decoder can be user-provided, it becomes very important to retain that auto-spreading behavior (as well as the lack of mutation).
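To illustrate the behavioral difference being pointed out: concat() spreads an array argument one level, while push() appends it as a single element. A standalone sketch:

```javascript
var prev = ['a'];

// concat auto-spreads an array `val` into individual elements.
var viaConcat = [].concat(prev).concat(['b', 'c']);   // ['a', 'b', 'c']

// push appends the array itself as one nested element.
var viaPush = prev.slice();   // copy so `prev` is not mutated
viaPush.push(['b', 'c']);     // ['a', ['b', 'c']]
```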

@elidoran (Contributor, Author)


I spent time setting up performance-testing runs and comparing four different implementations to see which ones work better; I didn't imagine it :) I also don't expect you to take my word for it. I figured you'd run some tests yourself to see, and I can share some simple perf-testing scripts if you'd like. I didn't get into the details in the PR because, when contributing to a new project, I'm never sure whether the maintainer is still active or cares about PRs :) So I throw some details in there to see how they respond.

I spotted this section because concat() is one of the things on my mental list to look out for: performance testing often shows that avoiding it is faster.

I think running the tests many times is relevant because it lets the runtime optimizations kick in and gives more realistic results. A server will eventually run for a while and parse millions of query strings.

I tested the implementations by giving them these input permutations:

  1. two non-array values
  2. two array values
  3. array and non-array
  4. non-array and array

The implementations I tested:

  1. the current one: [].concat(obj[key]).concat(val)

  2. a variant: [].concat(obj[key], val)

  3. a crazy one: Array.prototype.concat(obj[key], val) (It’s supposed to be less than ideal, and it is)

  4. the one I submitted

  5. and the one which allows val to be an array:

    var cur = obj[key];
    if (Array.isArray(cur)) {
        if (Array.isArray(val)) {
            cur.push.apply(cur, val);
        } else {
            cur.push(val);
        }
    } else if (Array.isArray(val)) {
        val.unshift(cur);
        obj[key] = val;
    } else {
        obj[key] = [cur, val];
    }

I left it the simpler way, expecting val to always be a non-array, because val is pulled from a string via string.slice() right above there. I didn't look at the optional decoder and think "hmm, they may return an array here." That's my mistake.

I'd be happy to change the PR to use that implementation, if you're interested, and I can provide some performance testing as well.
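For reference, the fifth variant wrapped in a function (the `combine` name here is hypothetical) behaves like this across the permutations:

```javascript
// Hypothetical wrapper around the fifth variant above; mutates arrays in place.
function combine(obj, key, val) {
    var cur = obj[key];
    if (Array.isArray(cur)) {
        if (Array.isArray(val)) {
            cur.push.apply(cur, val);   // array + array: append each element
        } else {
            cur.push(val);              // array + scalar: append one element
        }
    } else if (Array.isArray(val)) {
        val.unshift(cur);               // scalar + array: prepend the scalar
        obj[key] = val;
    } else {
        obj[key] = [cur, val];          // scalar + scalar: build a pair
    }
}

var o1 = { k: 1 };
combine(o1, 'k', 2);            // o1.k is now [1, 2]

var o2 = { k: [1] };
combine(o2, 'k', [2, 3]);       // o2.k is now [1, 2, 3]
```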

@ljharb (Owner) commented Dec 17, 2016


Thanks, I'm glad to hear you're being thorough :-)

So, to reiterate, my concerns are:

  1. First and foremost, the code should be clear; "fast" is the least important consideration.
  2. Observable mutations must be avoided, and non-observable mutations should be avoided when possible (for clarity).
  3. If there is a compelling performance case for non-observable mutation, then we can do that, but it needs to be done cautiously, with comments and metrics detailing why the mutation was permitted.

One way to mitigate the code-clarity concern is to move the current implementation into a helper function first, leaving the main function clean, and then optimize the performance of that helper. In other words, we stick [].concat(obj[key]).concat(val) in a function, and then swap out the implementation of that function for testing. That also means that even your 5th implementation, which is more complex and larger, could still live in an internally tested abstraction, away from the main function.
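That refactor might look something like the sketch below (module layout and names are illustrative, not the actual qs source):

```javascript
// utils.js: the combining logic lives behind a small, swappable helper.
var utils = {
    combine: function (a, b) {
        return [].concat(a, b);   // same observable behavior as the chained concat
    }
};

// In the parser, the call site stays a clean one-liner:
//   obj[key] = utils.combine(obj[key], val);
// Later, the body of utils.combine can be replaced with a push-based version
// (and benchmarked in isolation) without touching the parser.
var result = utils.combine('a', ['b', 'c']);   // ['a', 'b', 'c']
```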

Please do update the PR; it's exceedingly rare that a performance claim is actually worth optimizing for when evidence is requested, so it'd be fun to see this one bear out :-)

@elidoran (Contributor, Author)


I don’t consider speed the least important consideration. If code is very clear but takes forever, it’s of no use.

I agree about clarity: code should be as readable as possible. JavaScript has a tough time with that; all the extra braces and parentheses make it messy. Also, this code lacks comments, so I didn’t add my own, though otherwise I would document what it’s doing. I think comments are required for clarity, and I added a few to the new version of this.

I’m wary of mutation, but not wholly opposed to its use where appropriate. If the array came from outside the library, then “defensive copying” at the start would be a good idea. In this case, the qs library creates the array itself, uses it repeatedly, and then hands it back to the user afterwards. So it’s okay for it to mutate the arrays it creates for itself while processing the query string.

I’m also a fan of moving code blocks out into their own functions; monolithic functions are tough to deal with. I moved this one out to the already-require()’d “utils” module, and I also altered its structure for greater readability.

I’m providing the benchmark info via a separate PR, #187, so that it isn’t part of a code-change PR.

Below is an image of the results I get when running the benchmark. The column headers show the types of the two inputs given to the implementations, and the first column is the row headers showing which implementation each row is. The numbers are the operations per second reported by Benchmark.js, and the ± figures show the variance between runs.

[image: qs-arraycombine-benchmark]
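A rough standalone timing harness in the same spirit (this is a plain loop, not the actual Benchmark.js suite from #187; the iteration count is arbitrary):

```javascript
// Time each strategy over many iterations so JIT optimizations kick in.
function timeIt(label, fn) {
    var start = Date.now();
    for (var i = 0; i < 1e6; i += 1) { fn(); }
    console.log(label + ': ' + (Date.now() - start) + 'ms');
}

timeIt('chained concat', function () {
    var obj = { k: 1 };
    obj.k = [].concat(obj.k).concat(2);
});

timeIt('push / new array', function () {
    var obj = { k: 1 };
    obj.k = Array.isArray(obj.k) ? (obj.k.push(2), obj.k) : [obj.k, 2];
});
```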


// mutate `a` to append `b` and then return it
if (firstIsArray) {
    secondIsArray ? a.push.apply(a, b) : a.push(b);
    return a;
}
@ljharb (Owner)

It's probably a good idea to first merge a utils.combine that uses concat into master, and then have the optimization be a separate PR.

@elidoran elidoran mentioned this pull request Dec 22, 2016
@elidoran elidoran closed this Dec 22, 2016