
Optimization: Remove unneeded lodash functions and replace polymorphic _.each #199

Merged
merged 3 commits into from Oct 26, 2015

Conversation

STRML
Contributor

@STRML STRML commented Oct 17, 2015

This gives us another 40% over #198, bringing the simple creation benchmark down from 140ms to 60ms with the two combined.

I've removed some lodash modules that were superfluous and introduced complicated dependency chains.

Fast.js does a good job of staying small and light and will introduce minimal extra bundling size for browsers while providing significant speed. I see a very significant deopt in both 'lodash.foreach' and 'fast.js/forEach' due to the Object/Array switch, which is why I require the object version directly. Lodash unfortunately does not expose an Object version directly.

Still more to do for speed, expect more PRs. I will likely reduce the test suite somehow into a benchmark. Please merge #198 first before merging this.

@pgilad
Member

pgilad commented Oct 17, 2015

Hi, thanks again for the great work. Can you:

  1. Rebase this off master (Your previous PR was merged so the first commit here is redundant)
  2. Explain why lodash.defaults causes deopt
  3. Why not use native array forEach instead?
  4. Is this Node.js benchmarking? Shouldn't we also benchmark in browsers? I know that jdalton highly optimizes his code and has browser benchmarking for it.

@STRML
Contributor Author

STRML commented Oct 17, 2015

Hey @pgilad,

  1. Rebased.
  2. Re: defaults, it appears to be because of its default use of a customizer function. Removing that removes the deopt notice and gives us about 10%.
  3. We do use native array forEach as much as possible, but for objects you either need to use a library, or do a fiddly Object.keys().forEach(function(key) { var value = obj[key]; ... }); construct. Happy to just make a small helper if you'd like to remove the fastjs dep entirely.
  4. We could do some browser benchmarking but it would be considerably more work. By targeting V8 specifically we get Node + Chrome + Safari (somewhat), but optimizations that work for V8 (such as reducing polymorphism) should improve optimization in all browsers. It is very unlikely that the work would slow down code on any other browsers.
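The "fiddly construct" from point 3 can be wrapped in a tiny helper; here is a minimal sketch (the `forEachObj` name is hypothetical), iterating own keys only and passing (value, key, obj) the way lodash's callbacks do:

```javascript
// Hypothetical object-only forEach helper: iterates own enumerable
// keys and invokes fn(value, key, obj), optionally with a context.
function forEachObj(obj, fn, context) {
  var keys = Object.keys(obj);
  for (var i = 0; i < keys.length; i++) {
    fn.call(context, obj[keys[i]], keys[i], obj);
  }
}

// Usage: collect key/value pairs from a plain object.
var seen = [];
forEachObj({ a: 1, b: 2 }, function (value, key) {
  seen.push(key + '=' + value);
});
// seen is now ['a=1', 'b=2']
```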

@wraithgar
Contributor

  • Here's a wild thought: should we implement the forEach for objects at amp-object-forEach?
  • Or, can we maybe convince jdd to add this to / expose this from lodash?
  • Is that even worth it for (looks at actual code that fast.js adds) 14 lines of very optimized code?
  • Do we care that fast.js is <1.0.0?

Sorry that it's all questions instead of actual feedback so far. I'm really excited about the energy you're putting into these PRs for optimization @STRML.

I read through the diff and it all looks ok to me. Array.isArray is IE9+ so we aren't giving up any compatibility. Kind of wish we'd have caught on to that before, dang.

@STRML
Contributor Author

STRML commented Oct 17, 2015

There was actually already an Array.isArray() in unset, so I figured the incompatibility with IE8 and below had already been introduced.

Re: amp-object-forEach, I don't think it's worth reinventing the wheel. By importing the methods directly from Fast.js we save a lot when bundling and save a lengthy npm install (which can be quite annoying with all the lodash subdeps). I would argue we should do the same with the lodash modules as the dependency graph (at least in npm 2) is quite long and tangled.

I don't care about fast.js < 1.0.0. The version number doesn't mean anything substantial, just like how Node was pre-1.0 for so many years. Fast.js is stable and popular.

@HenrikJoreteg
Member

A few, somewhat disjointed thoughts:

Since npm 3 is released, I don't think we should worry too terribly much about npm 2. The other idea behind using a shared set of utils was to save code when using multiple ampersand modules together.

All that said, I think it may be worth looking at total weight of resulting code w/ dependencies as another metric. I've done this in other cases by browserify-ing and minifying the resulting package.

I'd be curious to see how these compared in that regard. I know when we switched to lodash we gained some weight :-/

Related... what does @AmpersandJS/core-team think about adding @STRML? I find his contributions hugely valuable and want to get out of his way :)

@STRML
Contributor Author

STRML commented Oct 19, 2015

@HenrikJoreteg I would very much like to forget about npm 2, but the simple truth is that npm 3 is still slow, even though they closed the issue. I and many like me still can't use it in production.

In any case, I agree, npm install bloat is far less of a big deal than simple bundle size. And webpack loaders can be used to replace lodash modules so it's not the end of the world.

We win overall with this (note that ampersand-events and its deps are included):

> webpack ampersand-state.js --output-file=foo.js --optimize-minimize

fastjs branch:
50kb minified
11kb gzipped

master:
60kb minified
12kb gzipped

@HenrikJoreteg
Member

@jdalton, any input on how lodash may help here? @STRML is looking for a fast/light object-only forEach and ampersand only supports IE9+.

@STRML
Contributor Author

STRML commented Oct 19, 2015

@HenrikJoreteg I'll check, we could possibly use lodash/object/forOwn to the same effect.

@wraithgar
Contributor

I'm all for adding @STRML to the ampersand team.

@jdalton

jdalton commented Oct 19, 2015

There is lodash.forown as well. I'm not a fan of the micro-benchmarky, V8-only focus, though; on that note, I'm confused by the desire to use Array#forEach, which doesn't perform as well as lodash.foreach and friends in many environments because of inlining and other engine hurdles.

I think doing something like !!Object.keys(this._changed).length rather than isEmpty(this._changed) is pretty gross.
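For plain objects the two emptiness checks behave the same; the disagreement above is about readability, not behavior. A quick sketch (variable names hypothetical):

```javascript
// Both expressions report whether a plain object has any own
// enumerable keys; this is what isEmpty(obj) answers for plain objects.
var changed = {};
var viaKeys = !!Object.keys(changed).length;      // false: no keys yet

changed.name = 'updated';
var viaKeysLater = !!Object.keys(changed).length; // true: one key now
```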

Lodash v4 (not out yet) is for modern browsers only, drops a lot of the Object.keys weight associated with it, and allows for much smaller builds. Here's an example of it being used with redux to keep their small 2kb build, reduce reliance on duplicated utilities and the test weight that comes with it.

@HenrikJoreteg
Member

Cool, thanks for chiming in @jdalton.

@jdalton

jdalton commented Oct 19, 2015

Btw, when lodash v4 releases, you all could update versions without having to do all this rewiring and get file-size wins. I expect to release lodash v4 at the same time as jQuery 3.0, either Jan 12 or 16: it's either the day Microsoft drops IE8 support or jQuery's 10th birthday.

@cdaringe
Member

cool stuff, @STRML. i'm definitely in support of the team add :)

@STRML
Contributor Author

STRML commented Oct 19, 2015

Thanks @jdalton - I agree that the benchmark needs work and I'm almost done with one that will take it through many more complicated use scenarios so we get a better picture of overall performance. My initial focus was on easy deopt wins & model creation speed because we create and destroy a very large volume of models in my day job.

I found that the major hurdle in using _.each in performance-sensitive code was its Object/Array forOwn/forEach switch. Switching off of it (as this PR has done) gave us a pretty healthy boost in model creation. In general I'd like to stay with natives when possible as performance is getting better and it reduces bundle size.

I agree that V8-only is not an ideal focus, but it is the simplest for now. Full browser-suite benchmarks are on my roadmap but are considerably more effort than a Node script with time/timeEnd.
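The Node-script approach mentioned above is just console.time/timeEnd wrapped around a hot loop; a rough sketch, with a hypothetical `createModel` standing in for ampersand-state instantiation:

```javascript
// Hypothetical stand-in for model creation; the real benchmark would
// instantiate ampersand-state models with derived properties instead.
function createModel(attrs) {
  createModel.count++;
  return { attributes: attrs, cid: 'c' + createModel.count };
}
createModel.count = 0;

// Time 100k creations; console.timeEnd prints the elapsed wall time.
console.time('create 100k models');
for (var i = 0; i < 100000; i++) {
  createModel({ id: i });
}
console.timeEnd('create 100k models');
```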

@jdalton

jdalton commented Oct 19, 2015

My initial focus was on easy deopt wins & model creation speed because we create and destroy a very large volume of models in my day job.

Do you have link to the benchmark you used or a results summary?
If it's a de-opt do you have the v8 bailout or de-opt reason?

as performance is getting better

Kinda sorta (some have optimized a few bits and pieces), but not really. These ES5 methods have been around for ~10yrs now and are largely unoptimized. And now there's ES6 methods with similar treatment, e.g. Object.assign in v8 is slower than lib alternatives. When I was the performance PM for Chakra, the JS engine in MS Edge, I pushed for improving built-in performance but as far as I know there's no effort of substance across engines, so there's still a long way to go.
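As an illustration of the kind of "lib alternative" being compared against the native built-in, here is a hypothetical `assignOwn` helper: a plain loop over own enumerable keys, which is roughly the shape lodash-style implementations take.

```javascript
// Hypothetical userland assign: copies own enumerable keys from
// source to target with a plain loop, the style of helper often
// benchmarked against the native Object.assign built-in.
function assignOwn(target, source) {
  var keys = Object.keys(source);
  for (var i = 0; i < keys.length; i++) {
    target[keys[i]] = source[keys[i]];
  }
  return target;
}

var merged = assignOwn({ a: 1 }, { b: 2 });  // { a: 1, b: 2 }
```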

I agree that V8-only is not an ideal focus, but it is the simplest for now. Full browser-suite benchmarks are on my roadmap but are considerably more effort than a Node script with time/timeEnd.

Be wary of micro-opts. It's easy to fall into the trap of modifying code paths because one is 4 million ops/sec vs. 2 million ops/sec when either are good enough. Try to go for the bigger wins, such as the lodash optimization to avoid linear searches in methods like _.uniq.

@STRML
Contributor Author

STRML commented Oct 20, 2015

Thanks @jdalton , I appreciate your input here.

The deopt is usually very similar to:

[deoptimizing (DEOPT eager): begin 0x27cdf67fcb9 <JS Function (SharedFunctionInfo 0x2e9d3dae5541)> (opt #79) @55, FP to SP delta: 232]
            ;;; deoptimize at 884: value mismatch
  reading input frame  => node=4, args=51, height=2; inputs:
      0: 0x27cdf67fcb9 ; (frame function) 0x27cdf67fcb9 <JS Function (SharedFunctionInfo 0x2e9d3dae5541)>
      1: 0x3d30b3c04131 ; [fp + 40] 0x3d30b3c04131 <undefined>
      2: 0x1301141bc721 ; rdi 0x1301141bc721 <JS Array[0]>
      3: 0x1301141bca91 ; r9 0x1301141bca91 <JS Function (SharedFunctionInfo 0x27cdf6c4851)>
      4: 0x3d30b3c04131 ; [fp + 16] 0x3d30b3c04131 <undefined>
      5: 0x27cdf684589 ; rax 0x27cdf684589 <FixedArray[6]>
      6: 0x27cdf685109 ; rbx 0x27cdf685109 <JS Function arrayEach (SharedFunctionInfo 0x2e9d3dadb0e1)>
  reading input frame arrayEach => node=3, args=39, height=3; inputs:
      0: 0x27cdf685109 ; (literal 31) 0x27cdf685109 <JS Function arrayEach (SharedFunctionInfo 0x2e9d3dadb0e1)>
      1: 0x3d30b3cc9701 ; (literal 16) 0x3d30b3cc9701 <JS Global Object>
      2: 0x1301141bc721 ; rdi 0x1301141bc721 <JS Array[0]>
      3: 0x1301141bca91 ; r9 0x1301141bca91 <JS Function (SharedFunctionInfo 0x27cdf6c4851)>
      4: 0x3d30b3c401c1 ; (literal 32) 0x3d30b3c401c1 <FixedArray[137]>
      5: 0xffffffff00000000 ; (literal 33) -1
      6: 0 ; r8
  translating frame  => node=51, height=8
    0x7fff5fbfedd8: [top + 64] <- 0x3d30b3c04131 ;  0x3d30b3c04131 <undefined>  (input #1)
    0x7fff5fbfedd0: [top + 56] <- 0x1301141bc721 ;  0x1301141bc721 <JS Array[0]>  (input #2)
    0x7fff5fbfedc8: [top + 48] <- 0x1301141bca91 ;  0x1301141bca91 <JS Function (SharedFunctionInfo 0x27cdf6c4851)>  (input #3)
    0x7fff5fbfedc0: [top + 40] <- 0x3d30b3c04131 ;  0x3d30b3c04131 <undefined>  (input #4)
    0x7fff5fbfedb8: [top + 32] <- 0x11ab99619774 ;  caller's pc
    0x7fff5fbfedb0: [top + 24] <- 0x7fff5fbfedf8 ;  caller's fp
    0x7fff5fbfeda8: [top + 16] <- 0x27cdf684589 ;  context    0x27cdf684589 <FixedArray[6]>  (input #5)
    0x7fff5fbfeda0: [top + 8] <- 0x27cdf67fcb9 ;  function    0x27cdf67fcb9 <JS Function (SharedFunctionInfo 0x2e9d3dae5541)>  (input #0)
    0x7fff5fbfed98: [top + 0] <- 0x27cdf685109 ;  0x27cdf685109 <JS Function arrayEach (SharedFunctionInfo 0x2e9d3dadb0e1)>  (input #6)
  translating frame arrayEach => node=39, height=16
    0x7fff5fbfed90: [top + 64] <- 0x3d30b3cc9701 ;  0x3d30b3cc9701 <JS Global Object>  (input #1)
    0x7fff5fbfed88: [top + 56] <- 0x1301141bc721 ;  0x1301141bc721 <JS Array[0]>  (input #2)
    0x7fff5fbfed80: [top + 48] <- 0x1301141bca91 ;  0x1301141bca91 <JS Function (SharedFunctionInfo 0x27cdf6c4851)>  (input #3)
    0x7fff5fbfed78: [top + 40] <- 0x11ab99712b44 ;  caller's pc
    0x7fff5fbfed70: [top + 32] <- 0x7fff5fbfedb0 ;  caller's fp
    0x7fff5fbfed68: [top + 24] <- 0x3d30b3c401c1 ;  context    0x3d30b3c401c1 <FixedArray[137]>  (input #4)
    0x7fff5fbfed60: [top + 16] <- 0x27cdf685109 ;  function    0x27cdf685109 <JS Function arrayEach (SharedFunctionInfo 0x2e9d3dadb0e1)>  (input #0)
    0x7fff5fbfed58: [top + 8] <- 0xffffffff00000000 ;  -1  (input #5)
    0x7fff5fbfed50: [top + 0] <- 0x00000000 ;  0  (input #6)
[deoptimizing (eager): end 0x27cdf67fcb9 <JS Function (SharedFunctionInfo 0x2e9d3dae5541)> @55 => node=39, pc=0x11ab99715142, state=NO_REGISTERS, alignment=no padding, took 0.680 ms]

The deopt is caused by the object/array switch in _.each(). If I only use _.each() on arrays, and _.forOwn() on objects, it doesn't happen. Additionally, if I only use _.each() on objects, and use native Array#forEach(), the deopt also doesn't occur.
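The split described above can be sketched without any library: give arrays and objects separate helpers so each call site only ever sees one shape and stays monomorphic (helper names are hypothetical):

```javascript
// Array-only iteration: this call site only ever receives arrays.
function eachArray(arr, fn) {
  for (var i = 0; i < arr.length; i++) fn(arr[i], i);
}

// Object-only iteration: this call site only ever receives plain objects.
function eachObject(obj, fn) {
  var keys = Object.keys(obj);
  for (var i = 0; i < keys.length; i++) fn(obj[keys[i]], keys[i]);
}

// Callers pick the right helper up front, so the engine never sees a
// mixed array/object type flowing through a single polymorphic `each`.
var out = [];
eachArray([10, 20], function (v) { out.push(v); });
eachObject({ a: 1 }, function (v, k) { out.push(k + ':' + v); });
// out is now [10, 20, 'a:1']
```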

I'm aware my method isn't perfect - trying to save time while I work on many many projects. I figured this was an easy win without any behavior changes.

@jdalton

jdalton commented Oct 20, 2015

The deopt is caused by the object/array switch in _.each(). If I only use _.each() on arrays, and _.forOwn() on objects, it doesn't happen. Additionally, if I only use _.each() on objects, and use native Array#forEach(), the deopt also doesn't occur.

Ah, so not due to lodash itself. From the Chakra side we'll bail-out when the jitted code hits the different type (so from array to object) but keep the previously jitted array path code around. If the mixed path becomes hot then the method will be re-jitted with more generic optimizations applied and eventually the array-jitted form will be gc'ed.

Do you have any perf numbers for these calls or some kind of context to weigh the wins?

@STRML
Contributor Author

STRML commented Oct 20, 2015

So, this is very far from comprehensive, but I run a simple creation / derived getter bench here. This is by far the most common use case in my application, and it directly affects startup speed (we have a lot of models).

I'm almost done adapting the test suite so we get a clearer picture of all operations.

In any case, with this branch, I see results between 67-77ms. On master I see 92-100ms. Before the last PR I wrote optimizing this (#198), about 140ms. Not a massive difference (certainly not like _.uniq), but a pretty easy win that doesn't break any tests.

The latest commit reverts back to lodash; using forOwn directly accomplishes the same goal, and we don't have to bring in such a large module.

@pgilad
Member

pgilad commented Oct 23, 2015

@AmpersandJS/core-team Are we good with merging this?

@wraithgar
Contributor

👍 I'm good with this.

@kamilogorek
Contributor

👍 from me as well

pgilad added a commit that referenced this pull request Oct 26, 2015
Optimization: Remove unneeded lodash functions and replace polymorphic _.each
@pgilad pgilad merged commit b7c60c7 into AmpersandJS:master Oct 26, 2015
@pgilad
Member

pgilad commented Oct 26, 2015

@STRML Thank you very much for the continued support!
