
Optimization: Remove unneeded lodash functions and replace polymorphic _.each #199

Merged
merged 3 commits into from Oct 26, 2015

Conversation

STRML
Contributor

@STRML STRML commented Oct 17, 2015

This gives us another 40% over #198, bringing the simple creation benchmark down from 140ms to 60ms with the two combined.

I've removed some lodash modules that were superfluous and introduced complicated dependency chains.

Fast.js does a good job of staying small and light and will introduce minimal extra bundling size for browsers while providing significant speed. I see a very significant deopt in both 'lodash.foreach' and 'fast.js/forEach' due to the Object/Array switch, which is why I require the object version directly. Lodash unfortunately does not expose an Object version directly.

Still more to do for speed, expect more PRs. I will likely reduce the test suite somehow into a benchmark. Please merge #198 first before merging this.

@pgilad
Member

pgilad commented Oct 17, 2015

Hi, thanks again for the great work. Can you:

  1. Rebase this off master (Your previous PR was merged so the first commit here is redundant)
  2. Explain why lodash.defaults causes deopt
  3. Why not use native array forEach instead?
  4. Is this Node.js benchmarking? Shouldn't we also benchmark in browsers? I know that jdalton highly optimizes his code and has browser benchmarking for it.

@STRML
Contributor Author

STRML commented Oct 17, 2015

Hey @pgilad,

  1. Rebased.
  2. Re: defaults, it appears to be because of its default use of a customizer function. Removing that removes the deopt notice and gives us about 10%.
  3. We do use native array forEach as much as possible, but for objects you either need to use a library, or do a fiddly Object.keys().forEach(function(key) { var value = obj[key]; ... }); construct. Happy to just make a small helper if you'd like to remove the fastjs dep entirely.
  4. We could do some browser benchmarking but it would be considerably more work. By targeting V8 specifically we get Node + Chrome + Safari (somewhat), but optimizations that work for V8 (such as reducing polymorphism) should improve optimization in all browsers. It is very unlikely that the work would slow down code on any other browsers.
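The "fiddly construct" from point 3 can be wrapped in a tiny helper; here is a minimal sketch (the `forEachObj` name is hypothetical), iterating own keys only and passing (value, key, obj) the way lodash's callbacks do:

```javascript
// Hypothetical object-only forEach helper: iterates own enumerable
// keys and invokes fn(value, key, obj), optionally with a context.
function forEachObj(obj, fn, context) {
  var keys = Object.keys(obj);
  for (var i = 0; i < keys.length; i++) {
    fn.call(context, obj[keys[i]], keys[i], obj);
  }
}

// Usage: collect key/value pairs from a plain object.
var seen = [];
forEachObj({ a: 1, b: 2 }, function (value, key) {
  seen.push(key + '=' + value);
});
// seen is now ['a=1', 'b=2']
```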

@wraithgar
Contributor

  • Here's a wild thought: should we implement the forEach for objects at amp-object-forEach?
  • Or, can we maybe convince jdd to add this to / expose this from lodash?
  • Is that even worth it for (looks at actual code that fast.js adds) 14 lines of very optimized code?
  • Do we care that fast.js is <1.0.0?

Sorry that it's all questions instead of actual feedback so far. I'm really excited about the energy you're putting into these PRs for optimization @STRML.

I read through the diff and it all looks ok to me. Array.isArray is IE9+ so we aren't giving up any compatibility. Kind of wish we'd have caught on to that before, dang.

@STRML
Contributor Author

STRML commented Oct 17, 2015

There was actually already an Array.isArray() in unset, so I figured the incompatibility with IE8 and below had already been introduced.

Re: amp-object-forEach, I don't think it's worth reinventing the wheel. By importing the methods directly from Fast.js we save a lot when bundling and save a lengthy npm install (which can be quite annoying with all the lodash subdeps). I would argue we should do the same with the lodash modules as the dependency graph (at least in npm 2) is quite long and tangled.

I don't care about fast.js < 1.0.0. The version number doesn't mean anything substantial, just like how Node was pre-1.0 for so many years. Fast.js is stable and popular.

@HenrikJoreteg
Member

A few, somewhat disjointed thoughts:

Since npm 3 is released, I don't think we should worry too terribly much about npm 2. The other idea behind using a shared set of utils was to save code when using multiple ampersand modules together.

All that said, I think it may be worth looking at total weight of resulting code w/ dependencies as another metric. I've done this in other cases by browserify-ing and minifying the resulting package.

I'd be curious to see how these compared in that regard. I know when we switched to lodash we gained some weight :-/

Related... what does @AmpersandJS/core-team think about adding @STRML? I find his contributions hugely valuable and want to get out of his way :)

@STRML
Contributor Author

STRML commented Oct 19, 2015

@HenrikJoreteg I would very much like to forget about npm 2, but the simple truth is that npm 3 is still slow, even though they closed the issue. I and many like me still can't use it in production.

In any case, I agree, npm install bloat is far less of a big deal than simple bundle size. And webpack loaders can be used to replace lodash modules so it's not the end of the world.

We win overall with this (note that ampersand-events and its deps are included):

> webpack ampersand-state.js --output-file=foo.js --optimize-minimize

fastjs branch:
50kb minified
11kb gzipped

master:
60kb minified
12kb gzipped

@HenrikJoreteg
Member

@jdalton, any input on how lodash may help here? @STRML is looking for a fast/light object-only forEach and ampersand only supports IE9+.

@STRML
Contributor Author

STRML commented Oct 19, 2015

@HenrikJoreteg I'll check, we could possibly use lodash/object/forOwn to the same effect.

@wraithgar
Contributor

I'm all for adding @STRML to the ampersand team.

@jdalton

jdalton commented Oct 19, 2015

There is lodash.forown as well. I'm not a fan of the micro-benchmarky, V8-only focus, though; on that note, I'm confused by the desire to use Array#forEach, which doesn't perform as well as lodash.foreach and friends in many environments because of inlining and other engine hurdles.

I think doing something like !!Object.keys(this._changed).length rather than isEmpty(this._changed) is pretty gross.
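For plain objects the two emptiness checks behave the same; the disagreement above is about readability, not behavior. A quick sketch (variable names hypothetical):

```javascript
// Both expressions report whether a plain object has any own
// enumerable keys; this is what isEmpty(obj) answers for plain objects.
var changed = {};
var viaKeys = !!Object.keys(changed).length;      // false: no keys yet

changed.name = 'updated';
var viaKeysLater = !!Object.keys(changed).length; // true: one key now
```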

Lodash v4 (not out yet) is for modern browsers only, drops a lot of the Object.keys weight associated with it, and allows for much smaller builds. Here's an example of it being used with redux to keep their small 2kb build, reduce reliance on duplicated utilities and the test weight that comes with it.

@HenrikJoreteg
Member

Cool, thanks for chiming in @jdalton.

@jdalton

jdalton commented Oct 19, 2015

Btw, when lodash v4 releases, you all could update versions without having to do all this rewiring and get file-size wins. I expect to release lodash v4 at the same time as jQuery 3.0, either Jan 12 or 16: it's either the day Microsoft drops IE8 support or jQuery's 10th birthday.

@cdaringe
Member

cool stuff, @STRML. i'm definitely in support of the team add :)

@STRML
Contributor Author

STRML commented Oct 19, 2015

Thanks @jdalton - I agree that the benchmark needs work and I'm almost done with one that will take it through many more complicated use scenarios so we get a better picture of overall performance. My initial focus was on easy deopt wins & model creation speed because we create and destroy a very large volume of models in my day job.

I found that the major hurdle in using _.each in performance-sensitive code was its Object/Array forOwn/forEach switch. Switching off of it (as this PR has done) gave us a pretty healthy boost in model creation. In general I'd like to stay with natives when possible as performance is getting better and it reduces bundle size.

I agree that V8-only is not an ideal focus, but it is the simplest for now. Full browser-suite benchmarks are on my roadmap but are considerably more effort than a Node script with time/timeEnd.
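The Node-script approach mentioned above is just console.time/timeEnd wrapped around a hot loop; a rough sketch, with a hypothetical `createModel` standing in for ampersand-state instantiation:

```javascript
// Hypothetical stand-in for model creation; the real benchmark would
// instantiate ampersand-state models with derived properties instead.
function createModel(attrs) {
  createModel.count++;
  return { attributes: attrs, cid: 'c' + createModel.count };
}
createModel.count = 0;

// Time 100k creations; console.timeEnd prints the elapsed wall time.
console.time('create 100k models');
for (var i = 0; i < 100000; i++) {
  createModel({ id: i });
}
console.timeEnd('create 100k models');
```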

@jdalton

jdalton commented Oct 19, 2015

My initial focus was on easy deopt wins & model creation speed because we create and destroy a very large volume of models in my day job.

Do you have link to the benchmark you used or a results summary?
If it's a de-opt do you have the v8 bailout or de-opt reason?

as performance is getting better

Kinda sorta (some have optimized a few bits and pieces), but not really. These ES5 methods have been around for ~10yrs now and are largely unoptimized. And now there's ES6 methods with similar treatment, e.g. Object.assign in v8 is slower than lib alternatives. When I was the performance PM for Chakra, the JS engine in MS Edge, I pushed for improving built-in performance but as far as I know there's no effort of substance across engines, so there's still a long way to go.
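As an illustration of the kind of "lib alternative" being compared against the native built-in, here is a hypothetical `assignOwn` helper: a plain loop over own enumerable keys, which is roughly the shape lodash-style implementations take.

```javascript
// Hypothetical userland assign: copies own enumerable keys from
// source to target with a plain loop, the style of helper often
// benchmarked against the native Object.assign built-in.
function assignOwn(target, source) {
  var keys = Object.keys(source);
  for (var i = 0; i < keys.length; i++) {
    target[keys[i]] = source[keys[i]];
  }
  return target;
}

var merged = assignOwn({ a: 1 }, { b: 2 });  // { a: 1, b: 2 }
```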

I agree that V8-only is not an ideal focus, but it is the simplest for now. Full browser-suite benchmarks are on my roadmap but are considerably more effort than a Node script with time/timeEnd.

Be wary of micro-opts. It's easy to fall into the trap of modifying code paths because one is 4 million ops/sec vs. 2 million ops/sec when either are good enough. Try to go for the bigger wins, such as the lodash optimization to avoid linear searches in methods like _.uniq.

@STRML
Contributor Author

STRML commented Oct 20, 2015

Thanks @jdalton , I appreciate your input here.

The deopt is usually very similar to:

[deoptimizing (DEOPT eager): begin 0x27cdf67fcb9 <JS Function (SharedFunctionInfo 0x2e9d3dae5541)> (opt #79) @55, FP to SP delta: 232]
            ;;; deoptimize at 884: value mismatch
  reading input frame  => node=4, args=51, height=2; inputs:
      0: 0x27cdf67fcb9 ; (frame function) 0x27cdf67fcb9 <JS Function (SharedFunctionInfo 0x2e9d3dae5541)>
      1: 0x3d30b3c04131 ; [fp + 40] 0x3d30b3c04131 <undefined>
      2: 0x1301141bc721 ; rdi 0x1301141bc721 <JS Array[0]>
      3: 0x1301141bca91 ; r9 0x1301141bca91 <JS Function (SharedFunctionInfo 0x27cdf6c4851)>
      4: 0x3d30b3c04131 ; [fp + 16] 0x3d30b3c04131 <undefined>
      5: 0x27cdf684589 ; rax 0x27cdf684589 <FixedArray[6]>
      6: 0x27cdf685109 ; rbx 0x27cdf685109 <JS Function arrayEach (SharedFunctionInfo 0x2e9d3dadb0e1)>
  reading input frame arrayEach => node=3, args=39, height=3; inputs:
      0: 0x27cdf685109 ; (literal 31) 0x27cdf685109 <JS Function arrayEach (SharedFunctionInfo 0x2e9d3dadb0e1)>
      1: 0x3d30b3cc9701 ; (literal 16) 0x3d30b3cc9701 <JS Global Object>
      2: 0x1301141bc721 ; rdi 0x1301141bc721 <JS Array[0]>
      3: 0x1301141bca91 ; r9 0x1301141bca91 <JS Function (SharedFunctionInfo 0x27cdf6c4851)>
      4: 0x3d30b3c401c1 ; (literal 32) 0x3d30b3c401c1 <FixedArray[137]>
      5: 0xffffffff00000000 ; (literal 33) -1
      6: 0 ; r8
  translating frame  => node=51, height=8
    0x7fff5fbfedd8: [top + 64] <- 0x3d30b3c04131 ;  0x3d30b3c04131 <undefined>  (input #1)
    0x7fff5fbfedd0: [top + 56] <- 0x1301141bc721 ;  0x1301141bc721 <JS Array[0]>  (input #2)
    0x7fff5fbfedc8: [top + 48] <- 0x1301141bca91 ;  0x1301141bca91 <JS Function (SharedFunctionInfo 0x27cdf6c4851)>  (input #3)
    0x7fff5fbfedc0: [top + 40] <- 0x3d30b3c04131 ;  0x3d30b3c04131 <undefined>  (input #4)
    0x7fff5fbfedb8: [top + 32] <- 0x11ab99619774 ;  caller's pc
    0x7fff5fbfedb0: [top + 24] <- 0x7fff5fbfedf8 ;  caller's fp
    0x7fff5fbfeda8: [top + 16] <- 0x27cdf684589 ;  context    0x27cdf684589 <FixedArray[6]>  (input #5)
    0x7fff5fbfeda0: [top + 8] <- 0x27cdf67fcb9 ;  function    0x27cdf67fcb9 <JS Function (SharedFunctionInfo 0x2e9d3dae5541)>  (input #0)
    0x7fff5fbfed98: [top + 0] <- 0x27cdf685109 ;  0x27cdf685109 <JS Function arrayEach (SharedFunctionInfo 0x2e9d3dadb0e1)>  (input #6)
  translating frame arrayEach => node=39, height=16
    0x7fff5fbfed90: [top + 64] <- 0x3d30b3cc9701 ;  0x3d30b3cc9701 <JS Global Object>  (input #1)
    0x7fff5fbfed88: [top + 56] <- 0x1301141bc721 ;  0x1301141bc721 <JS Array[0]>  (input #2)
    0x7fff5fbfed80: [top + 48] <- 0x1301141bca91 ;  0x1301141bca91 <JS Function (SharedFunctionInfo 0x27cdf6c4851)>  (input #3)
    0x7fff5fbfed78: [top + 40] <- 0x11ab99712b44 ;  caller's pc
    0x7fff5fbfed70: [top + 32] <- 0x7fff5fbfedb0 ;  caller's fp
    0x7fff5fbfed68: [top + 24] <- 0x3d30b3c401c1 ;  context    0x3d30b3c401c1 <FixedArray[137]>  (input #4)
    0x7fff5fbfed60: [top + 16] <- 0x27cdf685109 ;  function    0x27cdf685109 <JS Function arrayEach (SharedFunctionInfo 0x2e9d3dadb0e1)>  (input #0)
    0x7fff5fbfed58: [top + 8] <- 0xffffffff00000000 ;  -1  (input #5)
    0x7fff5fbfed50: [top + 0] <- 0x00000000 ;  0  (input #6)
[deoptimizing (eager): end 0x27cdf67fcb9 <JS Function (SharedFunctionInfo 0x2e9d3dae5541)> @55 => node=39, pc=0x11ab99715142, state=NO_REGISTERS, alignment=no padding, took 0.680 ms]

The deopt is caused by the object/array switch in _.each(). If I only use _.each() on arrays, and _.forOwn() on objects, it doesn't happen. Additionally, if I only use _.each() on objects, and use native Array#forEach(), the deopt also doesn't occur.
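The split described above can be sketched without any library: give arrays and objects separate helpers so each call site only ever sees one shape and stays monomorphic (helper names are hypothetical):

```javascript
// Array-only iteration: this call site only ever receives arrays.
function eachArray(arr, fn) {
  for (var i = 0; i < arr.length; i++) fn(arr[i], i);
}

// Object-only iteration: this call site only ever receives plain objects.
function eachObject(obj, fn) {
  var keys = Object.keys(obj);
  for (var i = 0; i < keys.length; i++) fn(obj[keys[i]], keys[i]);
}

// Callers pick the right helper up front, so the engine never sees a
// mixed array/object type flowing through a single polymorphic `each`.
var out = [];
eachArray([10, 20], function (v) { out.push(v); });
eachObject({ a: 1 }, function (v, k) { out.push(k + ':' + v); });
// out is now [10, 20, 'a:1']
```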

I'm aware my method isn't perfect - trying to save time while I work on many many projects. I figured this was an easy win without any behavior changes.

@jdalton

jdalton commented Oct 20, 2015

The deopt is caused by the object/array switch in _.each(). If I only use _.each() on arrays, and _.forOwn() on objects, it doesn't happen. Additionally, if I only use _.each() on objects, and use native Array#forEach(), the deopt also doesn't occur.

Ah, so not due to lodash itself. From the Chakra side we'll bail-out when the jitted code hits the different type (so from array to object) but keep the previously jitted array path code around. If the mixed path becomes hot then the method will be re-jitted with more generic optimizations applied and eventually the array-jitted form will be gc'ed.

Do you have any perf numbers for these calls or some kind of context to weigh the wins?

@STRML
Contributor Author

STRML commented Oct 20, 2015

So, this is very far from comprehensive, but I run a simple creation / derived getter bench here. This is by far the most common use case in my application, and it directly affects startup speed (we have a lot of models).

I'm almost done adapting the test suite so we get a clearer picture of all operations.

In any case, with this branch, I see results between 67-77ms. On master I see 92-100ms. Before the last PR I wrote optimizing this (#198), about 140ms. Not a massive difference (certainly not like _.uniq), but a pretty easy win that doesn't break any tests.

The latest commit reverts back to lodash; using forOwn directly accomplishes the same goal, and we don't have to bring in such a large module.

@pgilad
Member

pgilad commented Oct 23, 2015

@AmpersandJS/core-team Are we good with merging this?

@wraithgar
Contributor

👍 I'm good with this.

@kamilogorek
Contributor

👍 from me as well

pgilad added a commit that referenced this pull request Oct 26, 2015
Optimization: Remove unneeded lodash functions and replace polymorphic _.each
@pgilad pgilad merged commit b7c60c7 into AmpersandJS:master Oct 26, 2015
@pgilad
Member

pgilad commented Oct 26, 2015

@STRML Thank you very much for the continued support!
