Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

async modularized #984

Closed
wants to merge 46 commits into from
Closed

async modularized #984

wants to merge 46 commits into from

Conversation

Kikobeats
Copy link
Contributor

Async modularized

What we are

If you check the size of the library (36KB) maybe you think that modularized all methods to works serapately is not necessary. This is because you are think that the only benefit is modualize something is get a less build size, but actually is more a question related with flexibility and have options!.

Async is very useful for traditional node backends. It's use callback pattern, but nowadays promises is in the roadmap of whatever node programmer as more simple pattern.

Should Async support promises and callback by default?

I'm sure that then the size of the library would be more higher than 36KB.

Again flexibility is the question!

Ideally, yes, make possible use both pattern could be nice, but in the real life I didnt find scenario that mixin both patterns.

On the other hand, is very popular packet your node bouild using browserify or something similar... and in this key the size is important. Yes, you think that async is not a browser library, but think about WebWorkers and you will unleash the beast.

How others libraries resolve the problem

If you check others libraries highly modularized (for example, lodash) you can see the value of follow this approach:

  • You can build your own build. In the case of async we could select between callback, promises or both!
  • You can select just the method that you need.
  • If you don't need a build, you can directly require the method lib.

How I try to resolve the problem

Internally async use a set of method (currently declared in the main file) that are used for async main function.

Basically more low levels method make more high level methods, so just we want to do is modularize from low to high level.

I created async-js organization and I forked async 1.5.0 code and I started modularizing with this approach:

  • Low level methods are grouped into async.util repository. This repository is more a entry point to know all the methods and necessary scripts to make easy create one repository for each method and that kind of things to avoid manual work. It's modularized into one method for repository following the naming pattern async.util.*.
  • High level methods are grouped into async.* repositories. Because we have low levels methods modularized, high level methods just uses the low level method that need in the logic of the method. Check and example.

The new scenario

If you check the new main file basically reference all async.* repositories and this repositories references the low level methods.

This is in the case that you need all async methods, but in other case you can require the methods that you need like:

var async = require('async');
var each = require('async.each');
var waterfall = require('async.waterfall');

// async.waterfall === waterfall
// async.each === each

The only difference is that async main file have the backward compatibility function names, as async.forEach === async.each.

If you need async.each in your frontend but no more methods, instead of 36KB you have a build of 4KB.

Stuff for the future

It exists a set of things to do to create a more solid ecosystem:

  • Need to discuss about semver. Now I use 0.x semver for experiment and testing. Should be equal to 1.5.0 or the current async version? should be independent and the main file group all the versions of each module?
  • Similar but with testing files. Good moment to move testing per each repository.
  • Ideally, move main async lib into the organization to have all under one roof. I think that exists a lot of people contributing of this project and I feel that need a independent space. Also add Authors and collaborators to package.json.
  • I started this project focused into browser build, but I wanted to create a first approach PR and later extend the point to browser build. So it's pending create the code necessary to create the browser builds.
  • We can release this under v2.0.0. I checked that you have other PR that improve the async code but break the current interface. Could be good integrate together with this modularized approach.
  • Centralized scripts to create the repositories scaffold into a CLI. This CLI could be useful to create a build using Promises style or both. Pify could be useful.

What do you think? :-)

Fix #720
Related #748

@jdalton
Copy link
Contributor

jdalton commented Dec 20, 2015

Hi!

I'm pinging in at the request of @Kikobeats.

First things first, wow this is an awesome amount of work you've done 👏!

I total dig modularization 👍, though I prefer the modularized approach of something like

var series = require('async/series');

because it simplifies things. You don't have to worry about managing issues of or syncing across all repos for simple scaffolding changes in linting, style, or editor configs.

I've had folks express concern with the load cost, in time, of synchronously loading 100+ modules for the monolithic build. The cost is then multiplied by the number of different versions of your package, for me lodash, in anothers dependency tree. Which is why I opt for a pre-bundled monolithic build for the default kitchen sink.

var async = require('async'); // loads 1 file

The modularized single package also favors bundlers like webpack, browserify, & rollup which are gaining in use. This is because they don't have to worry about duplication. With individual packages I've had developers push back on things being too modular. Devs tend to tolerate only a few package dependencies (the fewer the better). Which means for the individual packages a bit of inlining is required to keep the dependency count low. However, duplication leads to larger builds.

Also, when reporting the size of Async look to its gzipped minified size which is only 4.26 kB.
That's even smaller than the specialized lodash v4 core build which is 4.95 kB.

I would also dampen the idea that offering a build tool will solve the problem. In my experience folks don't want to build something. They want something that already works for their package manger of choice. Which means pre-built and packaged versions of things.

@mikermcneil
Copy link
Contributor

Got to agree with @jdalton on the approach. @kikobeats- tremendous work man! Lodash and async are the two most important libraries in the node ecosystem right now (IMO) and the idea of seeing them modular used in a similar fashion is really awesome.

@aearly
Copy link
Collaborator

aearly commented Dec 21, 2015

First off, this is an impressive amount of work!

Its a bit different that what we were planning, but then I also realize that the latest plan only exists in a chat between me and @megawac and in my head. We also talked about this a little in #914 .

There are many goals for async in regards to modularization.

  1. A monolithic CJS file for node/browserify usage (currently lib/async.js)
  2. A monolithic UMD file for use with the browser and by Bower (currently dist/async[.min].js)
  3. A way to use some lodash methods internally, while still supporting 2. (leverage lodash #914)
  4. A way to only include the async methods you actually use in a project ( e.g. require("async/waterfall"), or import waterfall from async)
  5. Break up our docs, using JSDoc and moving the docs closer to the source (Breakup readme #859)
  6. Support all of the above while keeping file-sizes low
  7. Support all of the above while keeping things easy to maintain

Since our earlier discussion in #720 / #748 , two things have changed: The ES6 spec and its modules were finalized, and @Rich-Harris has put a lot of work in to rollup, an ES6 bundler that supports tree-shaking (removing unused imports).

Here is the plan I've been envisioning:

We author async as a set of small ES6 modules, one per method. This is similar do what you've done, except it would use ES6 imports rather than require(). We would create src/collection/each.js, src/util/ensureAsync.js, etc.. Each file would have a single export (the method). We can also pull some of the common private utility functions into a single module, e.g. src/_common.js. Then you can import _withoutIndex from "../_common" and so forth.

To support #1, we would have a src/index.js that is very similar to the main file of lodash-es. We would use the CJS output of Rollup to create lib/index.js (analagous to lib/async today). We might also need some massaging so the main CJS export is an object (need to experiment with rollup).

To support #2, we take the CJS output, and browserify --standalone it to create a UMD distribution. (assuming we have a couple CJS npm dependencies). This creates dist/async.js for use with bower. We'd check this file in once per version to support Bower.

To support #3, we'd take advantage of lodash-es. We can import just the methods we need, or import * from lodash and rely on tree-shaking to omit the methods we don't use.

To support #4, we do two things. First, we individually Rollup each module associated with a public method, and place them in appropriate folders as CJS modules. For example, src/util/asyncify.js would become a CJS module as util/asyncify.js, alloting you to require("async/util/asyncify") from another project. This is similar to lodash's strategy. We would update these individual files once per version. We can also publish the ES6 src/* modules, so they can be used with other ES6 bundlers. We would rely on the new "esnext:main": "src/index.js" convention in the package.json so people could import eachSeries from async and have their bundler do tree-shaking to omit the whole kitchen sink.

For #5, we can break up the readme, either now or later. Each public module would have a large JSDoc block with the signature and prose-form docs, and we can work out the doc publishing strategy later.

Regarding #6, initially we were thinking of using browserify, but the ES6 module strategy will prove to be much more lightweight. Async uses its own methods a lot internally, and there are several helper functions that are used in many places. If each module has to re-import these as CJS, then there are a lot of var _restParams = require("../util/restParams") duplicated all over the place. Since ES6 exports are bindings, rather than references, all of these internal imports can be collapsed into a single declaration with Rollup in the final bundle. When bundling CJS with browserify, each module needs its own closure (function (module, exports, require) { ... }) in order to work, adding a lot of overhead, but with ES6/Rollup, every module can be combined into the same scope (with some name collision resolution when it occurs).

Regarding #7, im a fan of keeping everything in a single repository. I don't thin we need an org yet. Since async uses it's own methods internally, and we occasionally make large sweeping changes to how things are implemented, it's much easier to keep things together. I also think people would rather require("async/collection/filter") rather than require("async.filter"), just because its easier to maintain a single module's version in your package.json, rather than a whole ecosystem of modules. Also, versioning becomes less ambiguous. (Do i really need to update lodash.get@3.10.0 to lodash.get@3.10.1?) @jdalton can probably speak better to the problems and statistics with publishing a kitchen-sink module as well as a constellation of micro-modules. I also don't think we need a CLI with this overall strategy.

Overall, I think we are going to see the ES6 module + tree-shaking strategy being used more and more, and I'd like to see async support it. It's a nice compromise between small modules and discoverability/convenience.

@Rich-Harris
Copy link

Just to be upfront – right now, if you import * from lodash then Rollup doesn't do a great job of removing unused stuff from lodash, because there are a lot of statements in there that it's not yet smart enough to guarantee are unused. (@jdalton I owe you a response to rollup/rollup#45 – bear with me!) At the present time, import chunk from 'lodash/array/chunk' is still the better option.

First, we individually Rollup each module associated with a public method, and place them in appropriate folders as CJS modules

This would work, but you would end up in a situation where someone who required, say, asyncify and waterfall would end up with two copies of e.g. _restParams. It may be that you're better off just doing 1:1 ES6:CJS conversions for that use case, though depending on how tree-shakeable the library is (haven't had a look yet) that could be to the detriment of people doing import { asyncify, waterfall } from 'async' and relying on Webpack 2 or Rollup to remove everything else. (Then again it might be fine! Will have to investigate.)

To support #2, we take the CJS output, and browserify --standalone it to create a UMD distribution

Rollup does allow you to import from CommonJS modules, using https://github.com/rollup/rollup-plugin-npm and https://github.com/rollup/rollup-plugin-commonjs – this might allow you to generate a UMD build directly with Rollup and forgo the Browserify step.

@Kikobeats
Copy link
Contributor Author

Was a good idea create the PR instead of continuing coding behind the dark, now I see more problems associated that my mind avoid 👍.

I associate a package per each util/async method because I thought it would be easier to maintain: If N modules depend of the X package and X have a bug, releasing a new version X and using the last X version fix the problem.

But the point is that I'm fixing a problem creating another one: is it necessary have too much repositories for simple code? Create a request to get noop implementation is a little useless (in time terms and CPU por resolve the dependency reference)

At the beginning I was worried about the quantity of time since install async dependencies (divided in modules) to be ready to code. I testing and need around 15s and I was that is ok.

But now I see another better point: Instead of fragment the library into repositories, we can fragment the function into files. Instead of create a file and export the method, like:

'use strict';
module.exports = function noop () {};

Just define the method:

function noop () {};

Doing this per each function and having a map of functions references that each method need (for async.util and async) we can inject the function in the build, like copy/paste the function into the build file.

Just need to do is control the functions to inject to avoid duplicate code and use the same variables name to keep the naming reference.

I think that this is the approach following by lodash-cli because I see that each module have the same code that the full library (including documentation!)

Also I think that this new approach resolve the problem to load the library using ES6 modules interface

@megawac
Copy link
Collaborator

megawac commented Dec 21, 2015

I just want to prefice with great start and thanks for attempting to do this. As @aearly mentioned there has been quite a bit of planning on how this should be done (I'll elab when I'm off my phone). anyway, the most significant amount of work here is developing a build tool to do some grunt work for rollup to minimize compiled asyncjs code. A publish script for npm should also be configured which would enable users to do require(async/each). Managing custom npm repos would be a possibility as well. Overall, the build yool would be responsible for significantly less than the scope of lodashs cli and probably remain internal

I started doing some of this in October but have been very busy in school and work. If you want to continue the PR I'll message you on gitter about some specific changes. I will also have some time to assist/lead on this over the next week or 2. The most significant change off the bat is that all the code should live in this repo, npm modules can be made on build (possibly).

I'm a bit torn on how documentation should be done. Splitting out readme into modular folders is probably alright but I'd prefer jsdoc generated documentation.

Take a look at ramda/ramda repo, which has src file organization similar to what I was proposing.

megawac added a commit that referenced this pull request Dec 23, 2015
megawac added a commit that referenced this pull request Dec 29, 2015
@megawac
Copy link
Collaborator

megawac commented Dec 29, 2015

My WIP is occurring on this branch(https://github.com/caolan/async/tree/modularization) - an example of the bundles are here https://github.com/caolan/async/tree/modularization/build. Still need to develop scripts to publish the bundles to npm/git tags.

@aearly
Copy link
Collaborator

aearly commented Jan 2, 2016

Looks good @megawac . I'm really happy with how the main bundle looks. 😮

One thing I noticed -- the new build/async-bundle.js is about 7kb more minified than the old dist/async.js. Most of it seems to be coming from lodash methods -- looks like its a consequence of handling many more edge cases. (e.g. range calls toNumber, which is very robust -- but just +num might be good enough.)

I also noticed that the modularized methods use exports.default = foo meaning you'd have to var each = require("async/each").default. Is there a way to make it so a export default function each () {...} gets compiled to module.exports = function each() {...}?

@jdalton
Copy link
Contributor

jdalton commented Jan 2, 2016

@aearly

One thing I noticed -- the new build/async-bundle.js is about 7kb more minified than the old dist/async.js.

With webpack and its webpack.NormalModuleReplacementPlugin you can swap modules like toNumber out with identity or a custom mocked module if needed. This is how the Redux is able to stay ~2kb. For example it replaces:

plugins: [
    ['lodash/internal/baseIteratee.js', './toFunction.js'],
    ['lodash/object/keys.js',           '../internal/baseKeys.js'],
    ['lodash/object/keysIn.js',         '../internal/baseKeysIn.js']
  ].map(function(pair) {
    var resourceRegExp = RegExp('\\b' + _.escapeRegExp(pair[0]) + '$');
    return new webpack.NormalModuleReplacementPlugin(resourceRegExp, pair[1]);
  }).concat(
    new webpack.optimize.OccurenceOrderPlugin()
  )

In the case of Redux the big win was replacing baseIteratee with toFunction as that removed iteratee shorthand support which carries with it isEqual (super heavy).

The modularized bundle is 7.36 kB vs. its current 4.27 kB so that's a 3.09 kB diff to work on.

It looks like a big chunk is coming from lodash v4's toArray which supports emojis, surrogate pairs, region joiners, etc. Replacing lodash/lang/toArray in reduceRight with lodash/array/slice reduces the size to 6.64 kB gzipped so that's a 2.37 kB diff. Or use Array#slice instead of lodash/array/slice to get a size of 6.55 kB gzipped so that's 2.2 kB diff...

@megawac
Copy link
Collaborator

megawac commented Jan 3, 2016

Yeah, iteratee is already avoided being included the bundle as we import the base array methods and property rather than the core ones which automagically map the iteratee function. In the future it may be worth importing iteratee?

Some other things small things which can be avoided are baseRange over range, definitely dropping toArray, and importing baseProperty over property (to avoid deep support)

I'll open a WIP pull request to move discussion to the relevant issue.

@megawac megawac mentioned this pull request Jan 3, 2016
6 tasks
@Kikobeats
Copy link
Contributor Author

@megawac in order to have a unique point of view, should I close this PR and move the conversation into #996?

@megawac
Copy link
Collaborator

megawac commented Jan 10, 2016

Thanks @Kikobeats, this was a helpful stepping stone and quite useful for begining the other pr; we're going to close now in favour #996.

@megawac megawac closed this Jan 10, 2016
@aearly
Copy link
Collaborator

aearly commented Jan 10, 2016

Yeah, thanks for all you've done @Kikobeats ! If you want to help out more, you can submit PRs to the modularization branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

suggestion: internally split into separate module files to make optimised builds easier
6 participants