
Provide setup/bootstrap API for workers #136

Closed
pieh opened this issue Jun 16, 2021 · 5 comments

Comments

@pieh

pieh commented Jun 16, 2021

As far as I can tell, there is currently no built-in way/API to execute something on all workers created so far.

I have a use case where tasks that are distributed to workers require some setup ahead of time. Doing this setup on the first actual task is not ideal because setup takes some time, so it would delay results. If I could trigger a setup function at arbitrary times that would execute on all existing workers, I could execute actual tasks right away. There is an additional delay between when I could bootstrap workers and when I can distribute actual tasks, and that delay could be put to good use: instead of being idle, workers could already start preparing for tasks.

Some pseudo-code in main thread to illustrate it:

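const Piscina = require('piscina')
const workerPool = new Piscina(args)

// we are ready to "warm up" workers
// (`runAll` is the proposed API: run a named export once on every existing worker)
workerPool.runAll(setupArgs1, { name: `setupStep1` })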

// main thread continues with other work needed to prepare actual tasks
// this might/will take some time

// main process progressed and now workers can execute another part of setup
workerPool.runAll(setupArgs2, { name: `setupStep2` }) 

// more work on main thread while workers are bootstrapping

// finally we have tasks ready and we can distribute them to workers
// (they might already be bootstrapped or still bootstrapping;
// in the latter case, bootstrap needs to finish on a worker before it can execute a task)
for (const task of tasks) {
  workerPool.run(task, { name: `runActualTask`})
}

Without a dedicated API for this, I think it's possible using BroadcastChannel or a MessagePort per worker (not sure yet how; I just noticed this mentioned in #104). I could message all workers to do some work, and actual task handlers would need to make sure that custom work has finished before they start executing. But this seems a bit messy, or at least at odds with the otherwise nice API surface that piscina already exposes.
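One way to handle the "task handlers must wait for custom setup" part on the worker side is a small registry of named setup steps, each tracked as a promise. The sketch below is hypothetical: `SetupGate`, `runStep`, and `waitFor` are made-up names, not piscina APIs, and the demo runs without real workers.

```javascript
// SetupGate is a hypothetical helper, not part of piscina: each named setup
// step is tracked as a promise that task handlers can await.
class SetupGate {
  constructor() {
    this.steps = new Map(); // step name -> { promise, resolve, reject }
  }

  // Get (or lazily create) the entry for a step, so a task can start
  // waiting on a step whose setup message has not arrived yet.
  entry(name) {
    if (!this.steps.has(name)) {
      let resolve;
      let reject;
      const promise = new Promise((res, rej) => {
        resolve = res;
        reject = rej;
      });
      this.steps.set(name, { promise, resolve, reject });
    }
    return this.steps.get(name);
  }

  // Run a setup step (e.g. when a setup message arrives) and settle its
  // promise with the result or the error.
  async runStep(name, setupFn, args) {
    const e = this.entry(name);
    try {
      e.resolve(await setupFn(args));
    } catch (err) {
      e.reject(err);
    }
    return e.promise;
  }

  // Resolves with the results of the listed steps once all have finished.
  waitFor(...names) {
    return Promise.all(names.map((name) => this.entry(name).promise));
  }
}

// Demo without real workers: a task that starts waiting before setup has
// run only proceeds once the setup step has finished.
async function demoGate() {
  const gate = new SetupGate();
  const log = [];
  const task = gate.waitFor('setupStep1').then(([value]) => {
    log.push(`task uses ${value}`);
  });
  await gate.runStep('setupStep1', async (n) => {
    log.push('setup done');
    return n * 2;
  }, 21);
  await task;
  return log;
}

demoGate().then((log) => console.log(log));
```

In a real worker, `runStep` would presumably be called from the `onmessage` handler of a `BroadcastChannel` (from `worker_threads`), and the `runActualTask` export would `await gate.waitFor('setupStep1', 'setupStep2')` before doing its work. Note that a `BroadcastChannel` only delivers to channels that are open when the message is posted, so a worker spawned later would miss earlier setup steps unless they are re-sent.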

What are your thoughts on supporting setups like that?

@jasnell
Collaborator

jasnell commented Jun 17, 2021

One approach for this would be to pass in workerData when the Piscina instance is created. The workerData is cloned to each worker when it is created and can be used while the worker is initializing. Combined with the worker's ability to initialize asynchronously that should give you what you need.

// main thread
const Piscina = require('piscina');
const workerPool = new Piscina({ filename: '...', workerData: setupArgs });
// worker
const { workerData } = require('worker_threads');

async function init() {
  // Do setup with workerData
  return (task) => { /** worker handler **/ };
}

module.exports = init();

@pieh
Author

pieh commented Jun 17, 2021

Hey @jasnell, thanks for the reply!

I think your suggested snippet would work quite nicely for a "single step setup" (and it would also handle workers created later, since they would be set up by default).

My main ask, however, was about a "multi step setup", so I could do parts of the setup work in workers as soon as possible (arguments for those steps are not constants; they are generated as the pipeline moves forward). With a single step I would have to hold off on any setup until I have all the needed pieces available. The main problem with this for me is that setup/bootstrap steps in general are not "instant" or even reasonably quick (and, annoyingly enough, the things setup produces would ideally just be shared, but they are not serializable... sigh).

A silly (approximate) illustration of how my pipeline would look with a single setup step:

  • blue border - setup that needs to execute on each worker before it can run actual tasks
  • green border - actual tasks

[Diagram: pipeline with single-step worker setup]

And here's how I imagine executing setup in multiple steps (at least for the minWorkers workers; if more workers are created later, they would of course need to execute all the setup steps serially when created, similar to how the single-step setup would work):
[Diagram: pipeline with multi-step worker setup]
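The multi-step idea, including replaying earlier steps on workers created later, can be modeled in a few lines. This is a sketch under loud assumptions: `SetupPool`, the `createWorker` factory, and a per-worker handle with `run(args, { name })` are hypothetical (piscina does not let you target a specific worker, which is what this issue asks for), and the demo uses fake in-process workers.

```javascript
// SetupPool is a hypothetical wrapper, not piscina's API. It assumes each
// worker handle exposes run(args, { name }) returning a promise.
class SetupPool {
  constructor(createWorker) {
    this.createWorker = createWorker; // hypothetical factory for worker handles
    this.workers = [];                // [{ handle, ready }]
    this.steps = [];                  // setup steps issued so far, in order
    this.next = 0;                    // round-robin cursor for tasks
  }

  // Send a setup step to every existing worker. Steps are chained per worker
  // (serial on one worker, parallel across workers) and recorded for replay.
  runAll(args, { name }) {
    this.steps.push({ name, args });
    for (const w of this.workers) {
      w.ready = w.ready.then(() => w.handle.run(args, { name }));
    }
    return Promise.all(this.workers.map((w) => w.ready));
  }

  // Create a worker and replay every setup step issued so far, serially.
  addWorker() {
    const handle = this.createWorker();
    let ready = Promise.resolve();
    for (const { name, args } of this.steps) {
      ready = ready.then(() => handle.run(args, { name }));
    }
    this.workers.push({ handle, ready });
  }

  // Dispatch an actual task round-robin; it waits for that worker's setup.
  run(task, { name }) {
    if (this.workers.length === 0) {
      return Promise.reject(new Error('no workers'));
    }
    const w = this.workers[this.next];
    this.next = (this.next + 1) % this.workers.length;
    return w.ready.then(() => w.handle.run(task, { name }));
  }
}

// Fake in-process "worker" that just logs what it runs.
function fakeWorker(id, log) {
  return {
    run: async (args, { name }) => {
      log.push(`${id}:${name}`);
      return args;
    },
  };
}

async function demoReplay() {
  const log = [];
  let count = 0;
  const pool = new SetupPool(() => fakeWorker(`w${count++}`, log));
  pool.addWorker();
  await pool.runAll('args1', { name: 'setupStep1' });
  await pool.runAll('args2', { name: 'setupStep2' });
  pool.addWorker(); // created late: replays setupStep1 and setupStep2 first
  await Promise.all([
    pool.run('task1', { name: 'runActualTask' }),
    pool.run('task2', { name: 'runActualTask' }),
  ]);
  return log;
}

demoReplay().then((log) => console.log(log));
```

Keeping one promise chain per worker is what makes setup serial on each worker while different workers bootstrap in parallel. One consequence of this design is that a failed setup step leaves that worker's chain rejected, so every later task sent to it fails as well.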

The timing of each step in those charts is completely arbitrary. We don't have absolute control over it because we allow user configuration and extensions, so in some scenarios it's quite possible that multi-step setup would bring no real benefit (however, we do try to optimize for those "heavy"/demanding projects right now).

@pieh
Author

pieh commented Jun 25, 2021

Closing this issue, as we decided that for our needs it will just be easier to roll our own solution.

@pieh pieh closed this as completed Jun 25, 2021
@rubiin

rubiin commented Jun 25, 2021

@pieh please also share your solution here, along with some code, so it will be easier for people who come across a similar situation.

@pieh
Author

pieh commented Jun 29, 2021

@rubiin Our solution just doesn't use piscina. By "roll our own solution" I meant writing an alternative utility that exposes the functionality we need, so I'm not sure code snippets on how to write a "worker pool/farm" would be helpful for other users. Checking the source code of piscina (or other packages that perform a similar function) would probably be a better suggestion for someone looking to write "their own". We also hadn't adopted piscina yet; we were checking available solutions to see if one would fit our needs.

In any case, gatsbyjs/gatsby#32120 is the piece of code currently being worked on, for anyone interested, though it is far from piscina: it has a somewhat different API, doesn't use worker_threads, and doesn't spawn workers "on demand".


3 participants