feat: worker pool that can execute tasks on all workers #32120

Merged
merged 13 commits into from Jun 29, 2021
2 changes: 1 addition & 1 deletion integration-tests/gatsby-cli/package.json
@@ -6,7 +6,7 @@
   },
   "license": "MIT",
   "scripts": {
-    "test": "jest"
+    "test": "jest -w 1"
   },
   "devDependencies": {
     "babel-jest": "^24.0.0",
3 changes: 3 additions & 0 deletions packages/gatsby-worker/.babelrc
@@ -0,0 +1,3 @@
{
  "presets": [["babel-preset-gatsby-package"]]
}
2 changes: 2 additions & 0 deletions packages/gatsby-worker/.gitignore
@@ -0,0 +1,2 @@
/node_modules
/dist
36 changes: 36 additions & 0 deletions packages/gatsby-worker/.npmignore
@@ -0,0 +1,36 @@
# Logs
logs
*.log

# Runtime data
pids
*.pid
*.seed

# Directory for instrumented libs generated by jscoverage/JSCover
lib-cov

# Coverage directory used by tools like istanbul
coverage

# Grunt intermediate storage (http://gruntjs.com/creating-plugins#storing-task-files)
.grunt

# node-waf configuration
.lock-wscript

# Compiled binary addons (http://nodejs.org/api/addons.html)
build/Release

# Dependency directory
# https://www.npmjs.org/doc/misc/npm-faq.html#should-i-check-my-node_modules-folder-into-git
node_modules
*.un~
yarn.lock
src
flow-typed
coverage
decls
examples
.babelrc
tsconfig.json
125 changes: 125 additions & 0 deletions packages/gatsby-worker/README.md
@@ -0,0 +1,125 @@
# gatsby-worker

Utility to execute tasks in forked processes. Heavily inspired by [`jest-worker`](https://www.npmjs.com/package/jest-worker).

## Example

File `worker.ts`:

```ts
export async function heavyTask(param: string): Promise<string> {
  // using workers is ideal for CPU intensive tasks
  return await heavyProcessing(param)
}

export async function setupStep(param: string): Promise<void> {
  await heavySetup(param)
}
```

File `parent.ts`:

```ts
import { WorkerPool } from "gatsby-worker"

const workerPool = new WorkerPool<typeof import("./worker")>(
  require.resolve(`./worker`),
  {
    numWorkers: 5,
    env: {
      CUSTOM_ENV_VAR_TO_SET_IN_WORKER: `foo`,
    },
  }
)

// queue a task on all workers
const arrayOfPromises = workerPool.all.setupStep(`bar`)

// queue a task on a single worker
const singlePromise = workerPool.single.heavyTask(`baz`)
```

## API

### Constructor

```ts
// TypeOfWorkerModule lets you type the exposed functions, ensuring type safety.
// It converts sync methods to async and disallows use of exports
// that are not functions. Recommended usage: `<typeof import("path_to_worker_module")>`.
const workerPool = new WorkerPool<TypeOfWorkerModule>(
  // Absolute path to the worker module. Recommended to use with `require.resolve`
  workerPath: string,
  // Optional settings
  options?: {
    // Number of workers to spawn. Defaults to `1` if not defined.
    numWorkers?: number
    // Additional env vars to set in the worker. The worker inherits the env vars of the parent process
    // and also gets a `GATSBY_WORKER_ID` env var (starting with "1" for the first worker)
    env?: Record<string, string>
  }
)
```
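
The comment above says the type parameter turns sync methods into async ones and drops exports that are not functions. A mapped type along the following lines could express that. `ExposedFunctions`, `exposeFunctions` and `fakeWorkerModule` are hypothetical names used for illustration only; this is a sketch, not the types `gatsby-worker` actually ships.

```ts
// Hypothetical sketch: NOT the actual gatsby-worker types.
// Keeps only function exports and makes each of them return a Promise.
type ExposedFunctions<M> = {
  [K in keyof M as M[K] extends (...args: any[]) => any ? K : never]:
    M[K] extends (...args: infer A) => infer R
      ? (...args: A) => Promise<R extends Promise<infer U> ? U : R>
      : never
}

// Runtime counterpart, so the mapping can be tried without spawning processes.
function exposeFunctions<M extends Record<string, unknown>>(
  mod: M
): ExposedFunctions<M> {
  const exposed: Record<string, unknown> = {}
  for (const name of Object.keys(mod)) {
    const value: unknown = mod[name]
    if (typeof value === `function`) {
      const fn = value as (...args: unknown[]) => unknown
      // `async` wraps sync return values in a Promise and passes Promises through.
      exposed[name] = async (...args: unknown[]) => fn(...args)
    }
    // Non-function exports are skipped on purpose.
  }
  return exposed as unknown as ExposedFunctions<M>
}

// Stand-in for a worker module (assumed shape, for demonstration only).
const fakeWorkerModule = {
  syncTask: (a: string): string => `foo ${a}`,
  asyncTask: async (a: string): Promise<string> => `foo ${a}`,
  notAFunction: `string`,
}

const exposed = exposeFunctions(fakeWorkerModule)
const fromSync: Promise<string> = exposed.syncTask(`bar`)
const fromAsync: Promise<string> = exposed.asyncTask(`bar`)
// exposed.notAFunction would be a type error: the property is not exposed.
```

Both `fromSync` and `fromAsync` are promises, regardless of whether the original function was sync or async.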

### `.single`

```ts
// Exports of the worker module become available under the `.single` property of a `WorkerPool` instance.
// Calling one either starts executing immediately if any worker is idle, or queues the task
// to be executed once a worker becomes idle.
const singlePromise = workerPool.single.heavyTask(`baz`)
```
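
To make the scheduling described above concrete, here is a tiny in-process model: a task starts right away if a worker slot is idle and otherwise waits in a FIFO queue. `MiniPool`, `run`, `release` and `sleep` are hypothetical names; the real pool dispatches to forked processes and its internals may well differ.

```ts
// Hypothetical in-process model of `.single` scheduling; no child processes involved.
class MiniPool {
  private idle: number[] = []
  private queue: Array<(workerId: number) => void> = []

  constructor(numWorkers: number) {
    // Worker ids start at 1, mirroring GATSBY_WORKER_ID.
    for (let id = 1; id <= numWorkers; id++) {
      this.idle.push(id)
    }
  }

  run<T>(task: (workerId: number) => Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      const start = (workerId: number): void => {
        task(workerId).then(
          value => {
            this.release(workerId)
            resolve(value)
          },
          error => {
            this.release(workerId)
            reject(error)
          }
        )
      }
      const workerId = this.idle.shift()
      if (workerId !== undefined) {
        start(workerId) // an idle worker exists: execute right away
      } else {
        this.queue.push(start) // otherwise wait until a worker becomes idle
      }
    })
  }

  private release(workerId: number): void {
    const next = this.queue.shift()
    if (next) {
      next(workerId) // hand the freed worker to the oldest queued task
    } else {
      this.idle.push(workerId)
    }
  }
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>(resolve => setTimeout(resolve, ms))

const pool = new MiniPool(2)
const started: number[] = []
const tasks = [1, 2, 3].map(taskId =>
  pool.run(async workerId => {
    started.push(taskId)
    await sleep(10)
    return { taskId, workerId }
  })
)
// Tasks 1 and 2 have started at this point; task 3 waits for a free worker.
Promise.all(tasks).then(results => console.log(results))
```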

### `.all`

```ts
// Exports of the worker module become available under the `.all` property of a `WorkerPool` instance.
// Calling one ensures the function is executed on all workers. This is best suited to
// setup/bootstrap of workers.
const arrayOfPromises = workerPool.all.setupStep(`baz`)
```
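
Because `.all` hands back one promise per worker, a plain `Promise.all` over that array rejects as soon as any worker fails, without telling you how the others fared. The sketch below collects a per-promise outcome instead. `Outcome`, `collectOutcomes` and `fakeAll` are hypothetical helpers; `fakeAll` only stands in for a call such as `workerPool.all.setupStep(...)` so the example runs without spawning processes.

```ts
type Outcome<T> =
  | { index: number; ok: true; value: T }
  | { index: number; ok: false; error: unknown }

// Resolves with one outcome per input promise and never rejects itself.
function collectOutcomes<T>(
  promises: Array<Promise<T>>
): Promise<Array<Outcome<T>>> {
  return Promise.all(
    promises.map((promise, index) =>
      promise.then(
        (value): Outcome<T> => ({ index, ok: true, value }),
        (error: unknown): Outcome<T> => ({ index, ok: false, error })
      )
    )
  )
}

// Stand-in for `workerPool.all.setupStep(...)` (assumption: one promise per worker).
function fakeAll(numWorkers: number, failOnWorker?: number): Array<Promise<string>> {
  const promises: Array<Promise<string>> = []
  for (let workerId = 1; workerId <= numWorkers; workerId++) {
    promises.push(
      workerId === failOnWorker
        ? Promise.reject(new Error(`setup failed (worker #${workerId})`))
        : Promise.resolve(`ready (worker #${workerId})`)
    )
  }
  return promises
}

collectOutcomes(fakeAll(3, 2)).then(outcomes => {
  const failed = outcomes.filter(outcome => !outcome.ok).map(outcome => outcome.index)
  console.log(`positions of failed setup calls:`, failed)
})
```

Whether you want this behaviour or the fail-fast behaviour of `Promise.all` depends on whether a partially set-up pool is usable for your tasks.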

### `.end`

```ts
// Used to shut down the `WorkerPool`. Promises for any in-progress or queued tasks
// will be rejected, as those tasks won't be able to complete.
const arrayOfPromises = workerPool.end()
```

### `isWorker`

```ts
// Tells you whether the current code is running in a worker. Useful for conditional handling depending on context.
import { isWorker } from "gatsby-worker"

if (isWorker) {
  // this is executed in worker context
} else {
  // this is NOT executed in worker context
}
```
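
Inside a worker, the `GATSBY_WORKER_ID` variable described under the constructor options tells you which worker is running the code. Below is a small helper for reading it. `readWorkerId` is a hypothetical name, not part of `gatsby-worker`, and it takes the environment as a parameter so it can be tried without spawning a process.

```ts
// Hypothetical helper: parses GATSBY_WORKER_ID ("1" for the first worker).
// Returns undefined when the variable is missing or is not a positive integer.
function readWorkerId(
  env: Record<string, string | undefined>
): number | undefined {
  const raw = env[`GATSBY_WORKER_ID`]
  if (raw === undefined || !/^[1-9]\d*$/.test(raw)) {
    return undefined
  }
  return Number(raw)
}

// In real code you would pass `process.env` instead of a literal object.
const idInWorker = readWorkerId({ GATSBY_WORKER_ID: `2` })
const idInParent = readWorkerId({})
console.log(idInWorker, idInParent)
```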

## Usage with unit tests

If you are working with source files that need transpilation, you will need to make it possible to load untranspiled modules in child processes.
This can be done with `@babel/register` (or similar depending on your build toolchain). Example setup:

```ts
const testWorkerPool = new WorkerPool<WorkerModuleType>(workerModule, {
  numWorkers,
  env: {
    NODE_OPTIONS: `--require ${require.resolve(`./ts-register`)}`,
  },
})
```

This executes an additional module before the worker module is loaded, which lets you add runtime support for new JavaScript syntax or for TypeScript. Example `ts-register.js`:

```js
// spawned process won't use jest config (or other testing framework equivalent) to support TS, so we need to add support ourselves
require(`@babel/register`)({
  extensions: [`.js`, `.ts`],
  configFile: require.resolve(relativePathToYourBabelConfig),
  ignore: [/node_modules/],
})
```
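
If the parent process already has `NODE_OPTIONS` set, passing a fresh value through `env` would presumably replace it for the worker (an assumption; the text above does not say how conflicts are resolved). A small helper like the one below keeps the existing flags and appends the `--require`. `withRequire` is a hypothetical name, not part of `gatsby-worker`.

```ts
// Hypothetical helper: appends `--require <modulePath>` to an existing NODE_OPTIONS value.
// Note: paths containing spaces would need extra quoting, which this sketch does not handle.
function withRequire(existing: string | undefined, modulePath: string): string {
  const requireFlag = `--require ${modulePath}`
  const trimmed = existing === undefined ? `` : existing.trim()
  return trimmed === `` ? requireFlag : `${trimmed} ${requireFlag}`
}

// In real code: withRequire(process.env.NODE_OPTIONS, require.resolve(`./ts-register`))
const merged = withRequire(`--max-old-space-size=4096`, `/repo/ts-register.js`)
const fresh = withRequire(undefined, `/repo/ts-register.js`)
console.log(merged)
console.log(fresh)
```

The result would then go into the `env.NODE_OPTIONS` field shown in the setup above.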
41 changes: 41 additions & 0 deletions packages/gatsby-worker/package.json
@@ -0,0 +1,41 @@
{
  "name": "gatsby-worker",
  "description": "Utility to create worker pools",
  "version": "0.0.0-next.0",
Contributor

I think we'll want to start this at 1.0.0?

Contributor Author

Early on I would prefer to keep it <1, as we might quickly introduce breaking changes once we start using the worker pool and discover new needs or APIs (that's what <1 is for). 1 is for when the API actually becomes stable enough.

Contributor

Curious how this will play with lerna minor bump - I think it should do 0.1.0-next.0 for the next branch cut? If so, this sounds good to me.

Contributor Author

https://github.com/gatsbyjs/gatsby/commits/master/packages/gatsby-plugin-graphql-config/package.json is an example of a 0.X.Y package we have, and it does bump the minor version for each release. We want gatsby-worker to be an actual dependency only of the gatsby core package. If we need it in gatsby-cli for structured logs, we will import it from gatsby/internal there (but that will only need messaging, not queueing up tasks).

  "author": "Michal Piechowiak<misiek.piechowiak@gmail.com>",
  "bugs": {
    "url": "https://github.com/gatsbyjs/gatsby/issues"
  },
  "dependencies": {
    "@babel/core": "^7.14.0"
  },
  "devDependencies": {
    "@babel/cli": "^7.14.0",
    "@babel/register": "^7.14.0",
    "babel-preset-gatsby-package": "^1.9.0-next.0",
    "cross-env": "^7.0.3",
    "rimraf": "^3.0.2",
    "typescript": "^4.1.5"
  },
  "homepage": "https://github.com/gatsbyjs/gatsby/tree/master/packages/gatsby-worker#readme",
  "keywords": [
    "gatsby",
    "worker"
  ],
  "license": "MIT",
  "main": "dist/index.js",
  "repository": {
    "type": "git",
    "url": "https://github.com/gatsbyjs/gatsby.git",
    "directory": "packages/gatsby-worker"
  },
  "scripts": {
    "build": "babel src --out-dir dist/ --ignore \"**/__tests__\" --extensions \".ts,.js\"",
    "prepare": "cross-env NODE_ENV=production npm run build && npm run typegen",
    "watch": "babel -w src --out-dir dist/ --ignore \"**/__tests__\" --extensions \".ts,.js\"",
    "typegen": "rimraf \"dist/**/*.d.ts\" && tsc --emitDeclarationOnly --declaration --declarationDir dist/"
  },
  "engines": {
    "node": ">=12.13.0"
  }
}
34 changes: 34 additions & 0 deletions packages/gatsby-worker/src/__tests__/fixtures/test-child.ts
@@ -0,0 +1,34 @@
export function sync(a: string, opts?: { addWorkerId?: boolean }): string {
  return `foo ${a}${opts?.addWorkerId ? ` (worker #${process.env.GATSBY_WORKER_ID})` : ``}`
}

export async function async(a: string, opts?: { addWorkerId?: boolean }): Promise<string> {
  return `foo ${a}${opts?.addWorkerId ? ` (worker #${process.env.GATSBY_WORKER_ID})` : ``}`
}

export function neverEnding(): Promise<string> {
  return new Promise<string>(() => {})
}

export const notAFunction = `string`

export function syncThrow(a: string, opts?: { addWorkerId?: boolean, throwOnWorker?: number }): string {
  if (!opts?.throwOnWorker || opts?.throwOnWorker?.toString() === process.env.GATSBY_WORKER_ID) {
    throw new Error(`sync throw${opts?.addWorkerId ? ` (worker #${process.env.GATSBY_WORKER_ID})` : ``}`)
  }

  return `foo ${a}${opts?.addWorkerId ? ` (worker #${process.env.GATSBY_WORKER_ID})` : ``}`
}

export async function asyncThrow(a: string, opts?: { addWorkerId?: boolean, throwOnWorker?: number }): Promise<string> {
  if (!opts?.throwOnWorker || opts?.throwOnWorker?.toString() === process.env.GATSBY_WORKER_ID) {
    throw new Error(`async throw${opts?.addWorkerId ? ` (worker #${process.env.GATSBY_WORKER_ID})` : ``}`)
  }

  return `foo ${a}${opts?.addWorkerId ? ` (worker #${process.env.GATSBY_WORKER_ID})` : ``}`
}

// used in task queue tests, as the functions above would too often finish too quickly
export async function async100ms(taskId: number, opts?: { addWorkerId?: boolean }): Promise<{taskId: number, workerId: string}> {
  return new Promise(resolve => setTimeout(resolve, 100, {taskId, workerId: opts?.addWorkerId ? process.env.GATSBY_WORKER_ID : undefined}))
}
6 changes: 6 additions & 0 deletions packages/gatsby-worker/src/__tests__/fixtures/ts-register.js
@@ -0,0 +1,6 @@
// spawned process won't use jest config to support TS, so we need to add support ourselves
require(`@babel/register`)({
  extensions: [`.js`, `.ts`],
  configFile: require.resolve(`../../../.babelrc`),
  ignore: [/node_modules/],
})