Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow passing generators to the std* options #693

Merged
merged 12 commits into from
Jan 16, 2024
Merged

Allow passing generators to the std* options #693

merged 12 commits into from
Jan 16, 2024

Conversation

ehmicky
Copy link
Collaborator

@ehmicky ehmicky commented Jan 16, 2024

Fixes #20 (second most upvoted issue). Also fixes #21 and fixes #121 (third most upvoted issue).

This allows transforming a child process' input and ouput by passing an async generator to the stdin/stdout/stderr/stdio option.

const transform = async function * (chunks) {
	for await (const chunk of chunks) {
		yield chunk.toUpperCase();
	}
};

const {stdout} = await execa('echo', ['hello'], {stdout: transform});
console.log(stdout); // HELLO

Using generators is much simpler for users than creating web or Node.js Transform streams (which are used under-the-hood). Unlike regular functions, it still enables init and flushing logic (before and after the for await loop, which can be particularly helpful (and necessary for issues like #121).

IMHO There are many use cases for this. I have been needed this feature myself in production several times. For example, I've been needed to use stdout: 'inherit' but filtering secret values. This is now easy by using the { stdout: [filterSecretTransform, 'inherit'] } option. I think the fact that those are among our most upvoted issues is telling.

This PR intentionally currently leaves out two points for the moment:

For a fuller description of the feature from a user standpoint, I have created the following documentation.

index.d.ts Show resolved Hide resolved
@sindresorhus
Copy link
Owner

This is super cool and very useful 🎉

index.d.ts Show resolved Hide resolved
index.d.ts Outdated Show resolved Hide resolved
// Therefore, there is no need to allow Node.js or web transform streams.
// The `highWaterMark` is kept as the default value, since this is what `childProcess.std*` uses.
// We ensure `objectMode` is `false` for better buffering.
// Chunks are currently processed serially. We could add a `concurrency` option to parallelize in the future.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer a /* */ comment as it makes it easier to edit it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in daf5b9c

lib/stdio/generator.js Show resolved Hide resolved
// - The readable side propagate the generator's error
// In order for the later to win that race, we need to wait one microtask.
const destroyPassThrough = callbackify(async error => {
await setTimeout(0);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this use queueMicrotask?

Copy link
Collaborator Author

@ehmicky ehmicky Jan 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Waiting for one microtask (queueMicrotask()) works.

I am waiting for one macrotask instead (setTimeout()) to be on the safe side, in case the Node.js core logic changes in the future. The race condition this is solving relies on multiple core Node.js functions' implementation (I had to dig in to understand what was going on :) ), specifically:

The first two rely on async/await, so if the core Node.js source code added an additional await in that logic for unrelated reasons, it is possible that it would impact that race condition.

To clarify the underlying issue: when the user-supplied generator throws, two errors are propagated in a race condition:

  • The Duplex writable part (the PassThrough) is aborted, which throws an AbortError that is very generic and unhelpful for users
  • The Duplex readable part (wrapped by Readable.from()) propagated the original error

So we are slowing down the first part by adding some delay to the PassThrough's destroy() method. This delays destroying the generator's underlying PassThrough slightly longer, but that should not be an issue.

This PR includes a unit test for this specific case. However, I think we might want to be on the safe side and wait for one macrotask instead of a microtask? What do you think?

Side note: I have changed the code to use callbacks instead of async/await. I always avoid this in general, but in this specific case, the destroy() method is callback-based, and all we are doing is adding a setTimeout(), with no possible uncaught exceptions, so it does not seem necessary to introduce the indirection of util.callbackify().

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order for the later to win that race, we need to wait one microtask.

The code comment is incorrect then, since it says "microtask". I would expand a little bit on the comment (as succinctly as possible) there on why a macrosecond is used.

Copy link
Collaborator Author

@ehmicky ehmicky Jan 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 1f9205b 👍

The transform can `yield` either a `string` or an `Uint8Array`, regardless of the `chunks` argument's type.

## Filtering

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users must be careful hereful here. Their secret could potentially be split over multiple chunks. This example would work better if the chunk was a line.

Copy link
Collaborator Author

@ehmicky ehmicky Jan 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, absolutely, great catch!
I did think of it as well. I have an upcoming PR which will provide with line-wise splitting for the generators chunks. This will solve that problem.
In general, mapping text (which is most likely the main use case) is quite unhelpful without proper line-splitting. But I'm about to fix this :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the following issue for it: #695

docs/transform.md Show resolved Hide resolved
ehmicky and others added 4 commits January 16, 2024 17:49
Co-authored-by: Sindre Sorhus <sindresorhus@gmail.com>
Co-authored-by: Sindre Sorhus <sindresorhus@gmail.com>
import {setTimeout} from 'node:timers/promises';

/*
* Generators can be used to transform/filter standard streams.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without the * prefixes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 295f63a

@ehmicky
Copy link
Collaborator Author

ehmicky commented Jan 16, 2024

Ready for another review! 👍
That PR was lots of fun, and taught me a few things :)

@sindresorhus
Copy link
Owner

#693 (comment)

@ehmicky
Copy link
Collaborator Author

ehmicky commented Jan 16, 2024

Fixed 👍

@sindresorhus sindresorhus merged commit e8f6bbd into main Jan 16, 2024
14 checks passed
@sindresorhus sindresorhus deleted the transforms branch January 16, 2024 20:03
@ehmicky ehmicky mentioned this pull request Jan 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants