
Proper way to handle the return value of push() in the _transform() implementation of a transform stream #1791

Closed
kimamula opened this issue Feb 23, 2019 · 9 comments


@kimamula

  • Node.js Version: any
  • OS: any
  • Scope (install, code, runtime, meta, other?): code
  • Module (and version) (if relevant): stream

The documentation for transform._transform() says that "The transform.push() method may be called zero or more times".

The backpressure document says that you must respect the return value of .push() and should stop calling it when it returns false.

However, it is not clear what I should do if .push() returns false when I have more data to push in the transform._transform() implementation.

@gireeshpunathil
Member

I guess that, given transformers are always intermediary streams (connected to neither the source nor the sink), backpressure management does not come under their purview; instead it is a consideration for the connected streams at either end of the transformer.

cc @nodejs/streams to get an opinion.

@MHebes

MHebes commented Dec 15, 2020

> However, it is not clear what I should do if .push() returns false when I have more data to push in the transform._transform() implementation.

You should wait for a "drain" event before pushing the remainder of your data. Prevent more data from arriving by not calling the callback passed to _transform until you have finished pushing.

See this comment for a good example.

@mcollina
Member

> However, it is not clear what I should do if .push() returns false when I have more data to push in the transform._transform() implementation.

I recommend just calling .push() for all the data that is produced within ._transform().

@brucedjones

@mcollina this is directly contradictory to the comment from @MHebes, could you elaborate?

@mcollina
Member

> @mcollina this is directly contradictory to the comment from @MHebes, could you elaborate?

A transform stream sits between two other streams; essentially you are moving data: readable -> transform -> writable. Transform implements buffering on both sides. You have data coming in from the readable inside _transform(); the alternative to using .push() is to buffer manually. But why implement one more level of buffering? Just use the one provided by Node core. The return value of .push() is for direct Readable implementors, not Transform users.

I don't know how much there is to elaborate here. FYI, I'm one of the maintainers of streams themselves.

@sfriesel

> however why implement one more level of buffering?

@mcollina Because a transform does not just move data; it can also inflate it. A decompressor can increase the amount of data by orders of magnitude, so by having the writing side control the flow, backpressure control goes out the window. See for example this bug: regular/unbzip2-stream#17. The transformer buffers ~1MB of input to be sure to get at least one full compressed block. But when the stream is ending, that last megabyte may decompress to hundreds of MB of output, all of which has to be pushed.

@sfriesel

Or, as another example: decompression-bomb detection. The internal zlib transformer gained the ability to limit output size (nodejs/node@27253, nodejs/node#33516). If backpressure on the Readable side were properly bounded, the consumer of any type of Transform stream could protect itself against decompression bombs. In zlib the maximum compression ratio is ~1:1000; it is far higher in more effective compression formats.

@mcollina
Member

@sfriesel I stand by my recommendation in the generic case. Unless you have some very specific problem you want to solve, stay away from adding another level of buffering because you are likely to implement it wrongly. In the few cases this is desperately needed, there is .unshift() to push back data upstream and avoid implementing yet another level of buffering.

Given your expertise on the subject, would you like to work on a few PRs to help with this?

@dobesv

dobesv commented Dec 6, 2023

Another case where this seems to be a problem: adaltas/node-csv#408

It doesn't seem like there's actually any way to implement the back pressure yourself without reimplementing Transform completely.

I wonder whether Transform could somehow be augmented to call _transform again after push() has returned false. It would also have to work after _flush somehow.

7 participants