Memory usage issue with stream-transform #361
Did you resolve this? I'm seeing similar issues.

I haven't had time to look into the issue yet.
I have a minimal working example of this bug on my fork. It looks like the transform is breaking Node's built-in backpressure logic by ignoring the return value from `this.push`. To see this in action, check out the branch linked above.
I reproduced your code sample. With the latest source code, the memory usage stays between 20 MB and 30 MB on a 30 GB generated CSV file. Perhaps a change since your report fixed the issue. Can you confirm?
Still not working on my repro:

```js
import { transform } from 'stream-transform';
import { pipeline, Readable, Writable } from 'stream';

class DummyData extends Readable {
  constructor() {
    super();
    this.numReads = 0;
  }

  _read() {
    // Push incrementing values forever
    this.push(JSON.stringify({ string: 'read_' + this.numReads }));
    this.numReads++;
  }
}

class Stopper extends Writable {
  constructor() {
    super({
      objectMode: true,
      highWaterMark: 1, // Allow just one item in the buffer; apply backpressure otherwise
    });
  }

  // Accept chunks extremely slowly; discard the chunk data
  _write(chunk, encoding, callback) {
    console.log('wrote one out');
    setTimeout(callback, 1000);
  }
}

pipeline(
  new DummyData(),
  transform(data => data), // Comment out this line and the test runs forever; leave it in and it runs out of memory quickly.
  new Stopper(),
  () => {},
);
```
@dmurvihill Could you have a look at the latest release of stream-transform, version 3.2.10? It now takes the return value from `this.push` into account.
Hey, thanks for being so attentive to this issue! It looks like that does respect backpressure and pause the stream, but how does it get unpaused?
I couldn't reproduce the pausing behavior. I will need to dig into it more. Any chance you could reproduce the pausing in my sample?
Describe the bug

When using stream-transform to process large datasets with the `parallel` option set to a value greater than `1`, we're seeing high memory usage.

To Reproduce

Additional context

`this.push` was returning `false` to indicate that the stream should pause reading, yet stream-transform asks for more input over here regardless.