Memory leak when stream is not read to the end. #2313
Comments
Thanks for opening an issue @erik-induro. A few follow-up questions: when you say 'pop', are you seeing an OOM problem? How large are the files you are reading, and what is the typical number of lines you read? What are the resources of the underlying machine you are running this on?

Edit: Have you tried turning off validation when creating the read stream, to see if that changes what you are seeing? When using a ranged read like your second example, validation is automatically disabled.
@ddelgrosso1 When I say 'pop' I just mean that it got into a weird state: it wouldn't report telemetry anymore, was fairly unresponsive, and new instances would spin up that I had to manually terminate. We are running in App Engine. It is hard to describe, but it isn't particularly relevant, as the symptom could change depending on how people are using the library. The machine has 4 CPUs and ~26 GB of RAM, but it dies at ~5 GB. We use this flag […]. I have not tried turning off validation; I can give that a shot.
@ddelgrosso1 I tested turning off validation with the original approach and still saw the memory grow and not release.
@erik-induro thanks for the update. I will dig into this and try and figure out what is going on.
Dug into this a bit today and was not successful in recreating the behavior. First, I created a file containing 750,000 lines and uploaded it to my bucket. I then ran the script below, which reads a random number of lines on each iteration, and collected metrics utilizing […]. @erik-induro, is there anything else you might be able to provide to recreate the behavior you are seeing?

```javascript
import { Storage } from "@google-cloud/storage";
import readline from "readline";

const s = new Storage({ projectId: "my-project" });
const b = s.bucket("my-bucket");
const f = b.file("really-large-txt");

// Read the first `numLines` lines of `file`, then destroy the stream.
function readLines(file, numLines) {
  const readStream = file.createReadStream();
  const lineReader = readline.createInterface({ input: readStream });
  let count = 0;
  const lines = [];
  return new Promise((res, rej) => {
    lineReader.on("line", (line) => {
      count++;
      lines.push(line);
      if (count >= numLines) {
        lineReader.close();
      }
    });
    lineReader.on("error", (e) => {
      readStream.destroy();
      rej(e);
    });
    lineReader.on("close", () => {
      readStream.destroy();
      res(lines);
    });
  });
}

async function testMemory() {
  for (let i = 0; i < 500; i++) {
    console.log(`Iteration: ${i}`);
    const rand = Math.floor(Math.random() * 750001);
    await readLines(f, rand);
  }
  console.log("Done!");
}

await testMemory();
```
@ddelgrosso1 thanks for taking a look at it. Also, I copied your test and ran it locally. I am using a ~17k line file. I removed the […]. I think it has less to do with the number of lines read and more to do with just opening the stream and not reading it to completion. I wonder if your spikes and drops are from the lines being read into memory and that array being dumped, and that this is distracting from the slow growth that still appears in the baseline of the test. I ran the test for 2k iterations (which took a hot minute btw...). Here is the chart from that run:
@erik-induro thanks for the update. I will modify and try your suggestion tomorrow and let you know what turns up.
I believe I know what is going on but am still trying to definitively confirm with further tests. When […]
So my previous comment was partially wrong. In this situation, when the stream isn't fully consumed and is destroyed, it does indeed cause […]. I believe that this is the same issue raised in node-fetch/node-fetch#1762. In order to fix this I added code to the pipeline handler in […].

After adding the code to destroy the agent, I ran another 2000 iteration test. Now both memory and active handles look much more reasonable.
Environment details

- `@google-cloud/storage` version: 6.9.5

Steps to reproduce

The following script reads the first N lines of a file and returns them. I rewrote this same function with multiple 3rd-party libraries and other native approaches. The issue is not the `lineReader`. Even though I call `destroy()`, it does not sufficiently clean up the resources. The heap size continues to increase on each execution until the memory on the server pops. I profiled this using `node --inspect` and saw Typed Arrays increase on each run and never decrease.

My workaround was to specify a smaller portion of the file to pull in, read it to completion, and extract the lines I care about. When I did this and profiled, I saw that Typed Arrays stayed a constant size.

Expected

I expect `destroy()` to clean up resources for the stream when it is not read to the end, so that it does not cause a memory leak.