JavaScript heap out of memory on version 5.10.0 and above #9864

Closed
xeroxoid opened this issue Jan 26, 2021 · 15 comments
Labels
confirmed-bug We've confirmed this is a bug in Mongoose and will fix it.
Milestone

Comments

@xeroxoid

Do you want to request a feature or report a bug?

bug

What is the current behavior?

On operations that use a cursor, after some time the following error occurs:
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory

If the current behavior is a bug, please provide the steps to reproduce.

function queryCursor() {
  return Answer.find({})
    .sort({ _id: -1 })
    .populate({
      path: 'business',
      select: 'name pools tags',
      model: Business,
      populate: [{ path: 'tags', select: 'name', model: Tag }],
    })
    .populate({ path: 'views', select: 'role', model: User })
    .populate({
      path: 'createdBy',
      select: 'name username role tags pools',
      model: User,
      populate: [
        { path: 'tags', select: 'value type', model: Tag },
        { path: 'pools', select: 'name', model: Pool },
      ],
    })
    .populate({ path: 'topic', select: 'subject', model: Topic })
    .populate({ path: 'question', select: 'views type', model: Question })
    .cursor();
}

As seen in the snippet above, many document properties have to be populated (this is used to generate a specific report).
After some time the error noted above is generated and the process quits.

What is the expected behavior?

The process should complete.
The same code works fine on any version below 5.10.0, e.g. 5.9.29 (which we have reverted to).
I would guess it has to do with the "upgrade to MongoDB driver 3.6 for full MongoDB 4.4 support" in 5.10.0, but I have no proof of that.

What are the versions of Node.js, Mongoose and MongoDB you are using? Note that "latest" is not a version.

NodeJS: v14.15.4
Mongoose: 5.10.x (any version starting at 5.10.0)
MongoDB: v4.2.9

@vkarpov15
Collaborator

It'll be difficult for us to try to repro this without a data dump. But one way you can work around this is to reduce the cursor's batchSize():

return Answer.find({}).batchSize(10).sort(...).populate(...).cursor();

That will reduce the memory usage.
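
For context, batchSize() controls how many documents the driver pulls from the server per round trip, so fewer documents sit in memory at once. A minimal sketch of the workaround, assuming the cursor is consumed with eachAsync() (model names taken from the snippet above):

// (inside an async function)
const cursor = Answer.find({})
  .sort({ _id: -1 })
  .populate('business') // plus the other populate() calls from the original query
  .batchSize(10)        // fetch 10 documents per round trip
  .cursor();

// eachAsync() waits for each callback to resolve before pulling the next document
await cursor.eachAsync(async doc => {
  // process doc
});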

@vkarpov15 vkarpov15 added the help This issue can likely be resolved in GitHub issues. No bug fixes, features, or docs necessary label Jan 28, 2021
@xeroxoid
Author

xeroxoid commented Feb 3, 2021

Thanks, but the issue is not the code itself; it is that the exact same codebase works (i.e. does not throw an out-of-memory error) on versions up to 5.9.29 but not on 5.10.x.

I assume that something might have changed in how embedded documents are handled (i.e. how they are populated), or that it is an issue with the updated driver ("feat: upgrade to MongoDB driver 3.6 for full MongoDB 4.4 support") in 5.10.0. I do not see anything else that might be relevant in the changelog between those two versions, but I might be wrong.

Unfortunately the only help I can provide is pinpointing the issue to the changes between these exact two versions.

@vkarpov15
Collaborator

Can you confirm whether reducing the batchSize removes the error? That would help us narrow down this issue.

One possibility is that this is caused by #8380

@vkarpov15 vkarpov15 added this to the 5.11.16 milestone Feb 8, 2021
@vkarpov15 vkarpov15 added needs repro script Maybe a bug, but no repro script. The issue reporter should create a script that demos the issue and removed help This issue can likely be resolved in GitHub issues. No bug fixes, features, or docs necessary labels Feb 8, 2021
@rsnr

rsnr commented Feb 9, 2021

I encountered a potentially similar issue:
The following code works fine with mongoose 5.4.11.
Once mongoose is updated to 5.11.15, the app crashes with heap out of memory:
"FATAL ERROR: MarkCompactCollector: young object promotion failed Allocation failed - JavaScript heap out of memory"
Adding .batchSize(10) as suggested by @vkarpov15 doesn't help: I still get heap out of memory.

let position=0;
let roundSize = 1000;
let readCounter = 0;
MyModel.countDocuments({}, function(err, totalRecords) {
     async.whilst(
        function() { return position < totalRecords; }, // async v2.x format
        function iter(callback) {
            MyModel.find({}, 'field1 field2 field3').sort({timestamp: -1})
            .skip(position).limit(roundSize).lean().exec(function(err, allFound) {
                if (err) { console.log(err) }
                else {
                    /// do something
                }
                callback(null, position);
            });
            position += roundSize;
        },
        function (err, n) {
            if (err) {}
            // ...
        }
    );
});

@xeroxoid
Author

xeroxoid commented Feb 10, 2021

@vkarpov15 I tried with the latest mongoose version (5.11.15) using .batchSize(10) as suggested, but I still get the same error after a while.


[48906:0x1046d0000]   165322 ms: Mark-sweep (reduce) 4094.0 (4104.0) -> 4093.1 (4104.2) MB, 2550.2 / 0.0 ms  (average mu = 0.061, current mu = 0.001) allocation failure scavenge might not succeed


<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory

At the moment we have reverted to 5.9.29 in production, where this and other similar report scripts work flawlessly.

@vkarpov15 vkarpov15 modified the milestones: 5.11.16, 5.11.17, 5.11.18 Feb 12, 2021
@vkarpov15
Collaborator

@xeroxoid how are you iterating through the cursor? async iterator, eachAsync(), something else?
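
For reference, the two usual ways to consume a Mongoose query cursor look roughly like this (sketch only, using the Answer model from your snippet):

// 1) async iterator
for await (const doc of Answer.find({}).cursor()) {
  // process doc
}

// 2) eachAsync()
await Answer.find({}).cursor().eachAsync(async doc => {
  // process doc
});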

@rsnr can you please provide more info on what /// do something looks like? It is possible that you're running out of memory because you're not waiting for /// do something to finish before calling callback()

@vkarpov15 vkarpov15 added needs clarification This issue doesn't have enough information to be actionable. Close after 14 days of inactivity and removed needs repro script Maybe a bug, but no repro script. The issue reporter should create a script that demos the issue labels Feb 22, 2021
@vkarpov15 vkarpov15 removed this from the 5.11.18 milestone Feb 22, 2021
@xeroxoid
Author

xeroxoid commented Mar 2, 2021

@vkarpov15 we .pipe() the stream and transform each row with further data.

i.e.

/**
 * Report runner - processes big datasets row by row (via streams)
 * @param {Report} report the report to process.
 * @param {function} toOutput callback to pass processed data as it becomes ready
 * @param {function} progressTracker (optional) callback that gets invoked every 100 processed rows
 * @return {promise} Resolved when process is complete unless rejected by some error throughout the pipeline.
 */
async function streamPipelineRunner(report, toOutput, progressTracker = () => {}) {
  let sendHeaderRow = true;

  return new Promise((resolve, reject) => {
    const queryResultsStream = report.queryCursor(); // <--- this is the cursor returned by mongoose
    queryResultsStream.on('error', reject);

    const transformStream = applyTransformStream(report.rowTransform);
    transformStream.on('error', reject);

    queryResultsStream.pipe(pacer).pipe(transformStream); // <--- pacer is a setTimeout function to delay each batch and reduce CPU usage

    let rowIndex = 0;
    transformStream.on('data', row => {
      if (sendHeaderRow) {
        toOutput(Object.keys(row));
        sendHeaderRow = false;
      }
      toOutput(Object.values(row));
      if (rowIndex % triggerProgressCallbackEveryXLines === 0) {
        progressTracker(rowIndex);
      }
      rowIndex += 1;
    });

    transformStream.on('finish', () => {
      progressTracker(rowIndex);
      resolve();
    });
  });
}

....

Please note that transformStream can itself contain further mongoose queries or aggregations to enrich the data of each row returned.
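
For illustration only, a simplified sketch of what helpers like pacer and applyTransformStream could look like (not the exact implementations; the real rowTransform functions can run further mongoose queries per row):

const { Transform } = require('stream');

// Simplified pacer: pass each document through after a short delay to reduce CPU pressure
const pacer = new Transform({
  objectMode: true,
  transform(doc, _enc, done) {
    setTimeout(() => done(null, doc), 10);
  },
});

// Simplified applyTransformStream: wrap the report's rowTransform in an object-mode stream
function applyTransformStream(rowTransform) {
  return new Transform({
    objectMode: true,
    transform(doc, _enc, done) {
      Promise.resolve(rowTransform(doc))
        .then(row => done(null, row))
        .catch(done);
    },
  });
}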

@rsnr

rsnr commented Mar 4, 2021

@xeroxoid how are you iterating through the cursor? async iterator, eachAsync(), something else?

@rsnr can you please provide more info on what /// do something looks like? It is possible that you're running out of memory because you're not waiting for /// do something to finish before calling callback()

Hi @vkarpov15, the "// do something" section iterates over the mongoose query results and adds some of them to global objects.
These objects get quite big. However, the exact same code has worked and still works perfectly fine with mongoose 5.4.11, and only crashes with heap out of memory when I update mongoose to 5.11.15.
Thanks for looking into it.

UPDATE:
I commented out "// do something" and tested with the latest mongoose version (5.11.18): same heap out of memory.
I then started backtracking mongoose versions (with "// do something" still commented out) and discovered that everything works fine on mongoose 5.7.14 and fails with heap out of memory as of mongoose 5.8.0.

@vkarpov15 vkarpov15 added needs repro script Maybe a bug, but no repro script. The issue reporter should create a script that demos the issue and removed needs clarification This issue doesn't have enough information to be actionable. Close after 14 days of inactivity labels Mar 8, 2021
@vkarpov15 vkarpov15 modified the milestones: 5.11.20, 5.12.1 Mar 8, 2021
@vkarpov15 vkarpov15 modified the milestones: 5.12.1, 5.12.2, 5.12.3 Mar 18, 2021
@o-b-one

o-b-one commented Mar 31, 2021

Issue identified on version 5.11.19; the query is as follows:

const cursor = await MyModel.find({name}).populate('user').cursor({batchSize: 500});
for await(const data of cursor){console.log(data);}

Also reproduced while using eachAsync.

Edit:

We needed to revert to version 5.7.14 to avoid this issue.

@vkarpov15 vkarpov15 removed this from the 5.12.3 milestone Mar 31, 2021
@vkarpov15 vkarpov15 added this to the 5.12.4 milestone Mar 31, 2021
@rsnr

rsnr commented Apr 12, 2021

@o-b-one: that aligns with my issue: works fine on mongoose 5.7.14 and fails with heap out of memory as of mongoose 5.8.0 onward.

@vkarpov15 vkarpov15 modified the milestones: 5.12.4, 5.12.5 Apr 15, 2021
@vkarpov15
Collaborator

I did some digging and found that this is an issue with the fix for #8092.

The call mod.options._childDocs.push(val) leads to Mongoose being unable to clean up populated documents in a cursor.

Below is the script I used to repro this:

'use strict';

const mongoose = require('mongoose');

mongoose.set('useFindAndModify', false);

const { Schema } = mongoose;

run().catch(err => console.log(err));

async function run() {
  await mongoose.connect('mongodb://localhost:27017/test', {
    useNewUrlParser: true,
    useUnifiedTopology: true
  });

  const Answer = mongoose.model('Answer', Schema({
    choice: String,
    question: { type: 'ObjectId', ref: 'Question' }
  }));

  const Question = mongoose.model('Question', Schema({
    text: String
  }));

  const numQs = await Question.countDocuments();
  if (numQs < 10000) {
    await Question.deleteMany();
    await Answer.deleteMany();
    for (let i = 0; i < 100000; ++i) {
      const q = await Question.create({ text: 'my question' });
      const answers = [];
      for (let j = 0; j < 10; ++j) {
        answers.push({ choice: 'B', question: q });
      }
      await Answer.create(answers);
      if (i % 100 === 0) console.log(i);
    }
  }

  let i = 0;
  await Answer.find().populate('question').cursor().eachAsync((doc) => {
    if (i++ % 100 === 0) {
      console.log('Iterated', i);
    }
  });

  console.log('Done');
}

Sorry for the trouble; the fix will be in v5.12.5 👍

@vkarpov15 vkarpov15 added confirmed-bug We've confirmed this is a bug in Mongoose and will fix it. and removed needs repro script Maybe a bug, but no repro script. The issue reporter should create a script that demos the issue labels Apr 17, 2021
@vkarpov15 vkarpov15 reopened this Apr 28, 2021
@vkarpov15 vkarpov15 modified the milestones: 5.12.5, 5.12.7 Apr 28, 2021
@vkarpov15
Collaborator

vkarpov15 commented Apr 28, 2021

@rsnr

let count = 0;
MyModel.find({}, 'field1 field2 field3').sort({timestamp: -1}).lean().cursor().eachAsync(doc => { ++count; ... }).then(() => {
  console.log('Done');
});

@vkarpov15
Collaborator

vkarpov15 commented Apr 28, 2021

@rsnr I managed to confirm there's an issue where the below script succeeds on Mongoose 5.7.14, but crashes with OOM after about 31k documents on Mongoose 5.8.0.

'use strict';

const async = require('async');
const mongoose = require('mongoose');

mongoose.set('useFindAndModify', false);

const { Schema } = mongoose;

run().catch(err => console.log(err));

async function run() {
  await mongoose.connect('mongodb://localhost:27017/test', {
    useNewUrlParser: true,
    useUnifiedTopology: true,
  });

  const Answer = mongoose.model('Answer', Schema({
    choice: String,
    question: { type: 'ObjectId', ref: 'Question', index: true }
  }, { timestamps: true }));
  
  const Question = mongoose.model('Question', Schema({
    text: String
  }));
  
  const numQs = await Question.countDocuments();
  if (numQs < 10000) {
    await Question.deleteMany();
    await Answer.deleteMany();
    for (let i = 0; i < 100000; ++i) {
      const q = await Question.create({ text: 'my question' });
      const answers = []; 
      for (let j = 0; j < 10; ++j) {
        answers.push({ choice: 'B'.repeat(1000), question: q });
      }
      await Answer.create(answers);
      if (i % 100 === 0) console.log(i);
    }
  }

  const numDocs = await Answer.countDocuments();
  let position = 0;
  let roundSize = 1000;
  async.whilst(
    function() { return position < numDocs; }, // async v2.x format
    function iter(callback) {
      Answer.find({}, 'choice question').sort({ createdAt: -1 })
        .skip(position).limit(roundSize).lean().exec(function(err, allFound) {
          if (err) { console.log(err); }
          else {
            /// do something
          }
          console.log('Iterated', position);
          callback(null, position);
        });
      position += roundSize;
    },
    function(err, n) {
      console.log('Done', err, n);
    }
  );
}

This issue seems to only affect callbacks. You can work around this memory leak by using promises instead, as shown below.

  async.whilst(
    function() { return position < numDocs; }, // async v2.x format
    function iter(callback) {
      Answer.find({}, 'choice question').sort({ createdAt: -1 })
        .skip(position).limit(roundSize).lean().then(function(allFound) {
          // (error handling now happens in .catch() below)
          /// do something
          console.log('Iterated', position);
          callback(null, position);
        }).catch(err => console.log(err));
      position += roundSize;
    },
    function(err, n) {
      console.log('Done', err, n);
    }
  );

It looks like it is caused by the changes in #8259; will keep digging and see what I can find 👍

@vkarpov15
Copy link
Collaborator

It has something to do with clearing timeouts. Wrapping callback(null, position) in process.nextTick() also gets rid of the issue, so it must be that clean stack traces are somehow preventing previous iterations from being cleaned up unless you clear the stack before every call. The issue is that, when iter calls callback(), it kicks off the next iter synchronously, which seems to cause problems. Will dig into this some more.
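
For reference, the process.nextTick() workaround would look roughly like this in the iter function from the script above (sketch only):

function iter(callback) {
  Answer.find({}, 'choice question').sort({ createdAt: -1 })
    .skip(position).limit(roundSize).lean().exec(function(err, allFound) {
      if (err) { console.log(err); }
      // ... do something with allFound ...
      // Clear the stack before kicking off the next iteration so the previous
      // iteration's results can be garbage collected
      process.nextTick(() => callback(null, position));
    });
  position += roundSize;
}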

@rsnr

rsnr commented Apr 29, 2021

Thanks so much @vkarpov15 for researching this issue and suggesting these two alternative code changes.
I confirm that both alternatives you suggested (options "a" and "b" below) prevent the out-of-memory state when I use mongoose v5.12.5 on a low-memory Heroku configuration:
a) cursor().eachAsync()
b) keeping async.whilst but replacing Model.find().exec(function(err, result) {...}) with Model.find().then(function(result) {...}).catch(err => ...)
In my specific case the cursor option was about 15x slower than async.whilst with then(), so I'll stick with the latter.

@Automattic Automattic locked as resolved and limited conversation to collaborators May 4, 2021