findOne with nested refpath in large array of subdocuments causes out of memory exception #10983

Closed
@jessgoldq4

Description

Do you want to request a feature or report a bug?

Bug

What is the current behavior?

A findOne() invocation paired with a dynamic populate() call causes an out of memory exception when the document has an array of subdocuments that contain a nested refPath, 'arrayName.typeField'. This happens in particular when the subdocument array contains a large number of records (e.g. 2100). I don't fully understand the logic, so I will just list my findings.

------

During the Model.populate('arrayName.idField') -> getModelsMapForPopulate() -> _getModelNames() method invocation, the returned modelNames array contains 2100 elements, each a string naming the referred Model. Down the line, this causes the addModelNamesToMap() method to create a map with a very large memory footprint, where an individual model in the map has a total of 2.1 million records in its allIds property, which just seems wrong:

[screenshot not preserved]
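The growth described above can be modeled with plain arrays. This is a minimal sketch under a stated assumption, not Mongoose's source: it assumes that one copy of the id list is stored for every entry in modelNames, so the total number of stored ids is the number of entries multiplied by the number of ids per copy. The names `simulateAllIds` and `countIds` are hypothetical.

```javascript
// Hypothetical model of the reported growth; this is NOT Mongoose's code.
// Assumption: one copy of the full id list is kept per modelNames entry.
function simulateAllIds(modelNames, ids) {
  const allIds = [];
  for (let i = 0; i < modelNames.length; ++i) {
    // Every entry names the same model, yet each one adds its own copy.
    allIds.push(ids.slice());
  }
  return allIds;
}

// Total number of ids held across all copies.
function countIds(allIds) {
  return allIds.reduce((sum, arr) => sum + arr.length, 0);
}

const n = 100;
const ids = Array.from({ length: n }, (_, i) => 'id' + i);
const modelNames = new Array(n).fill('Model1');

const allIds = simulateAllIds(modelNames, ids);
console.log(allIds.length);    // 100
console.log(countIds(allIds)); // 10000
```

Under this assumption the footprint grows with the product of the two lengths, which is why an array of a few thousand subdocuments could plausibly reach millions of stored ids.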

Each of the arrays in allIds is a copy of the others, which raises the question of why we need them all. In this scenario, we ultimately see an out of memory exception during an invocation of Model.populate('arrayName.idField') -> _done() -> _assign() -> utils.clone(mod.allIds):

[screenshot not preserved]

This does not seem right. If _getModelNames() instead returns a modelNames array with just 3 elements, 'Model1', 'Model2', and 'Model3', populate('arrayName.idField') seems to produce the correct results without this huge amount of memory usage. Maybe I don't really know what I'm talking about, but this seems like overkill.
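The reduction suggested above amounts to collapsing repeated model names before any per-model bookkeeping happens. A minimal sketch, assuming the names are plain strings; `dedupeModelNames` is a hypothetical helper and not a Mongoose function:

```javascript
// Hypothetical helper, not part of Mongoose: collapse repeated model names
// while keeping the order in which each name was first seen.
function dedupeModelNames(modelNames) {
  return [...new Set(modelNames)];
}

// One model name per subdocument, cycling through the three enum values.
const types = ['Model1', 'Model2', 'Model3'];
const modelNames = Array.from({ length: 2100 }, (_, i) => types[i % 3]);

const unique = dedupeModelNames(modelNames);
console.log(modelNames.length); // 2100
console.log(unique);            // [ 'Model1', 'Model2', 'Model3' ]
```

Whether the populate internals can safely work from the deduplicated list is exactly the open question in this report; the sketch only shows how small that list is compared with the per-element one.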


If the current behavior is a bug, please provide the steps to reproduce.

Pre-requisite: 1000+ elements exist in the document's 'items' array

const SampleSchema = new mongoose.Schema(
  {
    name: String,
    items: [{
      itemId: {
        type: mongoose.Schema.Types.ObjectId,
        required: true,
        refPath: 'items.type'
      },
      type: {
        type: String,
        required: true,
        enum: ['Model1', 'Model2', 'Model3']
      }
    }],
  },
  {
    timestamps: {
      createdAt: 'create_date',
      updatedAt: 'update_date'
    }
  }
)

const SampleModel = mongoose.model('Sample', SampleSchema)

async.waterfall([
  (cb) => {
    SampleModel
      .findOne({
        _id: mongoose.Types.ObjectId(id),
        _organization: mongoose.Types.ObjectId(params.organizationId)
      })
      // populate the refPath field declared in the schema above
      .populate('items.itemId')
      .lean()
      .exec(cb)
  },
  (sample, cb) => {
    // process data here
  }
], completedCallback)

What is the expected behavior?

The above code, executed against a document with the prerequisite number of subdocuments, should complete without exhausting the heap.

What are the versions of Node.js, Mongoose and MongoDB you are using? Note that "latest" is not a version.

Node.js: 12.22.6
Mongoose: 6.0.13
MongoDB: 5.0.2

Activity

vkarpov15 added this to the 6.0.16 milestone on Nov 18, 2021

vkarpov15 (Collaborator) commented on Dec 19, 2021

I can confirm that the script below is much slower and takes much more memory than expected. We're investigating but haven't figured anything out yet.

'use strict';
  
const mongoose = require('mongoose');

const { Schema } = mongoose;

run().catch(err => console.log(err));

async function run() {
  await mongoose.connect('mongodb://localhost:27017/test', {
    useNewUrlParser: true,
    useUnifiedTopology: true
  });
  
  await mongoose.connection.dropDatabase();
  
  const SampleSchema = new mongoose.Schema({
    name: String,
    items: [{ 
      itemId: {
        type: mongoose.Schema.Types.ObjectId,
        required: true,
        refPath: 'items.type'
      },
      type: { 
        type: String,
        required: true,
        enum: ['Model1', 'Model2', 'Model3']
      }
    }],
  });
  
  const SampleModel = mongoose.model('Sample', SampleSchema);
  const Model1 = mongoose.model('Model1', Schema({ name: String }));
  
  const doc = { name: 'test', items: [] };
  for (let i = 0; i < 2100; ++i) {
    const { _id } = await Model1.create({ name: 'test' + i });
    doc.items.push({ itemId: _id, type: 'Model1' });
  }

  await SampleModel.create(doc);
  console.log('Created');

  await SampleModel.findOne().populate('items.itemId');
  console.log('Memory Usage:', process.memoryUsage().heapUsed / (1024 ** 2));
}
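
Until the cause is found, one way to sidestep populate() for this shape of document is to resolve the references by hand, with one query per distinct type. This is a workaround sketch and not the upstream fix. It assumes `doc` is a plain object (for example the result of `.lean()`), and `getModel` is a hypothetical callback that returns a model for a name, such as `(name) => mongoose.model(name)`. The helper names are illustrative.

```javascript
// Workaround sketch, not the upstream fix.
// Groups the referenced ids by the value of each subdocument's `type`.
function groupIdsByType(items) {
  const groups = new Map();
  for (const { itemId, type } of items) {
    if (!groups.has(type)) groups.set(type, []);
    groups.get(type).push(itemId);
  }
  return groups;
}

// `getModel` is a hypothetical callback: (name) => object with a find() method.
async function populateItemsManually(doc, getModel) {
  const byKey = new Map();
  for (const [type, ids] of groupIdsByType(doc.items)) {
    // One query per distinct type instead of one entry per array element.
    const found = await getModel(type).find({ _id: { $in: ids } });
    for (const d of found) byKey.set(type + ':' + String(d._id), d);
  }
  // Rebuild the items array with the fetched documents in place of the ids.
  return doc.items.map((item) => ({
    type: item.type,
    itemId: byKey.get(item.type + ':' + String(item.itemId)) || null
  }));
}
```

With the schema above, a call could look like `populateItemsManually(await SampleModel.findOne().lean(), (name) => mongoose.model(name))`. References that no longer resolve come back as `null`.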