Description
Do you want to request a feature or report a bug?
Bug
What is the current behavior?
A findOne() call paired with a dynamic populate() against a document whose array of subdocuments contains a nested refPath ('arrayName.typeField') causes an out-of-memory exception. This happens in particular when the subdocument array contains a large number of records (e.g. 2100). I don't fully understand the logic, so I will just list my findings.
------
During the Model.populate('arrayName.idField') -> getModelsMapForPopulate() -> _getModelNames() invocation, the returned modelNames array contains 2100 elements, each a string naming the referenced model. Down the line, this causes addModelNamesToMap() to create a map with a very large memory footprint, where an individual model in the map has a total of 2.1 million records in its allIds property, which just seems wrong.
The arrays in allIds are all copies of each other, which raises the question of why we need them all. In this scenario, we ultimately see an out-of-memory exception during Model.populate('arrayName.idField') -> _done() -> _assign -> utils.clone(mod.allIds).
This does not seem right. If _getModelNames() returns a modelNames array with just 3 elements, 'Model1', 'Model2', and 'Model3', populate('arrayName.idField') seems to produce the correct results without this huge amount of memory usage. Maybe I don't really know what I'm talking about, but this seems like overkill.
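The growth described above can be sketched with a small standalone script. This is a hypothetical illustration, not Mongoose's internal code: buildMap() and countStoredIds() are made-up names, and the assumption is simply that one full copy of the id list is stored per entry in modelNames. Under that assumption, a per-element modelNames array stores on the order of n * n ids, while a deduplicated one stores only 3 * n.

```javascript
// Hypothetical illustration only: this is NOT Mongoose's internal code.
// Assumption: one full copy of the ids array is stored per modelNames entry.
function buildMap (modelNames, ids) {
  const map = new Map()
  for (const name of modelNames) {
    if (!map.has(name)) {
      map.set(name, { allIds: [] })
    }
    // Each entry in modelNames adds another copy of the whole ids array
    map.get(name).allIds.push(ids.slice())
  }
  return map
}

// Count every id held across all models in the map
function countStoredIds (map) {
  let total = 0
  for (const entry of map.values()) {
    for (const arr of entry.allIds) total += arr.length
  }
  return total
}

const n = 300 // number of subdocuments, kept small for the demo
const types = ['Model1', 'Model2', 'Model3']
const ids = Array.from({ length: n }, (_, i) => i)

// One model name per array element, as observed in the report
const perElementNames = ids.map(i => types[i % types.length])
// Only the distinct model names
const uniqueNames = [...new Set(perElementNames)]

const dupTotal = countStoredIds(buildMap(perElementNames, ids))
const dedupTotal = countStoredIds(buildMap(uniqueNames, ids))
console.log({ dupTotal, dedupTotal })
```

Whether deduplicating is safe inside Mongoose itself depends on how the per-model id lists are consumed later, which this sketch does not model.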
If the current behavior is a bug, please provide the steps to reproduce.
Prerequisite: the document's 'items' array contains 1000+ elements.
const mongoose = require('mongoose')

const SampleSchema = new mongoose.Schema(
  {
    name: String,
    items: [{
      itemId: {
        type: mongoose.Schema.Types.ObjectId,
        required: true,
        // The referenced model is read from each element's sibling 'type' field
        refPath: 'items.type'
      },
      type: {
        type: String,
        required: true,
        // Each model name must be its own string
        enum: ['Model1', 'Model2', 'Model3']
      }
    }]
  },
  {
    timestamps: {
      createdAt: 'create_date',
      updatedAt: 'update_date'
    }
  }
)

const SampleModel = mongoose.model('Sample', SampleSchema)
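const async = require('async')

// id, params and completedCallback are defined elsewhere in the application
async.waterfall([
  (cb) => {
    SampleModel
      .findOne({
        _id: mongoose.Types.ObjectId(id),
        _organization: mongoose.Types.ObjectId(params.organizationId)
      })
      // Populate the refPath field declared in the schema above
      .populate('items.itemId')
      .lean()
      .exec(cb)
  },
  (sample, cb) => {
    // process data here
  }
], completedCallback)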
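To satisfy the prerequisite above, the 'items' array could be generated along these lines. This is a minimal sketch, not part of the original report: buildItems() is a hypothetical helper, and the ids are random 24-character hex strings, which Mongoose casts to ObjectId for a path declared with that type.

```javascript
const crypto = require('crypto')

// Model names taken from the schema's enum
const MODEL_NAMES = ['Model1', 'Model2', 'Model3']

// Hypothetical helper: builds `count` subdocuments shaped like the schema's items
function buildItems (count) {
  return Array.from({ length: count }, (_, i) => ({
    // 12 random bytes -> 24 hex characters, the string form of an ObjectId
    itemId: crypto.randomBytes(12).toString('hex'),
    // Cycle through the three model names
    type: MODEL_NAMES[i % MODEL_NAMES.length]
  }))
}

const items = buildItems(2100)
console.log(items.length, items[0])
```

The resulting array could then be used as the items field of a Sample document. The referenced Model1, Model2 and Model3 documents would also need to exist for populate() to return data, but the random ids here do not point at real documents.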
What is the expected behavior?
The above code, executed with the prerequisite number of subdocuments, does not blow up the heap.
What are the versions of Node.js, Mongoose and MongoDB you are using? Note that "latest" is not a version.
Node.js: 12.22.6
Mongoose: 6.0.13
MongoDB: 5.0.2
Activity
vkarpov15 commented on Dec 19, 2021
I can confirm that the below script is much slower and takes much more memory than expected. We're investigating but haven't figured anything out yet.
fix(populate): handle array of ids with parent refPath