Memory spike when using find() #10400

Open
almeidx opened this issue Jun 28, 2021 · 12 comments

Comments

almeidx commented Jun 28, 2021

Do you want to request a feature or report a bug?
Report a bug

What is the current behavior?
When I make a Model#find call with an $or operator, the operation not only takes ~5 minutes, it also uses ~1 GB of memory.

If the current behavior is a bug, please provide the steps to reproduce.
My use-case:

  const result = await Guild.find({
    $or: [
      { premium: { $eq: true } },
      { prefix: { $ne: 'p!' } },
      { blacklistedChannels: { $exists: true, $ne: [], $type: 'array' } },
    ],
  });

Reproduction repo: https://github.com/almeidx/mongoose-memory-leak-repro

When I run node index.js on this repository:
image

Not sure if relevant but, here are my collection details:

  • documents: 8360
  • size: 38.71MB

What is the expected behavior?
I expected it to not take ~5 minutes to process, but I can understand why it could take that long to process.
What I really didn't expect is the operation using ~1 GB of memory.

What are the versions of Node.js, Mongoose and MongoDB you are using? Note that "latest" is not a version.
14.17.1, 5.12.15 and 3.6.8, respectively
(I'm not sure if by 'MongoDB' you meant the mongodb npm module or the version of the database. I am using MongoDB Atlas version 4.4.6, mongodb npm module version 3.6.8)

almeidx (Author) commented Jun 28, 2021

UPDATE:

I tried passing a projection (the list of fields to return) as the second argument to find, and these were the results (which I'm totally okay with):

  const result = await Guild.find(
    {
      $or: [
        { premium: { $eq: true } },
        { prefix: { $ne: 'p!' } },
        { blacklistedChannels: { $exists: true, $ne: [], $type: 'array' } },
      ],
    },
    ['_id', 'premium', 'prefix', 'blacklistedChannels']
  );

image
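
The second argument in the call above is a projection, which Mongoose accepts as an array of field names, a space-separated string, or an object. As a minimal sketch, assuming a hypothetical helper named `toProjection` (not part of Mongoose), the array form corresponds to this inclusion object:

```javascript
'use strict';

// Hypothetical helper: turns a list of field names into the equivalent
// inclusion-projection object, e.g. ['a', 'b'] -> { a: 1, b: 1 }.
function toProjection(fields) {
  return Object.fromEntries(fields.map((name) => [name, 1]));
}

const projection = toProjection(['_id', 'premium', 'prefix', 'blacklistedChannels']);
console.log(projection);
```

Passing `projection` as the second argument to `Guild.find()` should behave like the array form: only the listed fields are returned, so large fields such as `counts` are not loaded.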

vkarpov15 added this to the 5.13.2 milestone Jul 2, 2021
vkarpov15 added the "has repro script" label and removed the "performance" label Jul 2, 2021

vkarpov15 (Collaborator) commented

Hard to tell without taking a closer look at your data. Based on the fact that using a projection helps, my best guess is that one of the other fields is either unexpectedly massive or there's some issue with how Mongoose is handling one of the data types in your schema.

Are you able to determine which field you need to add in order to trigger the spike in memory usage? Also, does using lean() make a difference?

I've been able to trigger similar memory spikes, but I get the same memory spike using lean(), so it doesn't look like it's due to an issue with Mongoose schema types. Below is the latest iteration of the script I've been using to try to create data to repro this.

'use strict';

const mongoose = require('mongoose');
const Guild = require('./database/guild');

run().catch(err => console.log(err));

async function run() {
  await mongoose.connect('mongodb://localhost:27017/test', {
    useCreateIndex: true,
    useFindAndModify: false,
    useNewUrlParser: true,
    useUnifiedTopology: true,
  });

  await mongoose.connection.dropDatabase();

  for (let i = 0; i < 8360; ++i) {
    const xpRoles = {};
    for (let j = 0; j < 100; ++j) {
      xpRoles[`test${i}_${j}`] = [`test${i}`, `test${j}`];
    }

    await Guild.create({
      _id: `guild${i}`,
      autoPublishChannels: ['test'],
      autoResetLevels: i,
      autoRole: ['test'],
      autoRoleTimeout: i,
      blacklistedChannels: ['foo', 'bar'],
      counts: [],
      emojiList: true,
      emojiListChannel: 'test',
      leftAt: i,
      levels: true,
      mentionCooldown: 17 * i,
      mentionCooldownRoles: ['test'],
      mentionedRoles: [{ _id: `test${i}`, date: new Date() }],
      milestones: [{ count: 1, date: new Date(), member: 'test' }],
      milestonesChannel: 'test',
      milestonesInterval: 42,
      milestonesMessage: 'test',
      milestonesRoles: ['test'],
      noXpRoles: ['test'],
      prefix: 'foo',
      premium: true,
      prioritiseMultiplierRoleHierarchy: true,
      stackXpRoles: true,
      storeCounts: true,
      storeMilestones: true,
      topXp: 'test',
      topXpRole: 'test',
      xpBlacklistedChannels: ['test'],
      xpMessage: 'test',
      xpMultipliers: [{ multiplier: 2, targets: ['test'], type: 'test' }],
      xpResponseType: 'test',
      xpRoles,
      xpWhitelistedChannels: ['test']
    });
    console.log(i);
  }
}
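
To answer the question of which field triggers the spike, one option is to run the same query once per excluded field and compare heap growth. The following is a rough sketch, not a precise profiler: `heapDeltaMb` and `compareExclusions` are hypothetical helper names, and `heapUsed` deltas are noisy unless Node is started with `--expose-gc`.

```javascript
'use strict';

// Runs `task` and reports how much heapUsed grew while its result is held.
async function heapDeltaMb(task) {
  if (global.gc) global.gc(); // only defined when run with `node --expose-gc`
  const before = process.memoryUsage().heapUsed;
  const result = await task();
  const after = process.memoryUsage().heapUsed;
  // Read `result` after sampling so it stays referenced until this point.
  const count = Array.isArray(result) ? result.length : 0;
  return { count, deltaMb: (after - before) / 1024 ** 2 };
}

// Calls runQuery('-field') for each field and records the heap growth.
async function compareExclusions(runQuery, fields) {
  const report = {};
  for (const field of fields) {
    report[field] = await heapDeltaMb(() => runQuery(`-${field}`));
  }
  return report;
}
```

With the `Guild` model from the script above, this could be called as `compareExclusions((sel) => Guild.find().select(sel).lean(), ['counts', 'milestones', 'xpRoles'])`, and again without `.lean()` to see whether hydration makes a difference.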

vkarpov15 removed this from the 5.13.2 milestone Jul 3, 2021
vkarpov15 added the "can't reproduce" label and removed the "has repro script" label Jul 3, 2021
vkarpov15 changed the title from "[bug] Memory leak when using $or query operator" to "Memory spike when using find()" Jul 3, 2021

almeidx (Author) commented Jul 3, 2021

@vkarpov15 I won't be able to test anything for the time being. But here are some stats that might help to answer your question:

  • the counts array could have anywhere from 0 to 200 items per document
  • the milestones array could have anywhere from 0 to 50 items per document

I'm guessing those are the only fields that could be causing trouble. I don't know whether they would be considered exceedingly large.

If they are problematic, would you suggest I use a separate collection for each mapped by the guild id instead, each count/milestone being its own document?

vkarpov15 (Collaborator) commented

@almeidx a separate collection could help if large arrays or maps are causing this issue. I'll take another shot at repro-ing this later this week.
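
At the data level, the split into separate collections could look roughly like the sketch below. It is only an illustration under stated assumptions: `splitGuild` and the `guildId` field are hypothetical names, and the function works on plain objects (for example, results of a `lean()` query), not on hydrated Mongoose documents.

```javascript
'use strict';

// Splits one guild object into a slimmer guild plus one plain object per
// count and per milestone, each tagged with the id of its guild.
function splitGuild(guild) {
  const { counts = [], milestones = [], ...rest } = guild;
  return {
    guild: rest,
    counts: counts.map((c) => ({ guildId: guild._id, ...c })),
    milestones: milestones.map((m) => ({ guildId: guild._id, ...m })),
  };
}

const example = splitGuild({
  _id: 'guild1',
  prefix: 'foo',
  counts: [{ count: 17, date: new Date(0) }],
  milestones: [],
});
console.log(example.guild, example.counts.length, example.milestones.length);
```

Each returned `counts` and `milestones` entry would then be stored as its own document, so loading a guild no longer pulls in up to 200 counts and 50 milestones.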

vkarpov15 added this to the 5.13.3 milestone Jul 6, 2021
vkarpov15 added the "needs repro script" label and removed the "can't reproduce" label Jul 6, 2021
vkarpov15 modified the milestones: 5.13.3, 5.13.4 Jul 16, 2021
vkarpov15 modified the milestones: 5.13.4, 5.13.5, 5.13.6 Jul 28, 2021
vkarpov15 modified the milestones: 5.13.6, 5.13.7, 5.13.8 Aug 9, 2021
vkarpov15 modified the milestones: 5.13.8, 5.13.9 Aug 23, 2021
vkarpov15 modified the milestones: 5.13.9, 6.0.6 Sep 2, 2021

vkarpov15 (Collaborator) commented

We took a closer look using data generated via the below script and it looks like counts is causing a disproportionately large spike in memory usage. We're investigating why.

'use strict';
  
const mongoose = require('mongoose');
const Guild = require('./database/guild');

run().catch(err => console.log(err));

async function run() {
  await mongoose.connect('mongodb://localhost:27017/test', {
    useCreateIndex: true,
    useFindAndModify: false,
    useNewUrlParser: true,
    useUnifiedTopology: true,
  });

  await mongoose.connection.dropDatabase();

  for (let i = 0; i < 8360; ++i) {
    const xpRoles = {};
    for (let j = 0; j < 100; ++j) {
      xpRoles[`test${i}_${j}`] = [`test${i}`, `test${j}`];
    }

    const counts = [];
    for (let j = 0; j < 200; ++j) {
      counts.push([{ date: new Date(), count: j * 17 }]);
    }

    const milestones = [];
    for (let j = 0; j < 50; ++j) {
      milestones.push([{ count: j * 17, date: new Date(), member: '0'.repeat(100) }]);
    }

    await Guild.create({
      _id: `guild${i}`,
      autoPublishChannels: ['test'],
      autoResetLevels: i,
      autoRole: ['test'],
      autoRoleTimeout: i,
      blacklistedChannels: ['foo', 'bar'],
      counts,
      emojiList: true,
      emojiListChannel: 'test',
      leftAt: i,
      levels: true,
      mentionCooldown: 17 * i,
      mentionCooldownRoles: ['test'],
      mentionedRoles: [{ _id: `test${i}`, date: new Date() }],
      milestones,
      milestonesChannel: 'test',
      milestonesInterval: 42,
      milestonesMessage: 'test',
      milestonesRoles: ['test'],
      noXpRoles: ['test'],
      prefix: 'foo',
      premium: true,
      prioritiseMultiplierRoleHierarchy: true,
      stackXpRoles: true,
      storeCounts: true,
      storeMilestones: true,
      topXp: 'test',
      topXpRole: 'test',
      xpBlacklistedChannels: ['test'],
      xpMessage: 'test',
      xpMultipliers: [{ multiplier: 2, targets: ['test'], type: 'test' }],
      xpResponseType: 'test',
      xpRoles,
      xpWhitelistedChannels: ['test']
    });
    console.log(i);
  }
}

For some reason, .select('-counts').limit(3500) uses almost 1 GB less memory than .limit(3500) or even .select('-milestones').limit(3500).
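
For a sense of scale, here is a back-of-the-envelope calculation using only the figures from the script and the comment above. It assumes the whole saving is attributable to the `counts` elements and treats "almost 1 GB" as 1024^3 bytes, so the per-element figure is a rough upper bound.

```javascript
'use strict';

// Figures taken from the generator script and the comment above.
const docsLoaded = 3500;       // .limit(3500)
const countsPerDoc = 200;      // inner loop that fills `counts`
const milestonesPerDoc = 50;   // inner loop that fills `milestones`
const savedBytes = 1024 ** 3;  // "almost 1 GB", treated as an upper bound

const countElements = docsLoaded * countsPerDoc;         // 700000
const milestoneElements = docsLoaded * milestonesPerDoc; // 175000
const bytesPerCountElement = savedBytes / countElements; // roughly 1534

console.log({
  countElements,
  milestoneElements,
  bytesPerCountElement: Math.round(bytesPerCountElement),
});
```

Under those assumptions, each `counts` element, which holds just a date and a number, would account for about 1.5 KB of heap, which is why the spike looks disproportionate.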

vkarpov15 modified the milestones: 6.0.10, 6.0.13 Oct 8, 2021
vkarpov15 modified the milestones: 6.0.13, 6.0.15 Oct 24, 2021
vkarpov15 added a commit that referenced this issue Dec 19, 2021
vkarpov15 modified the milestones: 6.1.3, 6.1.6 Dec 19, 2021
vkarpov15 added a commit that referenced this issue Jan 8, 2022
…mprove memory usage, several small improvements to improve initing docs with large arrays

Re: #10400
vkarpov15 modified the milestones: 6.1.6, 6.1.7, 6.1.9 Jan 10, 2022
vkarpov15 modified the milestones: 6.2.2, 6.2.4 Feb 7, 2022

Uzlopak (Collaborator) commented Feb 13, 2022

I ran this in the mongoose benchmark folder:

'use strict';
  
const mongoose = require('../lib/index');

const { Schema, model } = mongoose;

const MilestoneSchema = new Schema({
  count: Number,
  date: Date,
  member: String,
});

const CountSchema = new Schema({
  count: Number,
  date: Date,
});

const MultiplierSchema = new Schema({
  multiplier: Number,
  targets: [String],
  type: String,
});

const MentionedRoleSchema = new Schema({
  _id: String,
  date: Date,
});

const GuildSchema = new Schema({
  _id: { required: true, type: String },
  autoPublishChannels: { of: String, type: Array },
  autoResetLevels: Number,
  autoRole: { of: String, type: Array },
  autoRoleTimeout: Number,
  blacklistedChannels: { of: String, type: Array },
  counts: [CountSchema],
  emojiList: Boolean,
  emojiListChannel: String,
  leftAt: Number,
  levels: Boolean,
  mentionCooldown: Number,
  mentionCooldownRoles: { of: String, type: Array },
  mentionedRoles: { of: MentionedRoleSchema, type: Array },
  milestones: [MilestoneSchema],
  milestonesChannel: String,
  milestonesInterval: Number,
  milestonesMessage: String,
  milestonesRoles: { of: String, type: Array },
  noXpRoles: { of: String, type: Array },
  prefix: String,
  premium: Boolean,
  prioritiseMultiplierRoleHierarchy: Boolean,
  stackXpRoles: Boolean,
  storeCounts: Boolean,
  storeMilestones: Boolean,
  topXp: String,
  topXpRole: String,
  xpBlacklistedChannels: { of: String, type: Array },
  xpMessage: String,
  xpMultipliers: { of: MultiplierSchema, type: Array },
  xpResponseType: String,
  xpRoles: { of: [String], type: Map },
  xpWhitelistedChannels: { of: String, type: Array },
}, { versionKey: false });
const Guild = model('guilds', GuildSchema);

run().catch(err => console.log(err));

async function run() {
  await mongoose.connect('mongodb://localhost:27017/test', {
    useNewUrlParser: true,
    useUnifiedTopology: true,
  });

  await mongoose.connection.dropDatabase();

  setInterval(() => {
    console.log('[Timer] Memory usage:', process.memoryUsage().heapUsed / (1024 ** 2));
  }, 10_000);

  for (let i = 0; i < 8360; ++i) {
    const xpRoles = {};
    for (let j = 0; j < 100; ++j) {
      xpRoles[`test${i}_${j}`] = [`test${i}`, `test${j}`];
    }

    const counts = [];
    for (let j = 0; j < 200; ++j) {
      counts.push([{ date: new Date(), count: j * 17 }]);
    }

    const milestones = [];
    for (let j = 0; j < 50; ++j) {
      milestones.push([{ count: j * 17, date: new Date(), member: '0'.repeat(100) }]);
    }

    await Guild.create({
      _id: `guild${i}`,
      autoPublishChannels: ['test'],
      autoResetLevels: i,
      autoRole: ['test'],
      autoRoleTimeout: i,
      blacklistedChannels: ['foo', 'bar'],
      counts,
      emojiList: true,
      emojiListChannel: 'test',
      leftAt: i,
      levels: true,
      mentionCooldown: 17 * i,
      mentionCooldownRoles: ['test'],
      mentionedRoles: [{ _id: `test${i}`, date: new Date() }],
      milestones,
      milestonesChannel: 'test',
      milestonesInterval: 42,
      milestonesMessage: 'test',
      milestonesRoles: ['test'],
      noXpRoles: ['test'],
      prefix: 'foo',
      premium: true,
      prioritiseMultiplierRoleHierarchy: true,
      stackXpRoles: true,
      storeCounts: true,
      storeMilestones: true,
      topXp: 'test',
      topXpRole: 'test',
      xpBlacklistedChannels: ['test'],
      xpMessage: 'test',
      xpMultipliers: [{ multiplier: 2, targets: ['test'], type: 'test' }],
      xpResponseType: 'test',
      xpRoles,
      xpWhitelistedChannels: ['test']
    });
  }
}

It seems that the memory leak is now just the many, many strings we are creating on the fly.

vkarpov15 (Collaborator) commented

We made a few minor improvements in 1a1d3cc. We also found that removing _id in countSchema, or creating your own _id without a default (_id: 'ObjectId') improves memory usage significantly: shaves off over 10% in our benchmark.

We're looking to see if there's a way to avoid running extra unnecessary _id defaults.
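
The two schema variants mentioned above could be declared roughly as follows. This is a sketch rather than the maintainers' benchmark code: `buildCountSchemas` is a hypothetical helper that takes Mongoose's `Schema` constructor as an argument, so the snippet itself has no hard dependency on the package.

```javascript
'use strict';

// Call as buildCountSchemas(require('mongoose').Schema).
function buildCountSchemas(Schema) {
  // Option 1: no _id path at all on each element of `counts`.
  const withoutId = new Schema({ count: Number, date: Date }, { _id: false });

  // Option 2: declare _id explicitly with no default, as described above.
  // The caller is then responsible for setting _id where one is needed.
  const explicitId = new Schema({ _id: 'ObjectId', count: Number, date: Date });

  return { withoutId, explicitId };
}
```

Either result can be used in place of `CountSchema` in `counts: [CountSchema]`. Which option fits depends on whether anything refers to individual counts by `_id`.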

vkarpov15 (Collaborator) commented

Shaved off another 10% in 4c85c70 by properly passing skipId for array subdocuments. That means we've reduced memory usage by about 50% overall since we started working on this 👍
