Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cursors are slow #8310

Closed
simllll opened this issue Nov 7, 2019 · 6 comments
Closed

Cursors are slow #8310

simllll opened this issue Nov 7, 2019 · 6 comments
Milestone

Comments

@simllll
Copy link
Contributor

simllll commented Nov 7, 2019

Do you want to request a feature or report a bug?
Bug

What is the current behavior?
I think the original issue #8235 is solved, but somehow my executions take now way longer... e.g. I iterate over a big collection without any conditions ({}) and I "only" get arround 250 entries/seconds. With mongoose 5.6 I got around 2000-3000 per second. I tried playing around with batchSize, but it seems it does not make any difference. Unsure how to dbeug this correclty, but could it be that batchSize does not apply anymore?

What is the expected behavior?
Faster query results.

What are the versions of Node.js, Mongoose and MongoDB you are using? Note that "latest" is not a version.

tested with v12.13.0
mongoose: v5.7.6
mongodb: 4.2.1

@vkarpov15 vkarpov15 added this to the 5.7.10 milestone Nov 9, 2019
@vkarpov15 vkarpov15 added the needs repro script Maybe a bug, but no repro script. The issue reporter should create a script that demos the issue label Nov 9, 2019
@vkarpov15
Copy link
Collaborator

I looked into this and I can confirm the issue is what appears to be an unnecessary setTimeout(). When running the below script:

'use strict';

const mongoose = require('mongoose');
mongoose.set('debug', true);

const Schema = mongoose.Schema;

run().catch(err => console.log(err));

async function run() {
  await mongoose.connect('mongodb://localhost:27017/test', { useNewUrlParser: true, useUnifiedTopology: true });

  const Model = mongoose.model('Test', Schema({ name: String }));
  const count = await Model.countDocuments();

  const numDocs = 50000;
  if (count < numDocs) {
    const docs = [];
    for (let i = count; i < numDocs; ++i) {
      docs.push({ name: `doc${i}` });
    }
    await Model.create(docs);
  }

    console.log('Executing query...');

  let numProcessed = 0;
  const start = Date.now();
  await Model.
    find().
    lean().
    cursor({ batchSize: 1500 }).
    eachAsync(async item => {
      console.log(item.name);
      ++numProcessed;
    }, { parallel: 5 });

  console.log('Time elapsed:', Date.now() - start);
  console.log('NumProcessed:', numProcessed);
}

There's a massive impact if I remove the setTimeout():

  • Without setTimeout(): Time elapsed: 15899
  • With setTimeout(): Time elapsed: 71066

The setTimeout() looks unnecessary because the function runs in the callback of cursor.next(), which should clear the stack appropriately. Without the setTimeout() the performance is equivalent to Mongoose 5.6.

@vkarpov15 vkarpov15 added performance and removed needs repro script Maybe a bug, but no repro script. The issue reporter should create a script that demos the issue labels Nov 9, 2019
@simllll
Copy link
Contributor Author

simllll commented Nov 9, 2019

Nice one, thanks!

By talking about cleaning the stack you are referring to "non blocking" behaviour for the event loop? I was just running into something similar recently, where I had to use setimmediate after some iterations and switched some stuff back to bluebird instead of native promises (bluebird has setimmediate calls built in) to "unblock" the evnet queue. Guess the setTimeout could come from there? As a side node, I only call setImmediate after every n-iterations to still gain all the speed beneftis of not returning to the event queue, but sitll not blocking the event queue forever.

@simllll
Copy link
Contributor Author

simllll commented Nov 9, 2019

just tested this somewhat, seems good. no blocking behaviour in my case 💃

@vkarpov15
Copy link
Collaborator

vkarpov15 commented Nov 11, 2019

Cleaning the stack means that, since eachAsync() uses recursion under the hood, we need to make sure we're not triggering a stack overflow if the recursion goes too deep. setTimeout() or setImmediate() is how you do that.

We try to avoid using setImmediate() where possible because certain awful testing libraries (ahem, Jest ) stub it out for some reason.

@simllll
Copy link
Contributor Author

simllll commented Nov 11, 2019

Thanks, didn't know about the issue with jest! (we use jest too and I didn't experience any issues so far, need to check that some day again)

@vkarpov15
Copy link
Collaborator

I know it's hard to switch testing libraries, but I'd ask you to consider it, both for the sake of maintainer sanity and for the sake of making your tests reliable. I've spent probably 20 hours debugging Mongoose issues that only happen in Jest.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants