eachAsync: Cursor already in use #8235

Closed
simllll opened this issue Oct 10, 2019 · 9 comments
Labels
confirmed-bug We've confirmed this is a bug in Mongoose and will fix it.
Milestone
5.7.6

Comments

simllll commented Oct 10, 2019

Do you want to request a feature or report a bug?
bug

What is the current behavior?
Since the last update of Mongoose I have been experiencing a lot of "Cursor already in use" errors in long-running queries where a cursor is used in combination with eachAsync().

If the current behavior is a bug, please provide the steps to reproduce.
Run a long-running query and use eachAsync() to iterate over it. It's not 100% reproducible, but it happens significantly often.

What is the expected behavior?
No "Cursor already in use" error.

What are the versions of Node.js, Mongoose and MongoDB you are using? Note that "latest" is not a version.

  • Node.js: tested with v12.6.0 and v10.16.3 (LTS)
  • Mongoose: v5.7.3
  • MongoDB: 4.0.10

simllll commented Oct 10, 2019

const stream = DbModel.find({
  createdAt: { $lt: new Date() },
})
  .lean()
  .cursor({ batchSize: 1500 });

await stream.eachAsync(myObject => asyncDataProcessing(myObject), { parallel: 5 });
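
For context, a rough sketch of what a parallel eachAsync loop amounts to (my own approximation, not Mongoose's actual implementation): up to N callbacks run concurrently, but the documents still come from one server-side cursor, so the next() calls have to be serialized; two getMore commands hitting the same cursor at once is one way to end up with "Cursor already in use".

// Rough sketch only, not Mongoose's implementation: run `fn` for up to
// `parallel` documents at a time while serializing access to the shared
// cursor, since concurrent getMore calls on one cursor are rejected.
async function eachAsyncSketch(cursor, fn, { parallel = 1 } = {}) {
  let chain = Promise.resolve();
  // Every worker pulls the next document through this chain, so cursor.next()
  // calls never overlap.
  const pullNext = () => (chain = chain.then(() => cursor.next()));

  const workers = [];
  for (let i = 0; i < parallel; ++i) {
    workers.push((async () => {
      let doc;
      while ((doc = await pullNext()) != null) {
        await fn(doc); // user callbacks may run concurrently across workers
      }
    })());
  }
  await Promise.all(workers);
}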

simllll commented Oct 10, 2019

One more thing: it seems that sometimes, even when there is no error, the script just finishes early without processing all data sets. Sometimes a lot is missing (e.g. it was supposed to iterate over 3500 sets but finished after only around 700).

simllll commented Oct 10, 2019

Backtrace:

MongoError: Cursor already in use (namespace: 'aDatabase.aCollection', id: 5482770079154774044).
  File "/var/app/current/node_modules/mongoose/node_modules/mongodb/lib/core/connection/pool.js", line 466, col 61, in Connection.<anonymous>
    return handleOperationCallback(self, workItem.cb, new MongoError(responseDoc));
  File "events.js", line 198, col 13, in Connection.emit
  File "domain.js", line 448, col 20, in Connection.EventEmitter.emit
  File "/var/app/current/node_modules/mongoose/node_modules/mongodb/lib/core/connection/connection.js", line 364, col 10, in processMessage
    conn.emit(
  File "/var/app/current/node_modules/mongoose/node_modules/mongodb/lib/core/connection/connection.js", line 566, col 15, in Socket.<anonymous>
    processMessage(conn, emitBuffer);
  File "events.js", line 198, col 13, in Socket.emit
  File "domain.js", line 448, col 20, in Socket.EventEmitter.emit
  File "_stream_readable.js", line 288, col 12, in addChunk
  File "_stream_readable.js", line 269, col 11, in readableAddChunk
  File "_stream_readable.js", line 224, col 10, in Socket.Readable.push
  File "internal/stream_base_commons.js", line 94, col 17, in Pipe.onStreamRead [as onread]

simllll commented Oct 12, 2019

It turns out this issue is even more critical.
I downgraded to 5.6.13 (before 5.7, where the new driver was added) and a lot changed: I get more results back for a simple find() query, eachAsync() works again, etc.

Details about my setup: I have 2 shards with 1 replica and 1 arbiter each. I believe the problems only happen on sharded collections, but I couldn't verify this yet.

Anyway, BEWARE: the current version of the mongoose / mongodb driver does not work correctly!

@vkarpov15 vkarpov15 added this to the 5.7.6 milestone Oct 17, 2019
@vkarpov15 vkarpov15 added the needs repro script Maybe a bug, but no repro script. The issue reporter should create a script that demos the issue label Oct 17, 2019

AndrewBarba commented Oct 18, 2019

After running some tests I've found that this is definitely an issue with eachAsync and the parallel option:

  1. Using .on('data') and .on('end') always succeeds (tested on 50k docs, batch size 1000); a sketch of this variant follows the test function below.
  2. Using eachAsync({ parallel: 1 }) always succeeds (tested on 50k docs, batch size 1000).

For reference, this is my test function:

model
  .find()
  .read('sp')
  .batchSize(1000)
  .cursor()
  .eachAsync(async item => console.log(item._id), { parallel: 8 })
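
For comparison, this is roughly the event-based variant from point 1 above, which always completed for me (same placeholder model; the 'error' handler is just for completeness):

// Event-based iteration over the same cursor: documents are handled one at a
// time as 'data' events fire, so nothing reads the cursor concurrently.
const cursor = model
  .find()
  .read('sp')
  .batchSize(1000)
  .cursor();

cursor
  .on('data', item => console.log(item._id))
  .on('error', err => console.error(err))
  .on('end', () => console.log('done'));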

Perhaps most interesting is that parallel <= 8 always succeeds, whereas parallel >= 9 always fails at some point with the error:

{ MongoError: cursor id 8862375401742707437 not found
    at Connection.<anonymous> (/Users/andrewbarba/Code/Barstool/bets-api/node_modules/mongoose/node_modules/mongodb/lib/core/connection/pool.js:466:61)
    at Connection.emit (events.js:198:13)
    at processMessage (/Users/andrewbarba/Code/Barstool/bets-api/node_modules/mongoose/node_modules/mongodb/lib/core/connection/connection.js:364:10)
    at TLSSocket.<anonymous> (/Users/andrewbarba/Code/Barstool/bets-api/node_modules/mongoose/node_modules/mongodb/lib/core/connection/connection.js:533:15)
    at TLSSocket.emit (events.js:198:13)
    at addChunk (_stream_readable.js:288:12)
    at readableAddChunk (_stream_readable.js:269:11)
    at TLSSocket.Readable.push (_stream_readable.js:224:10)
    at TLSWrap.onStreamRead [as onread] (internal/stream_base_commons.js:94:17)
  operationTime:
   Timestamp { _bsontype: 'Timestamp', low_: 3, high_: 1571375722 },
  ok: 0,
  errmsg: 'cursor id 8862375401742707437 not found',
  code: 43,
  codeName: 'CursorNotFound',
  '$clusterTime':
   { clusterTime:
      Timestamp { _bsontype: 'Timestamp', low_: 3, high_: 1571375722 },
     signature: { hash: [Binary], keyId: [Long] } },
  name: 'MongoError',
  [Symbol(mongoErrorContextSymbol)]: { isGetMore: true } }

I doubt the 8 has too much significance, seeing as the original report was using 5, but it's incredibly consistent for the data set I'm testing on.

Mongoose v5.7.5
Node v10.16.3

@vkarpov15 vkarpov15 added confirmed-bug We've confirmed this is a bug in Mongoose and will fix it. and removed needs repro script Maybe a bug, but no repro script. The issue reporter should create a script that demos the issue labels Oct 18, 2019
vkarpov15 added a commit that referenced this issue Oct 19, 2019
vkarpov15 (Collaborator) commented

@AndrewBarba @simllll thanks for pointing out this issue and providing details on how to repro it. Turns out some recent refactoring to eachAsync() caused it to break in a non-obvious way. The fix will be in 5.7.6, and I can confirm the below script runs consistently now:

'use strict';
  
const mongoose = require('mongoose');
mongoose.set('debug', true);

const Schema = mongoose.Schema;

run().catch(err => console.log(err));

async function run() {
  await mongoose.connect('mongodb://localhost:27017/test', { useNewUrlParser: true, useUnifiedTopology: true });

  const Model = mongoose.model('Test', Schema({ name: String }));
  const count = await Model.countDocuments();

  const numDocs = 50000;
  if (count < numDocs) {
    const docs = [];
    for (let i = count; i < numDocs; ++i) {
      docs.push({ name: `doc${i}` });
    }
    await Model.create(docs);
  }

  console.log('Executing query...');

  
  let numProcessed = 0;
  await Model.
    find().
    batchSize(1000).
    cursor().
    eachAsync(async item => {
      console.log(item.name);
      ++numProcessed;
    }, { parallel: 9 });

  console.log('NumProcessed:', numProcessed);
}
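
If you want to double-check which Mongoose version a given environment is actually running once 5.7.6 is out, Mongoose exposes its version string at runtime:

// Quick check that the patched release (>= 5.7.6) is installed.
const mongoose = require('mongoose');
console.log(mongoose.version);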

AndrewBarba commented

Thanks for the quick fix, really appreciate it

simllll commented Oct 20, 2019

Thanks also from me, great that you found the issue so quickly +1

Looking forward to 5.7.6 :)

simllll commented Nov 6, 2019

I think the original issue is solved, but somehow my executions now take way longer... e.g. I iterate over a big collection without any conditions ({}) and "only" get around 250 entries/second. With Mongoose 5.6 I got around 2000-3000 per second. I tried playing around with batchSize, but it seems to make no difference. I'm unsure how to debug this correctly, but could it be that batchSize does not apply anymore?
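
In case it helps with reproducing, a minimal way to check whether the batch size actually reaches the driver (DbModel is a placeholder) is to turn on Mongoose's command logging and compare setting batchSize on the query with passing it to .cursor():

// Log the operations Mongoose sends so the batchSize option on the initial
// find command is visible in the output.
mongoose.set('debug', true);

// Variant A: batchSize set on the query itself.
const cursorA = DbModel.find({}).batchSize(1000).lean().cursor();

// Variant B: batchSize passed through the cursor options.
const cursorB = DbModel.find({}).lean().cursor({ batchSize: 1000 });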
