
mongoose.syncIndexes() creates indexes even if they exist #12250

Closed
2 tasks done
AbdelrahmanHafez opened this issue Aug 10, 2022 · 14 comments
Labels
confirmed-bug We've confirmed this is a bug in Mongoose and will fix it.

Comments

@AbdelrahmanHafez
Collaborator

AbdelrahmanHafez commented Aug 10, 2022

Prerequisites

  • I have written a descriptive issue title
  • I have searched existing issues to ensure the bug has not already been reported

Mongoose version

6.5.2

Node.js version

16.x

MongoDB server version

4.x

Description

mongoose.syncIndexes(...) currently sends a createIndex command to the database even if an index already exists.
It should only create indexes that do not exist on the MongoDB server.

Steps to Reproduce

12250.js

'use strict';
const sinon = require('sinon');
const mongoose = require('mongoose');
const { Schema } = mongoose;
const assert = require('node:assert');


run().catch(console.error);

async function run() {
  // Arrange
  await mongoose.connect('mongodb://localhost:27017/test');
  await mongoose.connection.dropDatabase();

  const userSchema = new Schema({
    name: { type: String }
  }, { autoIndex: false });

  userSchema.index({ name: 1 });

  const User = mongoose.model('User', userSchema);

  const createIndexSpy = sinon.spy(User.collection, 'createIndex');
  const listIndexesSpy = sinon.spy(User.collection, 'listIndexes');

  // Act
  await mongoose.syncIndexes();
  assert.equal(createIndexSpy.callCount, 1);
  assert.equal(listIndexesSpy.callCount, 1);

  await mongoose.syncIndexes();

  // Assert
  assert.equal(listIndexesSpy.callCount, 2);
  assert.equal(createIndexSpy.callCount, 1);
}

Output

AssertionError [ERR_ASSERTION]: 2 == 1
    at run (E:\dev\mongoose\test.js:35:10)
    at processTicksAndRejections (node:internal/process/task_queues:96:5) {
  generatedMessage: true,
  code: 'ERR_ASSERTION',
  actual: 2,
  expected: 1,
  operator: '=='
}

Expected Behavior

mongoose should only create indexes that do not exist on the MongoDB server.

@AbdelrahmanHafez AbdelrahmanHafez added the confirmed-bug We've confirmed this is a bug in Mongoose and will fix it. label Aug 10, 2022
@vkarpov15 vkarpov15 added this to the 6.5.4 milestone Aug 15, 2022
AbdelrahmanHafez added a commit that referenced this issue Aug 15, 2022
@vkarpov15 vkarpov15 modified the milestones: 6.5.4, 6.5.5 Aug 30, 2022
vkarpov15 added a commit that referenced this issue Sep 7, 2022
refactor(model): allow optionally passing indexes to createIndexes and cleanIndexes
@vkarpov15 vkarpov15 modified the milestones: 6.5.5, 6.x Unprioritized Sep 7, 2022
lpizzinidev added a commit to lpizzinidev/mongoose that referenced this issue Dec 9, 2022
vkarpov15 added a commit that referenced this issue Dec 16, 2022
fix(model): prevent index creation on syncIndexes if not necessary
@Industrialice

Industrialice commented Aug 11, 2023

EDIT: the issue was resolved and this comment is irrelevant. Although syncIndexes() triggered a catastrophic failure, the database was already in an incorrect, unexpected state by the time it happened.

This bug should be considered critical: our case confirms it can cause downtime and a lot of headache.

We have an upsert query which relies on a unique index to tell the database the entries with a matching field need to be merged instead of duplicated.

When syncIndexes was run, mongoose removed this unique index, leading some upserting queries to insert duplicate documents. Mongoose then tried to create a new unique index, but that failed: MongoDB does not allow a unique index to be built if it detects duplicate entries.

Meanwhile, all queries against the affected data got downgraded to COLLSCAN, which maxed out the CPU and left clients unable to access the server until the admins found a temporary fix.

The temporary fix to bring the server back online was to create a non-unique index. And the full solution was to then do a manual collection migration with a custom deduplication script.
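The deduplication step in a recovery like this can be sketched as a pure function. This is only an illustration of the idea (keep the first document per key, set the rest aside so a unique index can be built afterwards); a real migration script would stream documents from MongoDB in batches and write the results back.

```javascript
// Illustrative dedup helper: keep the first document seen per key and
// collect the rest for removal or archiving. A real migration would
// run this against MongoDB cursors, not in-memory arrays.
function dedupeByKey(docs, keyField) {
  const seen = new Set();
  const keep = [];
  const duplicates = [];
  for (const doc of docs) {
    const key = doc[keyField];
    if (seen.has(key)) {
      duplicates.push(doc);
    } else {
      seen.add(key);
      keep.push(doc);
    }
  }
  return { keep, duplicates };
}

const { keep, duplicates } = dedupeByKey(
  [{ sku: 'a' }, { sku: 'b' }, { sku: 'a' }],
  'sku'
);
console.log(keep.length, duplicates.length); // 2 1
```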

Mongoose version in that case was 6.9.2. In its current state, the method is absolute cancer and should not be used on a live server.

@AbdelrahmanHafez
Collaborator Author

@Industrialice
I can imagine how much of a hassle that could be, sorry for the inconvenience.

I believe this is already fixed on 6.8.1 by #12785. Can you come up with a repro script? Did you also try using the latest version (7.4.3)?

In its current state, the method is absolute cancer and should not be used on a live server.

Do you think that's because of that bug, or due to a fundamental issue with the design?

@Industrialice

Industrialice commented Aug 11, 2023

@AbdelrahmanHafez
The issue we experienced is definitely present in 6.9.2; I checked and rechecked to confirm that was the version running when the issue occurred.

I could do more testing on the issue if required, although reproducing the exact catastrophic failure may be difficult: it occurred on the live server specifically because of its high load (which was needed for a duplicate document to get inserted in the brief window while the unique index was unavailable). But it would be easy to check for other signs of an index getting recreated (such as MongoDB's index usage stats being reset).

I haven't tried running versions past 6.9.2, although this issue doesn't seem to have relevant merges past 6.8.1.

The design question depends on what goal the method pursues. Our understanding is that its goal is to drop unused indexes and create the indexes defined by schemas only if they don't already exist. That sounds very safe, so we added running syncIndexes() on all collections as part of our maintenance.

If it worked as described, we wouldn't have ended up with a database in an invalid state. In that case it's just an implementation issue.

Design-wise, the interplay with MongoDB made the issue much worse: mongoose recreating the unique index led to a duplicate value being inserted, and MongoDB's refusal to handle such cases itself (dropDups, the only native tool for unique indexes, was removed in version 3.0) left the database in an invalid state that required a lot of manual effort (and potentially downtime) to fix.

Addition: quickly testing this in a local environment with the same database setup and the same mongoose 6.9.2, there are no obvious indications that the index gets reset. MongoDB's usage counter doesn't change and doesn't get reset. So either I've misjudged this method's reliability, or more conditions are required for the issue to occur.

@Industrialice

Industrialice commented Aug 11, 2023

Okay, I've now recreated the full picture of everything that happened: syncIndexes() itself was incorrectly blamed, even though it did trigger the issue.

The schema already had an index on the field in question, but it was not marked unique because duplication was not previously possible. The unique flag was added during a recent data rework that went live a few hours before the maintenance.

Because of how mongoose works, it saw that an index on the field already existed and simply ignored the new unique flag. It doesn't seem to emit any warnings either. That the index never actually received the right properties was not noticed in time.

That is, until syncIndexes() was run. It dropped the existing non-unique index and tried to create a unique one, but by that time there were already a few duplicate values, which prevented the build from completing.

The only possible improvement I see is mongoose emitting a warning that index changes were ignored. syncIndexes() did what it was supposed to do, even if it ended up turning a minor issue into a catastrophic one. I'd note that if index management had been done manually, the first step would have been to create the unique index, and to drop the non-unique one only after seeing the unique one created successfully. I'm unsure whether handling an edge case like this should be expected from a library function, though.

Sorry for being misleading in the previous messages; the full circumstances turned out to be tricky to trace back. I updated my initial comment to state that it is no longer relevant, since it contained a wrong assumption (that syncIndexes() dropped a unique index).

@JavaScriptBach
Contributor

JavaScriptBach commented Aug 12, 2023

It sounds like the problem is that 1) you deployed index changes at the same time as your application, and 2) you made a backwards-incompatible index change. We also run Mongoose in production, and this is a bit of a sharp edge: it's so easy to define your indexes along with the rest of your schema, even though index changes really need to be carefully reviewed and deployed separately.

@Industrialice

Not reviewing the index change was definitely a mistake on our end. It's unfortunate that instead of hitting any safety nets, that mistake spiraled in true Murphy's Law fashion.

@AbdelrahmanHafez
Collaborator Author

A couple of comments:

  • Please don't run repro scripts against your live database. We usually run a local database with a minimal repro script, and you can watch all the commands being sent to MongoDB using mongoose.set('debug', true);.
  • For your scenario, where you have an index and would like to make it unique, I would recommend adding an extra index that is unique, deploying your app, and then, once the unique index is built, removing the old non-unique index. I actually suspect this might not work; I'll try to come up with a repro script to check.
  • I noticed you're sticking to mongoose v6.x, which is not as actively maintained as the latest major version, mongoose v7.x. Is there anything we can do to help you migrate to the latest version?

@AbdelrahmanHafez
Collaborator Author

AbdelrahmanHafez commented Aug 12, 2023

I've done some testing, and it seems the only way to have no index downtime is a little complicated. Say you have a schema with a customId property that has an index on it, and you want to make that index unique.

You start with the following initial state

const userSchema = new Schema({
  customId: String
});

userSchema.index({ customId: 1 });

const User = mongoose.model('User', userSchema);

await User.syncIndexes();

Then, if you just add { unique: true }, mongoose will drop the current index and create a new one. We cannot create the new one without dropping the current one, for a couple of reasons:
1- We create indexes in the background, so we just trigger the index creation, which could take tens of minutes or hours, depending on how big your collection is.
2- You cannot have two indexes with the same auto-generated name.
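The name collision in point 2 comes from MongoDB's default index-name rule: each key and its direction are joined with underscores, so any index on { customId: 1 } that doesn't specify a name gets the name customId_1. A small sketch of that rule (my own illustration, not mongoose's or the driver's code):

```javascript
// Sketch of MongoDB's default index-name rule: join each key and its
// direction with underscores. Two unnamed indexes on { customId: 1 }
// therefore both resolve to "customId_1" and collide, which is why the
// second index needs an explicit name.
function defaultIndexName(keySpec) {
  return Object.entries(keySpec)
    .map(([field, direction]) => `${field}_${direction}`)
    .join('_');
}

console.log(defaultIndexName({ customId: 1 }));      // customId_1
console.log(defaultIndexName({ name: 1, age: -1 })); // name_1_age_-1
```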

So what you would need to do is add another index with a different name, and wait until it's actually created.
Two indexes, one unique and one not

userSchema.index({ customId: 1 }); // keep this as is for now
// add the following index, has to have a different name to avoid mongo throwing an error
userSchema.index({ customId: 1 }, { unique: true, name: 'customId_unique_1' }); 
// later
await User.syncIndexes();

Then you can delete the initial index, and have only the unique index.
One unique index with a custom name

userSchema.index({ customId: 1 }, { unique: true, name: 'customId_unique_1' });
// later
await User.syncIndexes();

You could leave it at that if you don't mind the custom name for the index. If you want to get rid of the custom name you will need two extra steps:
1- Add a unique index to customId without specifying a name.

userSchema.index({ customId: 1 }, { unique: true }); // will be built with the default name
userSchema.index({ customId: 1 }, { unique: true, name: 'customId_unique_1' });  // existing in db
// later
await User.syncIndexes();

2- Get rid of the index with a custom name.

userSchema.index({ customId: 1 }, { unique: true });
// later
await User.syncIndexes();

I agree that this is a lengthy flow for applying indexes, but I don't think there's a way around it: you cannot have multiple indexes with the same name in MongoDB, and you cannot "update" an index AFAIK. You have to drop it and re-create it with the new options. So to guarantee you always have some index, you need to apply these steps carefully.

In my experience, this is not a problem as long as your collection has fewer than 100k documents; once it grows to hundreds of thousands or millions, index builds take a considerable amount of time.

@vkarpov15 Can you think of any way to improve this flow for changing an index from non-unique to unique while maintaining some kind of index at all times and avoiding a COLLSCAN?

@Industrialice

Industrialice commented Aug 13, 2023

I'd clarify that the issues have been resolved and there's nothing else to be done on our side.

Your method of creating a unique index on top of a non-unique one wouldn't have worked in our case: duplicate entries would have appeared while the unique index was being created (since it is built in the background), so it would always have failed.

In our case, the issue was resolved by creating a temporary collection with a unique index to store new entries, deduplicating the old collection and creating a unique index on it, and then merging the collections together. This worked without any client-visible downtime, albeit with some (in this particular case fairly irrelevant) data unavailability while the collections were being migrated.

We haven't yet moved past 6.9.2, as we keep dependency updates in relatively large increments to maintain a stable environment.

And my thoughts on avoiding the COLLSCAN:
I checked your code; you drop indexes first, and only once that is fully done do you start creating indexes. So in our case, even if the unique index build hadn't failed, we'd have ended up DDoSing the server via COLLSCAN the moment the existing non-unique index was dropped. A maxed-out CPU would also have made the new index build take much longer.
So at a very minimum that can be improved: build indexes first, then drop.
That way you not only avoid the COLLSCAN, you can also bail out of dropping indexes if any index fails to create (assuming you can wait for index builds to finish and get the build status from MongoDB).

The described approach would have completely prevented the issue we experienced, and it doesn't seem to have any obvious downsides compared to the current implementation.

@AbdelrahmanHafez
Collaborator Author

So at a very minimum that can be improved - build indexes, then drop.

Please re-read my comment; I explained why we cannot do that. The TL;DR is index-name conflicts: you cannot have two indexes with the same auto-generated name in MongoDB.

Although an argument could be made that we could create-then-drop by default and special-case indexes with conflicting names, where we would drop-then-create.

We could even add an opt-in option like allowDroppingBeforeCreatingIndexes: true. IMHO this feels like overkill and would add too much complexity for both end users and the maintainers of mongoose.
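The create-then-drop idea with a special case for name conflicts could be sketched as a pure planning step. This is a hypothetical helper for illustration only, not a proposed or existing mongoose API: given the indexes to create and the names to drop, it prefers create-first, and falls back to drop-first only when a new index reuses a name that is scheduled to be dropped.

```javascript
// Hypothetical planner for the ordering discussed above: prefer
// create-then-drop (no window without an index, so no COLLSCAN), but
// when a new index reuses the name of one being dropped, MongoDB's
// name uniqueness forces drop-then-create for that pair.
function planIndexSync(toCreate, toDrop) {
  const dropNames = new Set(toDrop);
  const steps = [];
  for (const idx of toCreate) {
    if (dropNames.has(idx.name)) {
      // Name conflict: the old index must go first.
      steps.push({ op: 'dropIndex', name: idx.name });
      dropNames.delete(idx.name);
    }
    steps.push({ op: 'createIndex', index: idx });
  }
  // Remaining drops happen only after all creates succeeded.
  for (const name of dropNames) steps.push({ op: 'dropIndex', name });
  return steps;
}

// A renamed unique index can be created before the old one is dropped:
console.log(planIndexSync(
  [{ name: 'customId_unique_1', key: { customId: 1 }, unique: true }],
  ['customId_1']
).map(s => s.op)); // [ 'createIndex', 'dropIndex' ]
```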

@vkarpov15 I'll leave this thread open for the discussion; I suspect this scenario happens to more people, but the OP's issue is already fixed. Feel free to close the issue if you feel no action is needed from the discussion.

@Industrialice

Industrialice commented Aug 14, 2023

I explained why we can not do that. a TLDR version is because of index-name conflicts. You can not have two indexes with the same auto-generated name in MongoDB.

I reread your comment; the first time, I didn't realize you meant it as the explanation of why syncIndexes() doesn't work this way internally.

As for the names, I guess you're trying to avoid violating the default index-name expectation? Explicit names would also need special handling.

Having to leave the terminal with the command running for hours doesn't sound optimal, that is true.

The only operation that seems dangerous is when a drop and a create affect the same field. So the high-level question is whether it makes sense to handle that case explicitly, or whether you'd consider it too much of an edge case.

Besides adding or removing unique on an index, there are also cases of adding or removing a compound index, which may lead to the same issue, though I can't estimate right now how likely a compound index change is to be problematic.

@vkarpov15 vkarpov15 modified the milestones: 7.x Unprioritized, 7.4.4, 7.4.5 Aug 15, 2023
@vkarpov15
Collaborator

Adding an allowDroppingBeforeCreatingIndexes option is unnecessary. If you want to sync indexes without dropping existing ones, just use Model.createIndexes(). There's a reason Mongoose runs createIndexes() by default, not syncIndexes(): as this thread discussed, syncIndexes() is potentially destructive.

Here's a few thoughts on how this issue can be avoided in the future:

  1. Run Model.diffIndexes() before Model.syncIndexes(). diffIndexes() returns the list of indexes syncIndexes() would drop and the list it would create, without actually dropping or creating anything. That lets you review which indexes would be dropped or created and spot any that are incorrect.
  2. Instead of creating a separate collection, you can use the duplicate key error's keyValue property to identify the conflicting documents and temporarily update the offending values as follows:
'use strict';
const mongoose = require('mongoose');
const { Schema } = mongoose;
const assert = require('node:assert');

run().catch(console.error);

async function run() {
  // Arrange
  await mongoose.connect('mongodb://localhost:27017/test');
  await mongoose.connection.dropDatabase();

  const userSchema = new Schema({
    name: { type: String }
  }, { autoIndex: false });

  userSchema.index({ name: 1 }, { unique: true });

  const User = mongoose.model('User', userSchema);

  await User.create({ name: 'test' });
  await User.create({ name: 'test' });

  try {
    await User.syncIndexes();
  } catch (err) {
    const keyValue = err.keyValue;
    // Clean up duplicates
    const conflicting = await User.find(keyValue);
    for (const user of conflicting) {
      user.name = user.name + '::' + (new mongoose.Types.ObjectId());
      await user.save();
    }
  }
  await User.syncIndexes();
  console.log('Done');
}
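To make the diffIndexes() suggestion concrete, here is a rough sketch of the diff idea as a pure function. This is my own simplified illustration, not mongoose's implementation: it compares only the key spec and the unique option, whereas the real diffIndexes() also accounts for other index options, and it always leaves the built-in _id_ index alone.

```javascript
// Simplified sketch in the spirit of Model.diffIndexes(): compare what
// the schema declares against what the server reports, and return
// { toDrop, toCreate } without touching the database. Illustration
// only; mongoose's real diff considers more index options than this.
function diffIndexes(schemaIndexes, serverIndexes) {
  const sameIndex = (a, b) =>
    JSON.stringify(a.key) === JSON.stringify(b.key) &&
    Boolean(a.unique) === Boolean(b.unique);

  const toDrop = serverIndexes
    .filter(s => s.name !== '_id_' && !schemaIndexes.some(d => sameIndex(d, s)))
    .map(s => s.name);
  const toCreate = schemaIndexes
    .filter(d => !serverIndexes.some(s => sameIndex(d, s)));
  return { toDrop, toCreate };
}

// The non-unique server index doesn't match the now-unique schema
// index, so it shows up in toDrop and the unique one in toCreate.
const schemaIndexes = [{ key: { customId: 1 }, unique: true }];
const serverIndexes = [
  { name: '_id_', key: { _id: 1 } },
  { name: 'customId_1', key: { customId: 1 } }
];
console.log(diffIndexes(schemaIndexes, serverIndexes));
```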

@vkarpov15 vkarpov15 removed this from the 7.4.5 milestone Aug 23, 2023
@Industrialice

Your suggested fix doesn't work on a live, busy database, as new duplicate entries will be created after you search for conflicts and before you call syncIndexes(). I mentioned that this was the issue in our case, which is why we couldn't do that.

The other suggestions on using the API to avoid issues sound reasonable, as long as they're reflected in the documentation.

@vkarpov15
Collaborator

That's fair. If new duplicate entries are being inserted and you aren't able to patch your code to either disable the code that causes dupes or use findOneAndUpdate() to avoid duplicates, my solution for cleaning up duplicates wouldn't help. We'll add some notes to the docs highlighting the relationship between syncIndexes() and diffIndexes().
