Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OCC support #9001

Closed
thernstig opened this issue May 14, 2020 · 17 comments
Closed

OCC support #9001

thernstig opened this issue May 14, 2020 · 17 comments
Labels
discussion If you have any thoughts or comments on this issue, please share them!
Milestone

Comments

@thernstig
Copy link

Some documentation suggests doing e.g. a findOne() followed by modifications, and then a .save(). Isn't this recommendation prone to breaking the atomicity that MongoDB otherwise brings via the update operations? Could it not mean e.g. two REST operations to read/update some document could give incorrect end results?

Am I missing something, or should the Mongoose docs not recommend this?

@AbdelrahmanHafez AbdelrahmanHafez added the docs This issue is due to a mistake or omission in the mongoosejs.com documentation label May 14, 2020
@AbdelrahmanHafez
Copy link
Collaborator

user.save() will insert a new document if it's a new document, or update it if it's already on the database.

@AbdelrahmanHafez AbdelrahmanHafez added this to the 5.9.15 milestone May 14, 2020
@thernstig
Copy link
Author

thernstig commented May 14, 2020

Yes, it is the update case I am talking about. When doing a const doc = new Document followed by doc.save() I find no problem. It is the query->update->save that is a problem. There are multiple locations throughout all documentation that shows example of the query->update->save way of handling this.

@AbdelrahmanHafez
Copy link
Collaborator

Why do you think that this is bad? You may need data from the document to determine the update value.

@thernstig
Copy link
Author

thernstig commented May 14, 2020

Maybe I am missing something, but isn't that a classical race condition issue?

  1. Flow A sends a findOne() to get document X.
  2. Flow B sends a findOne() to get the same document X. Both A and B hold the same exact same document as the result of the identical queries. This is entirely possible in a REST API situation with many users of the API.
  3. Flow A modifies some data in document X.
  4. Flow B modifies the same data, but to some other value, in document X.
  5. Both use .save(). Depending on if flow A or flow B finishes first, the final document stored in the database will look different.

The proper way to avoid this would be a transaction I assume. Still, the examples in the docs showing the query->update->save works isolated but not in a production context.

@AbdelrahmanHafez
Copy link
Collaborator

There can be race conditions, but it has nothing to do with .save(), it can happen with updateOne(...) as well because save() uses updateOne(...) internally.

Document#save() does not send the whole document after modifying it to the database. Instead, mongoose tracks the changes and sends those changes as an updateOne(...) to Mongo.

Run mongoose.set('debug', true); to see what gets sent to MongoDB.

@thernstig
Copy link
Author

thernstig commented May 15, 2020

There can be race conditions, but it has nothing to do with .save(), it can happen with updateOne(...) as well because save() uses updateOne(...) internally.

I do not understand this sentence. How could it happen if I use only updateOne() directly? In that instance the operation is atomic.

Thus it can, from my understanding, not happen if using updateOne() only.

The thing I am after is that only this flow is a potential race condition: query -> modify doc -> save(). That flow is not atomic. Using updateOne() is the better solution since it is atomic.

Am I missing something completely?

@AbdelrahmanHafez
Copy link
Collaborator

The next two snippets do exactly the same thing.

With save

  const userSchema = new Schema({
    name: String,
    paidAmount: Number,
    isPremium: { type: Boolean, default: false }
  });

  const User = mongoose.model('User', userSchema);

  await User.create({ name: 'Hafez', paidAmount: 500, isPremium: true });

  // later
  const user = await User.findOne();
  if (user.isPremium) {
    user.paidAmount += 100;
  } else {
    user.paidAmount += 50;
  }

  mongoose.set('debug', true);
  // calls updateOne internally
  await user.save();

Output

users.updateOne({ _id: ObjectId("5ebe4a0a93864a5858c511c8") }, { '$set': { paidAmount: 600 } })

With update

  const user = await User.findOne();
  const update = {};
  if (user.isPremium) {
    update.paidAmount = user.paidAmount + 100;
  } else {
    update.paidAmount = user.paidAmount + 50;
  }

  mongoose.set('debug', true);

  await User.updateOne({ _id: user._id }, update);

Output (same)

users.updateOne({ _id: ObjectId("5ebe4a66209f4449b0ab55da") }, { '$set': { paidAmount: 600 } })

@thernstig
Copy link
Author

Yes, in that case. But not if you just plainly update a field (without using previous values). And in the second example, I assume you could achieve that with this: https://stackoverflow.com/a/56551655

Thus, making it atomic.

The documentation should thus preferably not recommend query->update->save() because it certainly has a high chance of having users run into race conditions that are very hard to discover.

vkarpov15 added a commit that referenced this issue May 15, 2020
docs(document): elaborate on how Document#save() works
@AbdelrahmanHafez
Copy link
Collaborator

AbdelrahmanHafez commented May 16, 2020

I agree that in cases where we don't depend on data in the current document, and the example is not related to .save(), we should use updateOne to inform people that they can do that, it definitely has better performance.

As for the SO question regarding update aggregation pipelines, it's added recently to MongoDB, so we can't add it in the docs, let alone how verbose it is, most users will find it confusing.

Yet, I still don't see how query=>update=>save can cause bugs, other from the performance side of it, I challenge you to come up with a script that demonstrates a bug with Document#save() but not with Model#updateOne().

Document#save() literally is Model#updateOne();.

@thernstig
Copy link
Author

thernstig commented May 18, 2020

const mongoose = require('mongoose');

const db = mongoose.createConnection('mongodb://127.0.0.1:27017/database1', {
  useNewUrlParser: true,
});

const test = new mongoose.Schema({
  something: String,
});

const Test = db.model('Test', test);

(async () => {
  await db.dropDatabase();

  await Test.create({
    something: 'initiated',
  });

  // Two different API endpoints where both users queries something at the same time
  const test1 = await Test.findOne({});
  let test2 = await Test.findOne({});

  // User 1's API request updates something
  test1.something = 'first update';
  await test1.save();

  // User 2's API request updates something, but it's for another route, where we do a check
  if (test2.something === 'initiated') test2.something = 'second update';
  await test2.save();
  console.log(await Test.findOne({}));

  // output: { _id: 5ec2b99d4768207ca9b6c34c, something: 'second update', __v: 0 }

  await db.dropDatabase();

  await Test.create({
    something: 'initiated',
  });

  // We do the same thing again, but this time the first API endpoint is updated to
  // use updateOne instead of query->update->save()

  // Two different API endpoints where both users queries something at the same time
  await Test.updateOne({ something: 'first update' }); // this update is now atomic
  test2 = await Test.findOne({});

  // User 2's API request updates something, but it's for another route, where we do a check
  if (test2.something === 'initiated') test2.something = 'second update';
  await test2.save();
  console.log(await Test.findOne({}));
  // output: { _id: 5ec2b9d202eb247d5dd2be50, something: 'first update', __v: 0 }
})();

Note that this is a contrived example, and it would have been much better for user 2 to also use updateOne() as to not get into the race condition where user 2 does a findOne() before user 1's updateOne().

In a distributed system, query->update->save causes hard-to-catch bugs.

@vkarpov15 vkarpov15 modified the milestones: 5.9.15, 5.9.16 May 18, 2020
@AbdelrahmanHafez
Copy link
Collaborator

In the second case, move test2 = await Test.findOne(); one line above, or move line 22=>27, then both cases would be equivalent, and will yield the same results,

@thernstig
Copy link
Author

thernstig commented May 21, 2020

@AbdelrahmanHafez The code should be read as the calls to the DB are happening in parallel, i.e. you cannot read the code literally. It should simulate two API endpoints accessing the same database. So it cannot be "moved" in a real life example as you state.

Regarding moving from 22=>27 I also specifically state in the end that it can be solved as well. But that is of smaller matter.

What the above proves is that query->modify->save does not work in a distributed, parallel system where API calls can come in any order.

@AbdelrahmanHafez
Copy link
Collaborator

@thernstig I know you can't read the code literally, I was demonstrating how updateOne/save can both cause a bug, and both can work correctly. This is not an issue with updateOne vs save.

Admittedly, though, updateOne reduces (but doesn't eliminate) the risk of that race condition because users don't need data, therefore there's less chance of the second findOne happening after that update.

Thoughts @vkarpov15?

@thernstig
Copy link
Author

I am not sure I agree, I feel there is an issue with updateOne vs save as demonstrated. #9001 (comment) was taking incorrect assumptions that code could be moved around in a way that is not possible when parallel requests come in.

updateOne can reduce the risk of race conditions if you just write it properly in certain cases. I do not think this is an issue of covering all cases, but most cases. Isn't that what is important, to give good guidance to users that reduces the risk? Using updateOne always instead would thus minimize a very real risk in larger, concurrent API request scenarios.

I'm not sure I have more value to add to this thread, I believe I have demonstrated how a problem can be minimized.

Do not take this for not being very happy with mongoose, I just started this thread to try to help users in something I believe is a hard-to-track issue happening in production systems.

@vkarpov15 vkarpov15 modified the milestones: 5.9.16, 5.9.17 May 25, 2020
@vkarpov15
Copy link
Collaborator

Sorry I didn't keep up with this thread, I've been a bit behind on GitHub notifications.

@thernstig your concern in this comment is a non-issue for most use cases, at least in my experience. Whether A or B wins out in the end doesn't matter if the user double-clicked a "save" button to update their profile. And versioning handles the nasty cases that can happen when modifying arrays in parallel.

I wrote about the tradeoffs between different ways of updating documents in Mongoose here. I think this image sums it up:

image

Here's another option: instead of recommending updateOne() for all cases, which is somewhat clunky, we can add support for a stricter form of versioning where save() fails if the underlying document in the database has changed since it was loaded based on a timestamp. Like OCC. I'd prefer that to be opt-in, but we've had enough asks for officially supported OCC that we should consider it.

@vkarpov15 vkarpov15 added discussion If you have any thoughts or comments on this issue, please share them! and removed docs This issue is due to a mistake or omission in the mongoosejs.com documentation labels Jun 2, 2020
@vkarpov15 vkarpov15 modified the milestones: 5.9.17, 5.x Unprioritized Jun 2, 2020
@thernstig
Copy link
Author

@vkarpov15 I think my concern is "it can happen", it can be another admin clicking a save button for a user. And those cases can be concerning, as they are hard to spot. Any way to eliminate such concerns is of value I think.

OCC seems like a feature worth having, but obviously an investment from your side. An idea would be a recommendation in docs that if a user has a more serious case where atomicity is a must, they could use transactions, at least if affecting multiple docs.

Thanks for a great package.

@vkarpov15
Copy link
Collaborator

You're right that these cases can be concerning, but in practice they rarely are in my experience. One update wins out or the other, which is typically fine, unless the data ends up in an inconsistent state.

We'll add OCC support in a future release.

@vkarpov15 vkarpov15 changed the title Docs: Query documents followed by .save() OCC support Jun 11, 2020
@vkarpov15 vkarpov15 modified the milestones: 5.x Unprioritized, 5.10 Jul 5, 2020
@vkarpov15 vkarpov15 mentioned this issue Apr 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion If you have any thoughts or comments on this issue, please share them!
Projects
None yet
Development

No branches or pull requests

3 participants