OCC support #9001

thernstig · 2020-05-14T11:05:23Z

Some documentation suggests doing e.g. a findOne() followed by modifications, and then a .save(). Isn't this recommendation prone to breaking the atomicity that MongoDB otherwise brings via the update operations? Could it not mean e.g. two REST operations to read/update some document could give incorrect end results?

Am I missing something, or should the Mongoose docs not recommend this?

The text was updated successfully, but these errors were encountered:

AbdelrahmanHafez · 2020-05-14T11:24:29Z

user.save() will insert a new document if it's a new document, or update it if it's already on the database.

thernstig · 2020-05-14T12:24:31Z

Yes, it is the update case I am talking about. When doing a const doc = new Document followed by doc.save() I find no problem. It is the query->update->save that is a problem. There are multiple locations throughout all documentation that shows example of the query->update->save way of handling this.

AbdelrahmanHafez · 2020-05-14T12:47:59Z

Why do you think that this is bad? You may need data from the document to determine the update value.

thernstig · 2020-05-14T17:36:04Z

Maybe I am missing something, but isn't that a classical race condition issue?

Flow A sends a findOne() to get document X.
Flow B sends a findOne() to get the same document X. Both A and B hold the same exact same document as the result of the identical queries. This is entirely possible in a REST API situation with many users of the API.
Flow A modifies some data in document X.
Flow B modifies the same data, but to some other value, in document X.
Both use .save(). Depending on if flow A or flow B finishes first, the final document stored in the database will look different.

The proper way to avoid this would be a transaction I assume. Still, the examples in the docs showing the query->update->save works isolated but not in a production context.

AbdelrahmanHafez · 2020-05-15T03:59:24Z

There can be race conditions, but it has nothing to do with .save(), it can happen with updateOne(...) as well because save() uses updateOne(...) internally.

Document#save() does not send the whole document after modifying it to the database. Instead, mongoose tracks the changes and sends those changes as an updateOne(...) to Mongo.

Run mongoose.set('debug', true); to see what gets sent to MongoDB.

thernstig · 2020-05-15T07:45:00Z

There can be race conditions, but it has nothing to do with .save(), it can happen with updateOne(...) as well because save() uses updateOne(...) internally.

I do not understand this sentence. How could it happen if I use only updateOne() directly? In that instance the operation is atomic.

Thus it can, from my understanding, not happen if using updateOne() only.

The thing I am after is that only this flow is a potential race condition: query -> modify doc -> save(). That flow is not atomic. Using updateOne() is the better solution since it is atomic.

Am I missing something completely?

AbdelrahmanHafez · 2020-05-15T07:57:07Z

The next two snippets do exactly the same thing.

With save

  const userSchema = new Schema({
    name: String,
    paidAmount: Number,
    isPremium: { type: Boolean, default: false }
  });

  const User = mongoose.model('User', userSchema);

  await User.create({ name: 'Hafez', paidAmount: 500, isPremium: true });

  // later
  const user = await User.findOne();
  if (user.isPremium) {
    user.paidAmount += 100;
  } else {
    user.paidAmount += 50;
  }

  mongoose.set('debug', true);
  // calls updateOne internally
  await user.save();

Output

users.updateOne({ _id: ObjectId("5ebe4a0a93864a5858c511c8") }, { '$set': { paidAmount: 600 } })

With update

  const user = await User.findOne();
  const update = {};
  if (user.isPremium) {
    update.paidAmount = user.paidAmount + 100;
  } else {
    update.paidAmount = user.paidAmount + 50;
  }

  mongoose.set('debug', true);

  await User.updateOne({ _id: user._id }, update);

Output (same)

users.updateOne({ _id: ObjectId("5ebe4a66209f4449b0ab55da") }, { '$set': { paidAmount: 600 } })

thernstig · 2020-05-15T11:37:05Z

Yes, in that case. But not if you just plainly update a field (without using previous values). And in the second example, I assume you could achieve that with this: https://stackoverflow.com/a/56551655

Thus, making it atomic.

The documentation should thus preferably not recommend query->update->save() because it certainly has a high chance of having users run into race conditions that are very hard to discover.

docs(document): elaborate on how Document#save() works

AbdelrahmanHafez · 2020-05-16T14:53:19Z

I agree that in cases where we don't depend on data in the current document, and the example is not related to .save(), we should use updateOne to inform people that they can do that, it definitely has better performance.

As for the SO question regarding update aggregation pipelines, it's added recently to MongoDB, so we can't add it in the docs, let alone how verbose it is, most users will find it confusing.

Yet, I still don't see how query=>update=>save can cause bugs, other from the performance side of it, I challenge you to come up with a script that demonstrates a bug with Document#save() but not with Model#updateOne().

Document#save() literally is Model#updateOne();.

thernstig · 2020-05-18T17:06:17Z

const mongoose = require('mongoose');

const db = mongoose.createConnection('mongodb://127.0.0.1:27017/database1', {
  useNewUrlParser: true,
});

const test = new mongoose.Schema({
  something: String,
});

const Test = db.model('Test', test);

(async () => {
  await db.dropDatabase();

  await Test.create({
    something: 'initiated',
  });

  // Two different API endpoints where both users queries something at the same time
  const test1 = await Test.findOne({});
  let test2 = await Test.findOne({});

  // User 1's API request updates something
  test1.something = 'first update';
  await test1.save();

  // User 2's API request updates something, but it's for another route, where we do a check
  if (test2.something === 'initiated') test2.something = 'second update';
  await test2.save();
  console.log(await Test.findOne({}));

  // output: { _id: 5ec2b99d4768207ca9b6c34c, something: 'second update', __v: 0 }

  await db.dropDatabase();

  await Test.create({
    something: 'initiated',
  });

  // We do the same thing again, but this time the first API endpoint is updated to
  // use updateOne instead of query->update->save()

  // Two different API endpoints where both users queries something at the same time
  await Test.updateOne({ something: 'first update' }); // this update is now atomic
  test2 = await Test.findOne({});

  // User 2's API request updates something, but it's for another route, where we do a check
  if (test2.something === 'initiated') test2.something = 'second update';
  await test2.save();
  console.log(await Test.findOne({}));
  // output: { _id: 5ec2b9d202eb247d5dd2be50, something: 'first update', __v: 0 }
})();

Note that this is a contrived example, and it would have been much better for user 2 to also use updateOne() as to not get into the race condition where user 2 does a findOne() before user 1's updateOne().

In a distributed system, query->update->save causes hard-to-catch bugs.

AbdelrahmanHafez · 2020-05-19T21:29:29Z

In the second case, move test2 = await Test.findOne(); one line above, or move line 22=>27, then both cases would be equivalent, and will yield the same results,

thernstig · 2020-05-21T07:19:26Z

@AbdelrahmanHafez The code should be read as the calls to the DB are happening in parallel, i.e. you cannot read the code literally. It should simulate two API endpoints accessing the same database. So it cannot be "moved" in a real life example as you state.

Regarding moving from 22=>27 I also specifically state in the end that it can be solved as well. But that is of smaller matter.

What the above proves is that query->modify->save does not work in a distributed, parallel system where API calls can come in any order.

AbdelrahmanHafez · 2020-05-24T23:21:43Z

@thernstig I know you can't read the code literally, I was demonstrating how updateOne/save can both cause a bug, and both can work correctly. This is not an issue with updateOne vs save.

Admittedly, though, updateOne reduces (but doesn't eliminate) the risk of that race condition because users don't need data, therefore there's less chance of the second findOne happening after that update.

Thoughts @vkarpov15?

thernstig · 2020-05-25T11:34:15Z

I am not sure I agree, I feel there is an issue with updateOne vs save as demonstrated. #9001 (comment) was taking incorrect assumptions that code could be moved around in a way that is not possible when parallel requests come in.

updateOne can reduce the risk of race conditions if you just write it properly in certain cases. I do not think this is an issue of covering all cases, but most cases. Isn't that what is important, to give good guidance to users that reduces the risk? Using updateOne always instead would thus minimize a very real risk in larger, concurrent API request scenarios.

I'm not sure I have more value to add to this thread, I believe I have demonstrated how a problem can be minimized.

Do not take this for not being very happy with mongoose, I just started this thread to try to help users in something I believe is a hard-to-track issue happening in production systems.

vkarpov15 · 2020-06-02T16:08:06Z

Sorry I didn't keep up with this thread, I've been a bit behind on GitHub notifications.

@thernstig your concern in this comment is a non-issue for most use cases, at least in my experience. Whether A or B wins out in the end doesn't matter if the user double-clicked a "save" button to update their profile. And versioning handles the nasty cases that can happen when modifying arrays in parallel.

I wrote about the tradeoffs between different ways of updating documents in Mongoose here. I think this image sums it up:

Here's another option: instead of recommending updateOne() for all cases, which is somewhat clunky, we can add support for a stricter form of versioning where save() fails if the underlying document in the database has changed since it was loaded based on a timestamp. Like OCC. I'd prefer that to be opt-in, but we've had enough asks for officially supported OCC that we should consider it.

thernstig · 2020-06-04T08:09:51Z

@vkarpov15 I think my concern is "it can happen", it can be another admin clicking a save button for a user. And those cases can be concerning, as they are hard to spot. Any way to eliminate such concerns is of value I think.

OCC seems like a feature worth having, but obviously an investment from your side. An idea would be a recommendation in docs that if a user has a more serious case where atomicity is a must, they could use transactions, at least if affecting multiple docs.

Thanks for a great package.

vkarpov15 · 2020-06-11T21:44:43Z

You're right that these cases can be concerning, but in practice they rarely are in my experience. One update wins out or the other, which is typically fine, unless the data ends up in an inconsistent state.

We'll add OCC support in a future release.

AbdelrahmanHafez added the docs label May 14, 2020

AbdelrahmanHafez added this to the 5.9.15 milestone May 14, 2020

AbdelrahmanHafez mentioned this issue May 15, 2020

docs(document): elaborate on how Document#save() works #9003

Merged

vkarpov15 added a commit that referenced this issue May 15, 2020

Merge pull request #9003 from AbdelrahmanHafez/gh-9001

51c5e15

docs(document): elaborate on how Document#save() works

vkarpov15 modified the milestones: 5.9.15, 5.9.16 May 18, 2020

vkarpov15 modified the milestones: 5.9.16, 5.9.17 May 25, 2020

vkarpov15 added discussion and removed docs labels Jun 2, 2020

vkarpov15 modified the milestones: 5.9.17, 5.x Unprioritized Jun 2, 2020

vkarpov15 changed the title ~~Docs: Query documents followed by .save()~~ OCC support Jun 11, 2020

vkarpov15 modified the milestones: 5.x Unprioritized, 5.10 Jul 5, 2020

vkarpov15 added the pending release label Jul 27, 2020

vkarpov15 closed this as completed in 6a7b275 Aug 14, 2020

AbdelrahmanHafez removed the pending release label Dec 7, 2020

vkarpov15 mentioned this issue Apr 21, 2021

Gh 10128 #10163

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

OCC support #9001

OCC support #9001

thernstig commented May 14, 2020

AbdelrahmanHafez commented May 14, 2020

thernstig commented May 14, 2020 •

edited

Loading

AbdelrahmanHafez commented May 14, 2020

thernstig commented May 14, 2020 •

edited

Loading

AbdelrahmanHafez commented May 15, 2020

thernstig commented May 15, 2020 •

edited

Loading

AbdelrahmanHafez commented May 15, 2020

thernstig commented May 15, 2020

AbdelrahmanHafez commented May 16, 2020 •

edited

Loading

thernstig commented May 18, 2020 •

edited

Loading

AbdelrahmanHafez commented May 19, 2020

thernstig commented May 21, 2020 •

edited

Loading

AbdelrahmanHafez commented May 24, 2020

thernstig commented May 25, 2020

vkarpov15 commented Jun 2, 2020

thernstig commented Jun 4, 2020

vkarpov15 commented Jun 11, 2020

OCC support #9001

OCC support #9001

Comments

thernstig commented May 14, 2020

AbdelrahmanHafez commented May 14, 2020

thernstig commented May 14, 2020 • edited Loading

AbdelrahmanHafez commented May 14, 2020

thernstig commented May 14, 2020 • edited Loading

AbdelrahmanHafez commented May 15, 2020

thernstig commented May 15, 2020 • edited Loading

AbdelrahmanHafez commented May 15, 2020

With save

Output

With update

Output (same)

thernstig commented May 15, 2020

AbdelrahmanHafez commented May 16, 2020 • edited Loading

thernstig commented May 18, 2020 • edited Loading

AbdelrahmanHafez commented May 19, 2020

thernstig commented May 21, 2020 • edited Loading

AbdelrahmanHafez commented May 24, 2020

thernstig commented May 25, 2020

vkarpov15 commented Jun 2, 2020

thernstig commented Jun 4, 2020

vkarpov15 commented Jun 11, 2020

thernstig commented May 14, 2020 •

edited

Loading

thernstig commented May 14, 2020 •

edited

Loading

thernstig commented May 15, 2020 •

edited

Loading

AbdelrahmanHafez commented May 16, 2020 •

edited

Loading

thernstig commented May 18, 2020 •

edited

Loading

thernstig commented May 21, 2020 •

edited

Loading