Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance drop after upgrading 5 => 6 on populate with large volume of data #12865

Closed
2 tasks done
app-rc opened this issue Jan 3, 2023 · 13 comments · Fixed by #12867
Closed
2 tasks done

Performance drop after upgrading 5 => 6 on populate with large volume of data #12865

app-rc opened this issue Jan 3, 2023 · 13 comments · Fixed by #12867
Milestone

Comments

@app-rc
Copy link

app-rc commented Jan 3, 2023

Prerequisites

  • I have written a descriptive issue title
  • I have searched existing issues to ensure the bug has not already been reported

Mongoose version

6.8.2

Node.js version

16.17.0

MongoDB server version

4.2.23

Typescript version (if applicable)

No response

Description

Hi ✋

I noticed a very big performance drop when i switch from Mongoose 5 to 6.
This drop is only on very large populate (bulk 500 + thousands items inside each). It was fine in v5.

After digging, it seems thats it is caused by this line in assignRawDocsToIdStructure.js

const hasResultArrays = Object.values(resultOrder).find(o => Array.isArray(o));

If i switch with the previous condition (in mongoose 5), it works fine !

Array.isArray(_resultOrder) && Array.isArray(doc) && _resultOrder.length === doc.length

Can I put the old condition back while waiting for a fix?
Thank you for your reply

Steps to Reproduce

await Model
    .find({})
    .lean()
    .populate([
      {
        path: '_model2',
        select: 'props2',
        populate: {
          path: '_model3',
          select: 'props3'
        }
      },
      {
        path: '_model4',
        select: 'props4'
      },
      ...

Expected Behavior

No response

@Uzlopak
Copy link
Collaborator

Uzlopak commented Jan 3, 2023

Hmm. Well.

Object.values is an iterator, which is passed to the .find() method. The issue with that is, that we probably dont use the iterator functionality in a performant way. We just want to determine if the result has atleast one array attribute or not. So if we have an Object with alot of attributes and the first atribute is an array, we should be exiting fast. If the last element is an Array we should exit slow. So the runtime for this is O(n).

.find() third parameter is the chained Array. This means, that the javascript runtime has to pass the complete Array to the find method, even though we dont need it in this case. So instead of being able to iterate each value step by step Object.values() has to generate and hold the whole Array in memory and then pass it to the find(). So we have here a potential memory issue.

Well. i dont know if v8 can magically detect that it does not need the whole array in the find operation. and tbh i am to lazy to look it up. But it would be my first guess why this operation is slow.

So I would probably change it to

  let hasResultArrays = false;
  for (const value of Object.values(resultOrder)) {
    if (Array.isArray(value)) {
      hasResultArrays = true;
      break;
    }
  }
  let hasResultArrays = false;
  for (const key in resultOrder) {
    if (Array.isArray(resultOrder[key])) {
      hasResultArrays = true;
      break;
    }
  }

not tested.

@app-rc
Copy link
Author

app-rc commented Jan 3, 2023

Yes, i understand, il think you should probably use .some() instead. It breaks at first found.
The problem is that i have big objects without array, and it is still very slow because it has to be full checked :/

Previous version was not good ?

@Uzlopak
Copy link
Collaborator

Uzlopak commented Jan 3, 2023

.some() has also array as third parameter. So we have probably again the memory issue.

see:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/some

Previous version was not good ?

I personally wont invest time into investigating changes between version 5 and 6.

Also Array.isArray(_resultOrder) && Array.isArray(doc) && _resultOrder.length === doc.length doesnt lo ok to have equivalent functionality like const hasResultArrays = Object.values(resultOrder).find(o => Array.isArray(o));

So how could that even be a solution?

But you are free to propose a PR.

@app-rc
Copy link
Author

app-rc commented Jan 3, 2023

Yes i understand, of course.

I think a lit trick can be done with a simple condition.

hasResultArrays is only used if recursed and sorting are true.
Is it possible to calculate hasResultArrays only if thoses values are true

Something like that ?

let hasResultArrays
  
if (recursed && sorting) {
  hasResultArrays = Object.values(resultOrder).some(o => Array.isArray(o))
}

for .find() vs .some() you can find a small benchmark here :
https://www.measurethat.net/Benchmarks/Show/5921/0/some-vs-find

Thank you !

@Uzlopak
Copy link
Collaborator

Uzlopak commented Jan 3, 2023

Your benchmark is imho not testing the issue.

The issue is, that you generate the array on the fly with Object.values(). Object.values() returns an Iterator and not an Array. To be able to call find() or some(), the javascript runtime has to generate the full Array and then pass it to the method. But your benchmark just creates some Array and then calls find() and same(), which is of course fast but does not keep in account that we have to generate an Array or an Iterator on each call.

I think this benchmark is illustrating the performance issues better:
https://www.measurethat.net/Benchmarks/Show/22810/0/some-vs-find-vs-keys-and-values

@Uzlopak
Copy link
Collaborator

Uzlopak commented Jan 3, 2023

To your first claim: Yes. Calculate it only if relevant. But also keep in mind, that recursive is set to true in the function itself to indicate that assignRawDocsToIdStructure called itself. This means, that it would be calculated on every call when recursive is set to true.

You could make use of the fact that options is an Object and as such is passed as function parameter in javascript by reference.

So you could do

  if (sorting && recursed && options.hasArray === undefined) {
    options.hasArray = false;
    for (const key in resultOrder) {
      if (Array.isArray(resultOrder[key])) {
        options.hasArray = true;
        break;
      }
    }
  }

and then instead of if (hasResultArrays) { you do if (options.hasArray) {

And so you calculate it truely only once.

@app-rc
Copy link
Author

app-rc commented Jan 3, 2023

Yes ! You are right, it's better like this ! That's perfect !

Do you know when this fix could be added ?

Thank you so much !

@Uzlopak
Copy link
Collaborator

Uzlopak commented Jan 3, 2023

I would prefer that you create a PR and test if the performance really improves for your case.

@Uzlopak
Copy link
Collaborator

Uzlopak commented Jan 3, 2023

@app-rc
Created a PR
#12867

@app-rc
Copy link
Author

app-rc commented Jan 3, 2023

Yes it works perfectly, i was creating the PR.
Ok thank you !

Do i need to do something else or will it pass at next build ?

@Uzlopak
Copy link
Collaborator

Uzlopak commented Jan 3, 2023

So you can confirm that the performance improved?

@app-rc
Copy link
Author

app-rc commented Jan 3, 2023

Just tested with your branch, and it works perfectly !
I confirm, it improves the performance !

@Uzlopak
Copy link
Collaborator

Uzlopak commented Jan 3, 2023

If you want you can also write into the PR that you can confirm the performance change. I guess it will than be reviewed, merged and then released as usual. Releases are made about every 1-2 weeks. So probably you have to wait a little till it is released.

@vkarpov15 vkarpov15 added this to the 6.8.3 milestone Jan 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants