Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DDC-3905: Eager fetching yields N+1 queries in *ToMany associations when using DQL #4762

Open
doctrinebot opened this issue Sep 16, 2015 · 14 comments
Assignees

Comments

@doctrinebot
Copy link

Jira issue originally created by user benjamin:

I have an entity with all kinds of associations:

  • OneToOne
  • OneToMany
  • ManyToOne
  • ManyToMany

I have tested using fetch="EAGER" on each of these associations, and running a DQL query against the entity.

For both OneToOne and ManyToOne, the SQL log looks like:

SELECT ... FROM Entity;
SELECT ... FROM RelatedEntity WHERE id IN (?);

Which is exactly 2 queries whatever the number of entities. This is all good.

However, for OneToMany the SQL log looks like:

SELECT ... FROM Product;
SELECT ... FROM Picture WHERE productId = ?;
SELECT ... FROM Picture WHERE productId = ?;
SELECT ... FROM Picture WHERE productId = ?;
...

And for ManyToMany the SQL log looks like:

SELECT ... FROM Product;
SELECT ... FROM Category t0 INNER JOIN ProductCategory ... WHERE ProductCategory.productId = ?;
SELECT ... FROM Category t0 INNER JOIN ProductCategory ... WHERE ProductCategory.productId = ?;
SELECT ... FROM Category t0 INNER JOIN ProductCategory ... WHERE ProductCategory.productId = ?;
...

Eager loading OneToMany and ManyToMany results in N+1 queries, so there's no point eager loading them at all.

Is this a bug, or a known limitation?

What prevents Doctrine from using a WHERE IN () and a single, second query?

@TriAnMan
Copy link

TriAnMan commented Feb 8, 2016

I also suffer from this behavior.

Are any workarounds there?

@Ocramius
Copy link
Member

Ocramius commented Feb 8, 2016

@Trainmaster the current workaround is doing fetch joins or multi-step hydration ( http://ocramius.github.io/blog/doctrine-orm-optimization-hydration/ )

I am not aware of any work being done to implement this particular feature

@bram-pkg
Copy link

Any update on this?

@Wulfheart
Copy link

Also interested in this.
Laravel handles this with a function called with.

@kafoso
Copy link

kafoso commented Jan 26, 2022

I would love to see this fixed.

Personally, I've also come to a place where I rarely use eager loading and rarely provider getters for automatically retrieving the contents of a collection, produced by an ORM\OneToMany or ORM\ManyToMany attribute/annotation. I do this afterwards through repositories/factories, which also allows for additional filtering when needed.

Do, however, remember that some database systems have a limit to the number of elements allowed in a list. E.g. Firebird's (https://firebirdsql.org/) 1499 list element limit.

See: https://www.ibphoenix.com/resources/documents/general/doc_323

Next, there's the query size in pure bytes to consider, e.g. MySQL's max_allowed_packet. A long list, outside of probably hitting PHP's memory_limit during the hydration process, might quickly eat away those precious bytes. Especially if the lookup is done via things like UUIDs or SHA256 sums, and not just simple integers.

If this hydration issue ever gets fixed, I'd recommend from the get go, that this is performed in chunks.

Suggested chunked hydration

Add a new method, getCollectionChunkSize, to the Doctrine\DBAL\Driver interface (yes, DBAL and not ORM). Of course, this method must then be implemented on all classes implementing the Doctrine\DBAL\Driver interface, which carries potential for backwards compatibility breakage in userland implementation (but there's always that risk).

protected int $collectionChunkSize = 2000;

...

public function getCollectionChunkSize(): int
{
    return $this->collectionChunkSize;
}

This value may of course vary depending on the platform/database system. The value should be changeable during the configuration stage of the DBAL.

The chunked fetching will then be performed as such:

$ids = [...]; // A list of n IDs, which may be integers, UUIDs, SHA256, etc.

foreach (array_chunk($ids, $platform->getCollectionChunkSize()) as $chunk) {
    ...

    $queryBuilder->andWhere($queryBuilder->expr()->in('entity.id', $chunk)); // Or something safer, e.g. using parameters, to avoid potential for 2nd order SQL injection

    ...
}

@kgasienica
Copy link

Any updates on this? Don't let Laravel be better... 😄

@Ocramius
Copy link
Member

@kgasienica propose a patch - if you haven't seen updates, asking for updates is redundant, no? :)

@guilhermeblanco
Copy link
Member

Hi,

While there is room for improvement, this is not a roadblock for your system.
The solution is to not rely on fetch="EAGER" and proactively query our dataset, using DQL (@Wulfheart pointed at Laravel solution, but it is nothing more than building your own query and executing it).

I always advise people to never use fetch="EAGER" as it leads to N+1 problem. You can solve it by running¨

One to One example:

SELECT p, pi FROM Product p JOIN p.image pi ...

Many to One example:

SELECT p, c FROM Product p JOIN p.category c ...

Aside from this solution, we would welcome contributions for batching support on eager associations.

Thanks,

@rodion-k
Copy link

@guilhermeblanco

One to One example:

SELECT p, pi FROM Product p JOIN p.image pi ...
Many to One example:

SELECT p, c FROM Product p JOIN p.category c ...

Actually this thread is about *ToMany associations. There is no simple solution for such associations. You can build such a query, but if there are multiple *Many assosiactions, it will be too slow.

@Ocramius
Copy link
Member

See proposed solution above: #4762 (comment)

It requires manual work, but that's how multi-step hydration works there

@rodion-k
Copy link

I read this one, but partial objects are deprecated #8471

@beberlei
Copy link
Member

I created a PR for subselect fetch a while ago that might fix this. Am on mobile so cant eaily find the PR id here

@SCIF
Copy link

SCIF commented Jul 26, 2022

@beberlei ,

I created a PR for subselect fetch a while ago that might fix this. Am on mobile so cant eaily find the PR id here

Here is it: #8391 :) People are looking forward to have a convenient way of fetching m-m relations

@mpdude
Copy link
Contributor

mpdude commented Dec 14, 2022

Regarding multi-step hydration (kudos @Ocramius for the excellent blog post!) a slight issue is that partial objects have been deprecated in #8471.

I completely understand why partial objects are a bad idea. They are, however, helpful in queries like (quoting from Marco's blog)

SELECT user, socialAccounts FROM User user LEFT JOIN user.socialAccounts socialAccounts;
SELECT PARTIAL user.{id}, sessions FROM User user LEFT JOIN user.sessions sessions;

The second query avoids loading the user fields again, where they are just bloat since all user entities are already hydrated and available in the identity map. The partial object is discarded, but is lightweight during load. You can write the second query as

SELECT user, sessions FROM User user LEFT JOIN user.sessions sessions

which is a regular fetch-join that will initialize the sessions collection in all user objects that have been fetched in the first step, but it pulls more data from the user table than necessary (and that data may even be duplicated due to the to-many join taking place).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests