Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a new DQL construct to support initialization of to-many associations? #10314

Open
mpdude opened this issue Dec 18, 2022 · 2 comments
Open

Comments

@mpdude
Copy link
Contributor

mpdude commented Dec 18, 2022

Feature Request

Q A
New Feature yes
RFC yes
BC Break no

Background

  • fetch=EAGER does not make sense for to-many collections, since currently one additional query per loaded entity is needed to initialize to-many collections (DDC-3905: Eager fetching yields N+1 queries in *ToMany associations when using DQL #4762)
  • An early draft/POC exists ([GH-1569] Optimize eager fetch for collections to batch query #8391) that might allow to get away with one extra query per collection field, at least for one-to-many associations; but this cannot be enabled on a per-query basis and/or does not allow for context-sensitive optimization.
  • In order to initialize collections, use fetch joins.
  • Fetch joins for multiple, unrelated collections may lead to exponential growth of the database result set (http://ocramius.github.io/blog/doctrine-orm-optimization-hydration/); multi-step hydration (explained in that blog post) is the recommended workaround.
  • That blog post also shows the usage of PARTIAL, e. g. SELECT PARTIAL user.{id}, sessions FROM User user JOIN user.sessions sessions to avoid loading all of the user.* data in this query, since all that data has already been loaded (we're only interested in initializing the collection in this step). But partial objects have been deprecated for good reasons in Deprecate Partial Objects #8471.
  • When not using PARTIAL, overhead data may be multiplied in the database result set. For example, given a Customer → Order → OrderLineItem association chain with all being to-many, initializing the Order::$items collection means fetching unnecessary Order data once for every item, and unnecessary Customer data for as many times as the Customer has total OrderLineItems.

Suggestion

Provide a mechanism, helper class, new DQL query type or keyword or other extension that

  • makes it easy (or easier) for users to initialize to-many collections,
  • can be used to conveniently perform multi-step hydration
  • reduces database (= result set size) overhead as much as possible
  • can be maintained without having to provide partial object support
  • at best, can be used given arbitrary queries

To be clear, the goal of this feature is to only initialize uninitialized collection fields of entities already loaded/being in the identity map.

Users must be able to determine for which of those entities initialization shall occur, probably based on a DQL query expression.

It would be acceptable to initialize only one field at a time, and to perform a constant number (O(1) ) of database queries per such field.

The new mechanism is not required to return any data (entities) when invoked, although it may do so if the semantics can be clearly defined. This might limit the applicability of DQL SELECT clauses.

At this time, I have no insight into whether a new DQL query type, query hints, new SELECT keywords or even helper classes (with no DQL changes required) would be needed to implement the suggestion.

Illustrative examples

Let's assume we have the User with to-many Session and SocialAccount associations as in the blog post.

SELECT u FROM User u WHERE u.created >= ... – we're interested in a subset of Users only.

Example 1

A new INITIALIZED keyword means "select identifying columns only, and additionally skip result row when not in identity map":

SELECT INITIALIZED u, s FROM User u LEFT JOIN u.socialAccount s WHERE u.created >= ...
SELECT INITIALIZED u, s FROM User u LEFT JOIN u.sessions s WHERE u.created >= ...

Basically, this is like PARTIAL but avoids declaring the identifying columns u.{id, ...} and does not need to create partial objects, since objects not in the identity map may be skipped.

Example 2

A new INITIALIZE query type might make it more expressive that we're aiming at collection fields, and may specify only one such field at a time?

INITIALIZE u.socialAccount FROM User u WHERE u.created >= ...

Maybe this would not even require to start on the root entity and/or specify the entire association chain from there. With the association chain described above:

INITIALIZE o.items FROM Customer c LEFT JOIN c.orders o WHERE ...

Under the hood, the multi-step hydration would happen through SELECT PARTIAL _t1.{id}, _t2 FROM Order _t1 LEFT JOIN o.items _t2 WHERE _t1 IN (SELECT IDENTITY(o) FROM Customer c LEFT JOIN c.orders o WHERE ...) (does that even work?).

@beberlei
Copy link
Member

I believe we can look at the Entity Graph API that Java Persistence API has added in version 2.1, an example here: https://www.baeldung.com/jpa-entity-graph

@AnatolySnegovskiy
Copy link

I believe that the selection should be native
something like SELECT u.title, u.children FROM User
if the children field is specified in my entity, then it should be logical that I can get it from DQL

It's a mystery to me why it still doesn't work

the only place where I see a problem is possible looping, but this is solved by skipping related entities in the requested

I am now starting to think about how, make a fork and implement this mechanism on my own

If there are already some solutions, make selections by type
SELECT u.title, u.order, u.children, u.parent FROM User
Please tell me
In the meantime, in such a request, I get messages about the absence of fields children and parent

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants