Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

query: reuse complex projection in orderby etc rather than defining it again #16038

Open
maumar opened this issue Jun 11, 2019 · 9 comments
Open

Comments

@maumar
Copy link
Contributor

maumar commented Jun 11, 2019

test:

Select_take_null_coalesce_operator

query:

ctx.Customers.Select(
                    c => new
                    {
                        c.CustomerID,
                        c.CompanyName,
                        Region = c.Region ?? "ZZ"
                    }).OrderBy(c => c.Region).Take(5)

expected sql:

@__p_0='5'

SELECT TOP(@__p_0) [c].[CustomerID], [c].[CompanyName], COALESCE([c].[Region], N'ZZ') AS [Region]
FROM [Customers] AS [c]
ORDER BY [Region]

actual sql:

@__p_0='5'

SELECT TOP(@__p_0) [c].[CustomerID], [c].[CompanyName], COALESCE([c].[Region], N'ZZ')
FROM [Customers] AS [c]
ORDER BY COALESCE([c].[Region], N'ZZ')
@smitpatel
Copy link
Member

This could be done via EV running after translation. During translation phase, projections are not always added to the list. (When they are stored in projectionMapping)

@smitpatel
Copy link
Member

Related #20291

@smitpatel smitpatel self-assigned this Mar 16, 2020
@smitpatel smitpatel removed their assignment Aug 27, 2020
@ChrisJollyAU
Copy link

Just a couple of questions on this

  1. The above example is basically on a SqlFunctionExpression. Should it just handle that or other complex expressions (e.g. ScalarSubqueryExpression or ExistsExpression)
  2. Is this aiming to reuse it solely if there is a projection the same as the order by expression. Or even if the expression is solely in the ORDER BY, should it be moved to the projection with an alias and the alias referenced in the ORDER BY

I do have a fully working expression visitor from another provider that I'm working on that does exactly this that I can fixup/optimize

@roji
Copy link
Member

roji commented Jan 16, 2024

The above example is basically on a SqlFunctionExpression. Should it just handle that or other complex expressions (e.g. ScalarSubqueryExpression or ExistsExpression)

AFAICT there's nothing here that's specific to SqlFunctionExpression; any time that something in the projection is ordered by, the ORDER BY clause should just reference that projection via its alias.

Is this aiming to reuse it solely if there is a projection the same as the order by expression. Or even if the expression is solely in the ORDER BY, should it be moved to the projection with an alias and the alias referenced in the ORDER BY

If the expression is solely in the ORDER BY, then it shouldn't be in the projection, since we don't want to project that thing (and e.g. download the data to the client) - only order by it. So no, I think this is only about not duplicating the expression when it's referenced both by in the ORDER BY and in the projection.

Note that the same issue most likely exist with OFFSET/LIMIT, which are also evaluated after the projection.

I do have a fully working expression visitor from another provider that I'm working on that does exactly this that I can fixup/optimize

I've been working on various query architecture things recently, and I actually don't think that an additional visitation pass is the right way to fix this; ideally, the SelectExpressions actual projection is always populated (not only when it's e.g. pushed down), and this would all be done eagerly as part of translation, rather than in another pass afterwards...

@roji
Copy link
Member

roji commented Jan 16, 2024

Note: this conceptually intersects with #32277, which is about deduplicating subqueries in the tree; the scope here is much narrower (between clauses within the same select only), and we wouldn't necessarily use the same approach for both.

@ChrisJollyAU
Copy link

ChrisJollyAU commented Jan 16, 2024

If the expression is solely in the ORDER BY, then it shouldn't be in the projection, since we don't want to project that thing (and e.g. download the data to the client) - only order by it. So no, I think this is only about not duplicating the expression when it's referenced both by in the ORDER BY and in the projection.

Good point. I've had to go the other direction and put it into the projection all the time as my provider (https://github.com/CirrusRedOrg/EntityFrameworkCore.Jet / MS Access) did not support those complex expressions in the ORDER BY.

You mentioned that you don't think doing another visitation pass would be the right way to go. Obviously if you are wanting to do it differently that would require a decent number of changes to the query pipeline to support that. If needed to be done now, I take it doing it in the QueryTranslationPostProcessor is reasonable? Or would there be a better place to do it?

The test mentioned in the first post ultimately comes out as (note MS Access SQL syntax as that is what I have handy)

SELECT TOP 5 `t`.`CustomerID`, `t`.`CompanyName`, `t`.`Region`
FROM (
    SELECT `c`.`CustomerID`, `c`.`CompanyName`, IIF(`c`.`Region` IS NULL, 'ZZ', `c`.`Region`) AS `Region`
    FROM `Customers` AS `c`
) AS `t`
ORDER BY `t`.`Region`

I just picked this up now when I was extending the expression to be du duplicated to the SqlFunctionExpression but if you have a Limit or both Limit and Offset you will most likely need to have the query with the complex expression in a subquery. In my case the TOP 5 part was originally pushed down into the subquery which executed before any ordering.

Had a look at the other issues you mentioned.

#18775 I'm pretty sure that's a duplicate issue. It's a different type of expression but you can still do something like orderingExpression.Equals(projectionExpression) and get true

#32277 Yep, this does intersect. Actually, though you are going to have a problem with that sample case mentioned. I initially thought my post processor would de duplicate the whole clause

SELECT MAX([i5].[Timestamp])
    FROM [IterationValues] AS [i4]
    INNER JOIN [Iteration] AS [i5] ON [i4].[IterationId] = [i5].[Id]
    WHERE [i4].[IterationId] IN (
        SELECT [i3].[value]
        FROM OPENJSON(@__ids_0) WITH ([value] int '$') AS [i3]
    ) AND [i].[IterationId] = [i4].[IterationId]

but then I noticed in the OPENJSON section that the aliases it uses are all different (i6, i0, i3) so comparing expressions would always return false. You would need some other way of matching expressions.

@roji
Copy link
Member

roji commented Jan 16, 2024

Obviously if you are wanting to do it differently that would require a decent number of changes to the query pipeline to support that.

Yeah, that's definitely true. I'm currently doing a big push to clean up/refactor/modernize the query pipeline, although this specific area (rethink how projections/potential projections work) is probably long ways away.

If needed to be done now, I take it doing it in the QueryTranslationPostProcessor is reasonable?

Yep - you can add visitors to the postprocessor for your provider and do whatever transformations make sense there. It's sometimes tricky to get ordering right - you probably want to do your stuff after the relational postprocessor has completed, and the query tree is "done".

The test mentioned in the first post ultimately comes out as (note MS Access SQL syntax as that is what I have handy)

Right, makes sense - you're basically pushing down the query into a subquery so that the complex expression ends up in the subquery's projection, and at that point you can reference it as a simple column from the outer query's ORDER BY. FWIW a bit part of the work I'm currently doing on the pipeline is to make it much easier to transform the query tree in arbitrary ways - so EF 9.0 should look substantially better in this respect.

but then I noticed in the OPENJSON section that the aliases it uses are all different (i6, i0, i3)

This depends on exactly which kind of duplication is in question... In some case, the expression tree that comes out of translation contains the same SelectExpression instance, referenced from two places in the tree; this is easy to catch in a post-processing pass and transform as necessary (though one needs to do it right away, before any other visitor makes a change inside and causes the SelectExpression to be duplicated twice).

Anyway, it's interesting to know you're working on this kind of select manipulations for Jet; keep in touch around this and don't hesitate to let us know about what you're doing (and where EF could help you do better)!

@ChrisJollyAU
Copy link

I had some time over the weekend so converted my expression visitor over to SQL Server. You can find it at https://github.com/ChrisJollyAU/efcore/tree/dedupe if you want to have a look (based on 9.0 preview 2). It does deduplicate a number of queries.

The only thing I have noticed, is that when pushing down to a subquery, in EF 9 some subqueries get an additional column added to it that EF 8 doesnt do.

e.g. in Double_order_by_binary_expression

SELECT [w0].[Binary]
FROM (
    SELECT [w].[Id] + 2 AS [Binary], [w].[Id]
    FROM [Weapons] AS [w]
) AS [w0]
ORDER BY [w0].[Binary]

there is the extra [w].[Id] added in the pushdown. While not major, it is potentially unneeded. This comes about because with that SelectExpression there is the private value _identifier that has that column in it and the pushdown picks that up and adds it to the subquery projection. In EF8 _identifier doesn't have any values in it.

I know you are still thinking of changing the query architecture so maybe this will help in some form even if you don't go with a postprocessor. Otherwise, if you don't get to change the architecture, with a bit of cleanup and such, it may be something to merge

@roji
Copy link
Member

roji commented Apr 1, 2024

Thanks @ChrisJollyAU - at some point we'll definitely get around to working on such query improvements, unfortunately there's little chance it'll happen in 9.0.

Thinking about this a bit again... For a "proper" (non-postprocessing fix), we'd first need to get rid of the pending selector architecture (#32957): for the above query (Select+OrderBy), we currently move the Select forward ("pending"), and so the OrderBy is rewritten to contain the duplicated, complex expression. In other words, this original query:

ctx.Customers.Select(c => new { c.CustomerID, Region = c.Region ?? "ZZ" }).OrderBy(c => c.Region)...

... becomes the following in preprocessing:

ctx.Customers.OrderBy(c => c.Region ?? "ZZ").Select(c => new { c.CustomerID, Region = c.Region ?? "ZZ" })...

So the first step in fixing this properly would be removing that reordering (no small task). At that point, we should be able to modify RelationalSqlTranslatingEV so that when it translates the OrderBy lambda (c.Region), it produces a references to the projection rather than duplicating the entire projected expression. This would be a specific mode of RelationalSqlTranslatingEV when it's running after the projection (e.g. OrderBy, Skip, Take...) as opposed to before (Where, GroupBy - which are evaluated before the projection and cannot refer to it).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants