Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support database window functions #12747

Open
Tracked by #22950 ...
roji opened this issue Jul 21, 2018 · 49 comments
Open
Tracked by #22950 ...

Support database window functions #12747

roji opened this issue Jul 21, 2018 · 49 comments

Comments

@roji
Copy link
Member

roji commented Jul 21, 2018

Both SQL Server and PostgreSQL seem to support window functions in a similar way - in at least some cases the syntax even seems to be identical. It could be nice to add support for this to EF Core.

A main question is how this would be expressed in LINQ, AFAIK there's no way to do these kinds of operations in memory.

@Diaskhan
Copy link

Diaskhan commented Jul 21, 2018

1.How do u Imagine window functions ?
Have U some thoghts, code example ?

Like.addcolumn(WF,sum,[doc_id,id])

2.What if wf gonna be few ?
Because simple agrregate sum,count is not to hard to implement. But in wf we must consider all test case!

@roji
Copy link
Member Author

roji commented Jul 21, 2018

Regarding the API, it's a good question and ideally there would be a generic (non-database!) LINQ way of expressing this.

Here's a totally incomplete API idea:

Employees.PartitionBy(emp => emp.Department, (emp, dept) => new { Id=emp.Id, Salary=emp.Salary, SalarySum=Math.Sum(dept.Select(emp => emp.Salary)) })

Basically you define two lambdas: the partition lambda (given an employee, returns the partition to which it belongs), and the projection lambda (given an employee and their department, projects with the new partition-calculated lambda). So the above would return an IEnumerable of an anonymous type containing each employee's Id, Salary and also their departments salary sum. One issue is that an in-memory implementation would recalculate the sum for each and every employee in the partition, which isn't very efficient.

This definitely does not cover everything that can be done with (PostgreSQL) window functions, but it's a start :)

Hopefully if the API is done correctly, there won't be a problematic limit on which window functions are supported.

@Diaskhan
Copy link

Diaskhan commented Jul 21, 2018

hmmmm. what about order by ?
postgre
SELECT depname, empno, salary, rank() OVER (PARTITION BY depname ORDER BY salary DESC) FROM empsalary;

ms sql

SELECT ROW_NUMBER() OVER(PARTITION BY PostalCode ORDER BY SalesYTD DESC) AS "Row Number",   
    p.LastName, s.SalesYTD, a.PostalCode  

@pmiddleton
Copy link
Contributor

@roji - That syntax is very similar to what I'm working on for Pivot #11963.

Once I get that working maybe it won't be too bad to follow the same patter to get this added in. Of course I say that having look at this for 3 minutes :)

@roji
Copy link
Member Author

roji commented Jul 21, 2018

@pmiddleton oh, interesting, I don't know SQL Server's PIVOT functionality at all, is there a lot of overlap? I wonder if it's worth looking at the two together...

@roji
Copy link
Member Author

roji commented Jul 21, 2018

@Diaskhan I haven't looked into it yet, but maybe simply an overload with a third lambda parameter that provides the ORDER BY clause? This definitely needs to be thought out more thoroughly than my quick API proposals here.

@pmiddleton
Copy link
Contributor

@roji - There is no overlap in end user functionality, but I believe the implementation in the EF backend will be similar.

@divega
Copy link
Contributor

divega commented Nov 1, 2018

Just thinking about this and #13805 (comment), I came to realize that many scenarios in which LINQ's GroupBy does not translate well to SQL's GROUP BY because both rows and some computation of an aggregate or raking functions need to be returned are probably equivalent to SQL window functions.

@Jonatthu
Copy link

@divega Is this something still on progress or no action will be taken?

@roji
Copy link
Member Author

roji commented Oct 17, 2019

This feature is on the backlog, which means it's not going to be included in 3.1. Planning for the version after that hasn't started yet, so we don't know yet if this feature will be included. However, as there's relatively little interest this is unlikely to be prioritized.

@shoooe
Copy link

shoooe commented Oct 27, 2019

Just to give a use case: In GraphQL this is used a lot. When you have batching in combination with pagination this becomes essential.

@bessgeor
Copy link

bessgeor commented Apr 13, 2020

At least RANK() and ROW_NUMBER() seem to be expressible via

context
  .Employees
  // extract group key to window's partition by when pattern matches the one below
  .GroupBy(emp => emp.Department)
  .SelectMany
  (
    group => group
      // extract ordering to window's order by
      .OrderByDescending(emp = emp.Salary)
      .Select((emp, i) => new
      {
        emp.Department,
        emp.Number,
        emp.Salary,
        Rank = i
      })
  )

@kostat
Copy link

kostat commented Apr 18, 2020

The window functions are actually fully supported, though via a 3rd party extension.

Example of SQL Server PERCENT_RANK() function over partitions:

var years = new int?[] { 2016, 2017 };

DbContext.Set<StaffSalesPercentile>()
    .Query((VwStaffSales sales, Staffs staffs, StaffSalesPercentile alias) =>
    {
        var fullName = CONCAT_WS(" ", staffs.FirstName, staffs.LastName);
        var cumeDist = AggregateBy(PERCENT_RANK())
                .OVER(PARTITION(BY(sales.Year))
                .ORDER(BY(sales.NetSales).DESC));

        var r = SELECT<StaffSalesPercentile>(fullName.@as(alias.FullName),
            sales.NetSales.@as(alias.NetSales),
            sales.Year.@as(alias.Year),
            cumeDist.@as(alias.Percentile));
        FROM(sales).JOIN(staffs).ON(staffs.StaffId == sales.StaffId);
        WHERE(years.Contains(sales.Year));

        return r;
    });

Apparently this is a direction efcore may consider as well

@chamikasandamal
Copy link

Any update on this?

@ajcvickers
Copy link
Member

@chamikasandamal This issue is in the Backlog milestone. This means that it is not planned for the next release (EF Core 6.0). We will re-assess the backlog following the this release and consider this item at that time. However, keep in mind that there are many other high priority features with which it will be competing for resources.

@petalvik
Copy link

There are examples of window functions with working code in the Linq To DB ORM.
There is a linq2db.EntityFrameworkCore library, which will allow you to use window functions in EF Core.
This library is in the EF Core Tools & Extensions list.

@John0King
Copy link

John0King commented Jul 22, 2022

syntax suggestion:

var query =  db.Set<Table>()
    .UseWindowFunction(x=> EFWindowFunctions.RowNumber( ) , partition: new { x.Alpha  }, orderby: new []
    {  
        new OrderByDefine{ Column = x.CreateDate, Sort = Sort.Asc } 
    },   (x, i)=> new { x,  rowNumber = i } )
    .ToList(); 

static TResult UseWindowFunction<T,TFuncResult,TResult>(this IQuerable<T> query,  Expression<Func<IQuerable<T>, TFuncResult>> windowFunction, object? partition,  IEnumerable<OrderByDefine> orderby,
Expresstion<Func<T,TFuncResult,TResult>>  resultSelector)
{

}

static int RowNumber<T>(this EFWindowFunctions _)
{
    throw  new InvalidOperation("****")
}

sql tempalte:

select  A,B,C, <filed name>
FROM
(
    SELECT A,B,C, <windowFunction>  OVER ( <partition> , <orderby>)  as <filed name>
    FROM
    (
        <Original Query>
    )  AS  ___t
)  AS __t

@virzak
Copy link

virzak commented Jul 22, 2022

Library available

Hi all, in an attempt to get rid of ugly SQL in a project for my client, I've written a library that does provides window functions and binary functions as well. It has been used in production since January of this year, so battle tested in at least this one project.

I hope anyone interested can check it out, use it, harshly criticize it and contribute to it.

It is currently available for SQL Server, SQLite and Postgres and is multitargeted to frameworks 6 and 7-preview.

https://github.com/zompinc/efcore-extensions

Previously discussed here

@ErikEJ
Copy link
Contributor

ErikEJ commented Jul 22, 2022

@virzak suggest you do a PR to the tools list in the docs

@John0King

This comment was marked as spam.

@ajcvickers

This comment was marked as spam.

@xuzeyu91

This comment was marked as spam.

@roji

This comment was marked as spam.

@mojtabakaviani
Copy link

Please support window functions in efcore 9, i think this is the best time to add this feature.

@roji
Copy link
Member Author

roji commented Feb 8, 2024

An important scenario for this is paging, when the total number of rows (without the limit/offset applied) needs to be (efficiently) computed. See usage of COUNT(*) OVER () to do this efficiently in #32969 (comment).

@virzak
Copy link

virzak commented Feb 8, 2024

@roji,

Is there any interest in an EF Core community standup featuring https://github.com/zompinc/efcore-extensions ? Would love to come on and present it.

It recently implemented Window Functions inside a Where clause, multiple levels nested window functions and as of today it has count(*) over()

Here is the breakdown I had in mind:

Approximate breakdown could be:

  • Window functions basics
  • API design
  • Differences between provider implementation and steps library takes to align them
  • SubQuery generation
  • Expression tree modifications using IQueryTranslationPreprocessor which allows
    • Window function inside the .Where clause
    • Nested Window function
    • Pushing Window Function inside a Join into subquery without exposing public API such as .AsSubQuery() to the end user.
  • Interaction with binary data such as varbinary
  • Real world applications such as last non null puzzle

@roji
Copy link
Member Author

roji commented Feb 8, 2024

@virzak definitely could be! Let us discuss and we'll reach out as time allows etc.

@ErikEJ
Copy link
Contributor

ErikEJ commented Feb 8, 2024

@virzak are they on the tools list in the docs?

@virzak
Copy link

virzak commented Feb 8, 2024

@ErikEJ Right here.

@roji
Copy link
Member Author

roji commented Feb 24, 2024

Recent community standup video discussing windows function and @virzak's EF extension for supporting them.

@roji
Copy link
Member Author

roji commented Feb 24, 2024

Dumping some thoughts I had on the LINQ way to express windows functions... Note that this doesn't mean we plan to implement this any time soon (although it's definitely on my radar).

Most variations we've seen so far e.g. in the Zomp extension and in the above proposals) go in the direction of allowing users to directly express e.g. OVER via special LINQ operators. While that's a perfectly valid approach, ideally with LINQ the user doesn't express SQL directly, but rather uses the usual LINQ concepts (Where, OrderBy...) and EF translates that to the ideal SQL construct.

In the context of window functions, that would mean pattern-matching LINQ subquery patterns that match window functions. For example, consider the following query:

_ = await context.Blogs
    .Select(b => new { Blog = b, TotalCount = context.Blogs.Count() })
    .ToListAsync();

EF currently evaluates the context.Blogs.Count() fragment in the funcletizer, meaning that it gets sent as a separate query (roundtrip), and the result gets integrated as a parameter in the main query.

Instead, EF could translate this to a window function:

SELECT [b].[Id], [b].[Name], COUNT(*) OVER () AS TotalCount
FROM [Blogs] AS [b]

(this would require changing the funcletizer to not evaluate for this kind of case)

Going further a PARTITION BY clause could be expressed by adding a Where operator:

_ = await context.Blogs
    .Select(b => new { Blog = b, TotalCount = context.Blogs.Count(b2 => b2.Category == b.Category) })
    .ToListAsync();

This adds a count of all Blogs in the each Blog's category. It is currently translated via an inefficient correlated subquery:

SELECT [b].[Id], [b].[Category], [b].[Name], (
    SELECT COUNT(*)
    FROM [Blogs] AS [b0]
    WHERE [b0].[Category] = [b].[Category]) AS [TotalCount]
FROM [Blogs] AS [b]
WHERE [b].[Id] > 8

... but could be identified and transformed to the following using window functions:

SELECT [b].[Id], [b].[Name], COUNT(*) OVER (PARTITION BY [b].[Category]) AS TotalCount
FROM [Blogs] AS [b];

A window function ORDER BY could be expressed as follows:

_ = await context.Blogs
    .Select(b => new { Blog = b, TotalCount = context.Blogs.Select(b2 => b.Id).OrderBy(i => i).Where(i => i < b.Id).Min() })
    .Where(x => x.Blog.Id > 8)
    .ToListAsync();

This calculates the "minimum ID so far" for each Blog, and would translate to:

SELECT [b].[Id], [b].[Name], MIN([b].[Id]) OVER (ORDER BY [b].[Id]) AS TotalCount
FROM [Blogs] AS [b];

Some notes:

  • This approach works well for at least the basics, but will likely not work beyond a certain point. For example, LINQ lacks a "moving window" operator, which would provide e.g. 1 row preceding + 2 rows after (compare to Chunk which does exist). We could possibly propose such missing operators for LINQ, and then also use them to translate to SQL window functions, or introduce our own EF-specific operator.
  • I suspect once we'll explore the full range of possibilities of SQL window functions, we'll find other mismatches and problems - but we should still probably tend towards translating simple LINQ as above wherever possible, rather than introducing new operators/constructs.
  • Note that window functions are evaluated after any filter, grouping has been applied. That means that if there's a Where on the main (outer) DbSet, the exact same Where must be present on the inner subquery (the one we translate to a window function). The above examples are simple and have no such filters, but we'll need to take that into account.
    • This may mean that the user will need to get the LINQ expression "just right" in order for it to get translated to a window function, otherwise translation may fail (or worse, do a slow correlated query instead). This may be a disadvantage of this approach; with the alternative approach of having the user specify OVER, they're more directly expressing the SQL they want etc.
    • It may be possible to do something nicer by first expressing an IQueryable including the filter, and then reusing it twice - once in the main query and once in the subquery. This is a bit like what we're planning to do with CTEs, where the same fragment is integrated into the query twice (Translate embedded IQueryables to common table expressions (CTEs) #29918).
  • A generally tricky aspect is that window functions are only supported in the projection and ORDER BY of SELECT (e.g. not WHERE). As @virzak implemented in his extension, we'd have to have a mechanism to push down to a subquery if e.g. a WHERE is composed which contains window functions etc. I think it may be OK to split this out to a 2nd work phase.

@virzak @ajcvickers very interested to hear any thoughts you have on the above!

@virzak
Copy link

virzak commented Feb 27, 2024

@roji, I really like this approach. In retrospect, I should have started implementing the extension by detecting these LINQ patterns. There is no disadvantage to replacing an inefficient subquery in favour or window function where it makes sense.

Add RowNumber to the solution:

// Row Number
var query = context.Blogs
    .Select((b, i) => new { Blog = b, RowNumber = i + 1 });

The test cases of the Zomp library actually use LINQ to get the expected result and compares them to the actual results.

Some of those LINQ expressions are complex and not user friendly (perhaps they could be simplified). Here is an attempt at Percent_Rank.

expectedSequence = TestRows
.Select(r => r.Col1)
.OrderBy(x => x, comparer)
.Select(v => groups
    .Where(g => comparer.Compare(g.Key, v) < 0)
    .Select(g => g.Count())
    .Sum() / (double)(TestRows.Length - 1));

Another huge advantage of LINQ: there are differences between implementations.

Percent_Rank differs between Postgres and SQL Server by whether nulls come first or not.

So even though LINQ expression is more complex, it provides more determinism than the OVER() approach.

When you have the data:

Id Col1
2 null
3 10
5 -1
7 null
11 null
13 -12
17 null
19 null
23 1759

Sql Server produces:

SELECT
      Percent_rank = Percent_rank() Over( order by "Col1" )
      ,[Id]
      ,[Col1]
  FROM [TestRows]
Percent_rank Id Col1
0 2 null
0 7 null
0 11 null
0 17 null
0 19 null
0.625 13 -12
0.75 5 -1
0.875 3 10
1 23 1759

Postgres produces:

SELECT Percent_rank() Over( order by "Col1" ) as "Percent_rank", "Id", "Col1" FROM public."TestRows"
ORDER BY "Id" ASC 
Percent_rank Id Col1
0.5 2 NULL
0.25 3 10
0.125 5 -1
0.5 7 NULL
0.5 11 NULL
0 13 -12
0.5 17 NULL
0.5 19 NULL
0.375 23 1759

Perhaps a hybrid approach is good also, if some functionality cannot be covered by LINQ, but first stage of implementation can cover as much as possible without exposing any new API.

Adding "Moving Window" into Enumerable and System.Linq.Async's AsyncEnumerable would be a great addition for purposes other than EF Core. In my client project I have an implementation, which I would dump if LINQ had these functions. Of course it would need to cover cases where window size doesn't remain the same on each iteration (ROWS BETWEEN UNBOUNDED PRECEDING AND 2 FOLLOWING).

I also agree with the upsides of Over() approach as well. .PartitionBy(b.Category) is friendlier and less error prone than .Count(b2 => b2.Category == b.Category). Though not introducing new APIs seems like a far bigger advantage.

As for getting the filters "just right", perhaps we could encourage an approach where a query is constructed in multiple statements:

var filteredQuery = context.Blogs.Where(z => z.Id > 10);
var finalQuery = filteredQuery
    .Select(b => new { Blog = b, TotalCount = filteredQuery.Count(z => z.Id < b.Id) });

@roji
Copy link
Member Author

roji commented Feb 27, 2024

Add RowNumber to the solution

Absolutely. This is something that has been discussed before (#24218) and decided against, but I think we should reconsider and implement it.

Some of those LINQ expressions are complex and not user friendly (perhaps they could be simplified).

Yeah, I suspected this may be the case (that was the "just right" comment above). We can try to think about this and see if we can come up with something friendlier; if not, I think it may be OK to also expose an explicit window function LINQ construct as well.

Another huge advantage of LINQ: there are differences between implementations.

Exactly - plain old LINQ is great (where possible) also because it's an abstraction that shields the user from database-specific details.

Percent_Rank differs between Postgres and SQL Server by whether nulls come first or not.

Yeah, that's a very general difference between PG and SQL Server that affects everything (including just plain ORDER BY). That's one example of cases where we don't attempt to force the database into matching the .NET client behavior (or the behavior of ther databases) - similar to how SQL Server is case-insensitive by default (though note that PG ORDER BY allows specifying NULLS FIRST/LAST, and npgsql/efcore.pg#627 tracks exposing that to users; not sure if there's something similar for Percent_Rank). To summarize, I think a behavioral discrepancy specifically on NULL ordering would be acceptable if there's no good way around it.

Perhaps a hybrid approach is good also, if some functionality cannot be covered by LINQ, but first stage of implementation can cover as much as possible without exposing any new API.

Yep, I think that's a good approach.

As for getting the filters "just right", perhaps we could encourage an approach where a query is constructed in multiple statements [...]

Yes, that would indeed avoid the repetition that's otherwise needed. That would unfortunately clash with precompiled queries/NativeAOT, since those are only planned to work for statically-analyzable queries - and once you start breaking up the query into multiple IQueryable variables, that's no longer a simple static query. Though if the composed queryables are localized in a single function, statically analyzing them may become possible. In any case, we'll have the same problem with features other than window functions, e.g. the CTE support planned in #29918.

@pmiddleton
Copy link
Contributor

Ok time for me to chime in since this conversation has taken off.

I started implementing this feature in ef core back at xmas. I currently have things at a state where I have an api fully designed and can run queries. I am now adding in support for more aggregates, adding unit tests, and flushing out all the plethora of bugs.
I was hoping to have the basics finished up this spring.

For the api I went with a set of new fluent functions, similar to what others have done for a number of reasons.

I at first investigated adding in some custom linq operators. This lead to a bunch of dead ends due to being limited to hanging extension methods of the existing interfaces. You can't sufficiently control where the api surfaces itself when writing linq queries this way. You would really have to sit down with the linq team and modify the base interfaces/methods to make this approach work.

The fluent api approach is more intuitive from the average end developers standpoint. You have a well defined set of methods that show you exactly how to write a windowing function in a query. You are not relying on behind the scenes ef core magic to hopefully translate things into what you want. While there is merit in having ef core "just figure things out" in some cases I don't think it would be the best approach for that to be the main way to define windowing functions.

Here is an example of what I am currently doing.

MaxDiscount = EF.Functions.Over().PartitionBy(od.ProductID, od.OrderID).OrderBy(od.OrderID).Rows(RowsPreceding.CurrentRow, 5).Max(od.Discount),

I felt this format lent it self to the best usability and easy of implementation. It also allows for extension points for the various providers. For example different providers have different needs for row/range/groups in the sliding window, while others support adding in an excludable clause.

I currently was only shooting for support in select/order by, but based on this thread adding in support for where shouldn't be too difficult.

You can see my current working branch here. https://github.com/pmiddleton/EntityFramework/tree/windowing
There are still a bunch of placeholders and todos all over the place, but you can see the api I have defined. Best to start at the WindowFunctionSqlServerTest class and work backwards from there.

Let me know what you think.

Paul

@ajcvickers
Copy link
Member

@pmiddleton

I started implementing this feature in ef core back at xmas.

Exciting to have you back!

@roji
Copy link
Member Author

roji commented Feb 27, 2024

The fluent api approach is more intuitive from the average end developers standpoint.

Here I assume you mean using special LINQ constructs (EF.Functions.Over().PartitionBy(...)), right? If so, whether it's more intuitive or not really depends on whether the developer is already familiar with SQL window function concepts. For example, for the scenario of getting the number of Blogs in each Blog's category, the "natural" LINQ variant (as above) is:

_ = await context.Blogs
    .Select(b => new { Blog = b, TotalCount = context.Blogs.Count(b2 => b2.Category == b.Category) })
    .ToListAsync();

... whereas using a special construct would instead include something like:

EF.Functions.Over().PartitionBy(b.Category).Count(),

I of course chose the "easy" example (see discussion below), but for this category of cases, at least IMHO the 1st way is a much more intuitive, easy and discoverable way to express things without needing to understand windows functions etc.

You are not relying on behind the scenes ef core magic to hopefully translate things into what you want.

It's worth noting that to a certain extent, all of EF's LINQ translation is basically that; that is, uses don't express SQL concepts (either via strings or via SQL-conceptual operators, as many other ORMs do), but rather use C# code - and EF is responsible for figuring out how to best translate that. I do agree that there's an extremity where a very specific combination of LINQ operators need to be composed just right for EF's pattern matching to kick in and do the right thing - at that point a custom operator maybe makes more sense.

But I think we may be reaching the limit of where the conversation can lead without diving into the concrete details. We know that there's at least quite a few window function constructs that can be expressed quite naturally without any explicit SQL concepts (like OVER); we suspect there may be others where that may not be the case. A case-by-base analysis of the different window functions scenarios and what natural LINQ constructs are necessariy to express them would IMHO provide the information we need here (I did a tiny bit of that in my comment above but that's clearly not enough).

And once again, even if there are scenarios where a custom LINQ operator (OVER) is more appropriate, that still doesn't mean we shouldn't tackle the pattern-matching approach for the cases where we a nice, expressive LINQ construct exists without it.

Finally, the specific proposed syntax looks good to me (I'd need a bit more time to think about it in depth). One thing that may be a bit odd, is that the operator chain rooted over EF.Functions.Over() isn't typed to Employee in any way - and therefore the operators (PartitionBy, OrderBy) accept any argument (and also a non-lambda parameter in constrast to e.g. regular OrderBy). Making EF.Functions.Over typed (Over<Employee>) could be a slight improvement, or even just accept EF.Functions.Over(e), which would achieve the same.

@ajcvickers
Copy link
Member

ajcvickers commented Feb 27, 2024

And once again, even if there are scenarios where a custom LINQ operator (OVER) is more appropriate, that still doesn't mean we shouldn't tackle the pattern-matching approach for the cases where we a nice, expressive LINQ construct exists without it.

To me, this is the key. If the developer writes a LINQ query that would be best translated using a window function, then EF Core should do this. This is what we normally do for database functions like LIKE. But I'm pretty sure we'll also need a way to express window functions explicitly. Again, we do this for database functions like LIKE. (I just like writing like LIKE.) This is useful for cases that are impossible or messy to write in LINQ, but also useful for people coming to this from a database perspective and know basically what SQL they want. So, in a nutshell, I don't see a place where we don't do both. This also means its okay to do the explicit API first, and pattern match later.

@pmiddleton
Copy link
Contributor

@roji don't get me wrong. I'm all for finding patterns such as this and automatically doing the right thing behind the scenes.

_ = await context.Blogs
    .Select(b => new { Blog = b, TotalCount = context.Blogs.Count(b2 => b2.Category == b.Category) })
    .ToListAsync();

Your right that this is the most intuitive way to write this particular query in linq.

However the number of these scenarios is limited. What I meant by ef magic is that you could never hope to teach your average dev that we could transform something like this into a window function.

expectedSequence = TestRows
.Select(r => r.Col1)
.OrderBy(x => x, comparer)
.Select(v => groups
    .Where(g => comparer.Compare(g.Key, v) < 0)
    .Select(g => g.Count())
    .Sum() / (double)(TestRows.Length - 1)); 

Once you get beyond the simples cases, where honestly the average dev isn't even thinking about using a window function in the first place, then you end up with better usability from a well defined api.

@pmiddleton
Copy link
Contributor

@roji - good catch on the Over. In my various linq extension method approaches things were typed to Employee but that was dropped when I moved to the current approach. Ill take another look at that.

I am trying to think if there is ever a use case where you need to bind outside of the scope of Employee I can't think of one off the top of my head but I'll keep pondering.

@roji
Copy link
Member Author

roji commented Feb 27, 2024

However the number of these scenarios is limited. [...] Once you get beyond the simples cases, where honestly the average dev isn't even thinking about using a window function in the first place, then you end up with better usability from a well defined api.

Right, I think we're more or less aligned on this... I do want to spend a bit more time to think about the problematic scenarios: in at least some cases, there may be opportunities for proposing general improvements to LINQ that make sense (not just for databases), and that may also simplify cases where currently the LINQ expression isn't good. But I know that's very abstract (it's mainly about me having to spend more time thinking about LINQ & window functions to understand things better).

I am trying to think if there is ever a use case where you need to bind outside of the scope of Employee I can't think of one off the top of my head but I'll keep pondering.

I don't think that can ever happen in SQL, but I'm no expert on window functions. Ideally we'd have had the IQueryable for Employee where we want to specify a window function invocation, and we'd just hang Over() on top of that - but we don't, only the Select's element parameter. That's what led me to go in the direction of extracting the IQueryable out and referencing it twice as a simplification (@virzak also mentioned this possibility), e.g.:

var filteredBlogs = context.Blogs.Where(b => b.Id > 8);
_ = await filteredBlogs
    .Select(b => new { Blog = b, TotalCount = filteredBlogs.Count(b2 => b2.Category == b.Category) })
    .ToListAsync();

Count() here can be replaced with an Over() extension method over IQueryable, which flows the type forward like any normal LINQ operator. That would be the ideal way to express things IMHO, but unfortunately it does depend on extracting the IQueryable out, which may be a bit much (and is also a bit problematic for NativeAOT/static queries)...

@pmiddleton
Copy link
Contributor

pmiddleton commented Feb 27, 2024

Adding an Over method to IQueryable was one of my original ideas. One of the main issues I ran into is that you now have access to that Over method in places where it does not make any logical sense. I also tried hanging it off of Select elements parameter but that required binding an extension method to object so yea worse than hanging it off of IQueryable :)

What you need is an overload for Select that passes in both the entity and some new IWindow into the lambda which you can then Over on. Though at the end of the day that is just a slight variation of syntax vs the fluent api off of EF.Functions.

@roji
Copy link
Member Author

roji commented Feb 27, 2024

Yep, that all makes sense to me. FWIW I don't think it would be terrible to have Over() on IQueryable - there are other queryable operators which don't make sense everywhere. For example, most LINQ operators don't translate after GroupBy, and that's OK (we'd throw an informative error).

Also, at least this would be a queryable extension, which generally restricts it to EF queries only. This is why we allow ourselves to add queryable extensions, but not enumerable extensions (which would appear in totally non-EF contexts).

My main issue with the Over-over-queryable approach is that it requires breaking trhe query apart, to allow reusing the IQueryable as a local variable... Maybe that's not too terrible either, I don't know.

What you need is an overload for Select that passes in both the entity and some new IWindow into the lambda which you can then Over on.

That's a nice idea (though note that this Select overload would also be an IQueryable extension, like the Over() extension discussed above).

Also, note that window functions are also supported by OrderBy. And in fact, in @virzak extension you can use them everywhere (e.g. in a Where clause), and the extension detects that and does a subquery pushdown to have the function in the subquery projection. If we want to support this powerful high-level kind of query translation (which I think is a good idea), we'd need to basically allow window functions anywhere, so you also need overloads for all queryable operators - not great.

@pmiddleton
Copy link
Contributor

One thing that may be a bit odd, is that the operator chain rooted over EF.Functions.Over() isn't typed to Employee in any way

I took at quick look at this tonight. You can root the chain via the over method but it needs to be done via a passed parameter. The input to the parent linq method will quite probably be an anonymous type so it's the only option.

You then end up with a query that looks like this.

 MaxDiscount = EF.Functions.Over(od).PartitionBy(pb => new { pb.Quantity, pb.UnitPrice }).OrderBy(ob => ob.Quantity).Rows(RowsPreceding.CurrentRow, 5).Max(m => m.Discount),

vs what I currently have

MaxDiscount = EF.Functions.Over().PartitionBy(od.ProductID, od.OrderID).OrderBy(od.OrderID).Rows(RowsPreceding.CurrentRow, 5).Max(od.Discount),

While this does look more like a more typical linq operator, at the end of the day all I don't think it buys you anything. The use of lambdas still won't stop you from including references to any random in scope variable, the same as with the untyped approach. Using a fluent api approach means the window call already lives inside of a lambda from the parent select, orderby, etc. and is meant to use it's parameter. An entire query looks like this.

  var results = context.Employees.Select(e => new
  {
      e.Id,
      e.Name,
      MaxSalary = EF.Functions.Over().Max(e.Salary),
      MinSalary = EF.Functions.Over().Min(e.Salary)
  }).ToList();

Now I'm hardly a linq expert so I might be missing something?

@virzak - was there any reason your api went with one approach over the other? Were there any inherent advantages you found by doing it the way you did?

@roji
Copy link
Member Author

roji commented Feb 28, 2024

While this does look more like a more typical linq operator, at the end of the day all I don't think it buys you anything.

I agree there's not huge value to having lambdas - though there's also not huge value to not having them... Yeah, the main point is to align the operators with what LINQ operators typically look like, especially given that sometimes the exact same operators are used here (OrderBy, Max), and in other contexts these accepts lambdas. So it's a consistency thing.

Another similar thing is that it's generally odd for the operator chain to not be rooted to anything; this is why I was trying to explore options where Over() is composed over IQueryable, rather than a root-less EF.Functions.Over(). Passing the Select() parameter to Over() as least provides that root (albeit in a slightly weird way). I also think that LINQ users are used to dotting on the operator parameter, so having a lambda may better guide them to reference properties on e rather than on some arbitrary thing that wouldn't be legal (you're right that doing that is still possible even with lambdas, but the lambda does guide in the right direction IMHO).

I don't pretend that the above are very strong arguments - but then I don't think there are strong arguments to the other side (the syntax is slightly less verbose?).

@bachratyg
Copy link

Might not be intuitive with query syntax:

from a in SomeTable
from b in SomeRelatedTable
select new
{
   a.Number,
   b.Number,
   Diff = EF.Functions.Over(???).Max(a.Number - b.Number)
}

@virzak
Copy link

virzak commented Feb 28, 2024

@virzak - was there any reason your api went with one approach over the other? Were there any inherent advantages you found by doing it the way you did?

@pmiddleton Not sure I understand what would be an advantage of having a lambda inside a lambda. The MaxDiscount in your example is already part of a lambda expression body, so you can create any partition by or order by expression you need.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests