Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Translate string.Join and string.Concat #2981

Closed
1 task done
jnm2 opened this issue Sep 2, 2015 · 43 comments · Fixed by #28110
Closed
1 task done

Translate string.Join and string.Concat #2981

jnm2 opened this issue Sep 2, 2015 · 43 comments · Fixed by #28110
Assignees
Labels
area-query closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. type-enhancement
Milestone

Comments

@jnm2
Copy link

jnm2 commented Sep 2, 2015

string.Join would be a cool feature to have now and then. Some database engines support this with an aggregate function, others such as SQL Server need a little hand-holding. Pluses include terser code and less data over the wire when projecting, since the alternative to concatenating on the server would be returning a child string collection for each row and concatenating the collection after the query. (If I understand correctly, child collections are usually transferred by joining each row of the child collection to the original row set and duplicating a lot of data.)

I expect that this would be low priority for you to add to EF, but I'm curious if anything is being done to allow extensions to the LINQ to query translation. We would be thrilled to implement string.Join support ourselves if it didn't require modifying EF and provider source code every release and recompiling.

I wasn't able to find anything about extending LINQ to query translation so I apologize if I missed it, but I know you are dramatically refactoring for EF Core and I can imagine many cool scenarios would be possible if there was an extension point in LINQ querying.

Depends on:

@jnm2 jnm2 changed the title Enabling string.Join support and extending LINQ to query Enabling string.Join support and extending LINQ to query Sep 2, 2015
@rowanmiller rowanmiller added this to the Backlog milestone Sep 11, 2015
@rowanmiller rowanmiller changed the title Enabling string.Join support and extending LINQ to query Enabling string.Join support Sep 11, 2015
@rowanmiller
Copy link
Contributor

We have extensibility, we don't have any docs yet but you want to look at MethodCallTranslator.

Leaving this open to track String.Join. We would definitely be interested in accepting a contribution for this.

@rowanmiller rowanmiller added the help wanted This issue involves technologies where we are not experts. Expert help would be appreciated. label Sep 11, 2015
@jnm2
Copy link
Author

jnm2 commented Mar 30, 2016

Notes: For SQL Server, you must use for xml path(''), type, otherwise XML characters like <, > and & will be escaped into &gt; etc.

for xml path(''), type).value('(./text())[1]', 'nvarchar(max)') has much greater performance than for xml path(''), type).value('.', 'nvarchar(max)') (13.2x in my tests).

(see this thread)

@jnm2
Copy link
Author

jnm2 commented Jul 1, 2016

Also: string.Concat

I'm planning to do a contribution when I get time. I'm waiting for the standard dust to settle after 1.0.0 before I use EF Core in production because I see a fair number of bugs.

@akamud
Copy link

akamud commented Mar 28, 2017

I understand this can be extended adding a new MethodCallTranslator, but what is the best way to add it to EF? The classes I found are all in the internal namespace.

@smitpatel
Copy link
Member

Look at SqlServerStringSubstringTranslator. That will give you general pattern how those translators work. They are in Internal namespace but that is ok. Internal namespace allows us to make changes without breaking users since they shouldn't be directly interacting with those APIs.

@akamud
Copy link

akamud commented Mar 30, 2017

I understand that, but isn't there a way to extend the functionality without using Internal namespaces? That isn't really "extensible" from my point of view.

I don't wanna produce code that may break with every EF update if I'm just "extending" the framework.

@smitpatel
Copy link
Member

This issue talks about implementing functionality in EF core itself, which is not exactly "extending" framework. Rather extending (or improving to be more precise) Linq-to-SQL translation.
At present we don't have extension point for user to plug in custom method call translator. The current extension point are for provider writers to provide their custom translators.

Is there a specific reason you want to "extend" the framework?

@akamud
Copy link

akamud commented Mar 30, 2017

I was under the impression that EF supported extensibility from Rowan's comment.

The ability to crate my own translators for common queries looks like a great feature. Similar to what nein-linq does.

@peterwurzinger
Copy link

peterwurzinger commented Jul 10, 2017

Let me revive this thread @rowanmiller :
My plan for the SQL-Server implementation is to map string.Join(...) to T-SQLs CONCAT or CONCAT_WS via implementing an IMethodCallTranslator.

My first question is, if there is a general mistake in my thought? Imho calling CONCAT/CONCAT_WS would lead to a quite similar behavior as string.Join offers.

The second question refers to a gerneral strategy regarding the support of DBMS versions. As already mentioned there are two T-SQL functions to handle string concatenation - CONCAT and CONCAT_WS. While CONCAT is already available since SQLServer 2012 CONCAT_WS will be introduced with 2017.
I guess CONCAT would then be the way to go to support older systems/applications also..?

@jnm2
Copy link
Author

jnm2 commented Jul 10, 2017

I'm pretty sure concat only works when the number of strings is known when building the query. Each string has to be passed in as an argument. It's not like you could do CONCAT(SELECT Str FROM X).

Last time I researched this, this was the state of the art, and it does require attention to avoid some tricky encoding corner cases: http://sqlblog.com/blogs/rob_farley/archive/2010/04/15/handling-special-characters-with-for-xml-path.aspx

@akamud
Copy link

akamud commented Jul 10, 2017

I still don't understand how we can add an IMethodCallTranslator to the engine in a way that isn't hackish, everything is private or internal.

Still no documentation on this?

@divega
Copy link
Contributor

divega commented Jul 11, 2017

FWIW starting with SQL Server 2017, STRING_AGG() should be a good option for the translation of string.Join() in the cases that the input comes from a table.

@peterwurzinger
Copy link

@jnm2 You're absolutely right, I confused relations with scalar values. SQL-Server won't execute a script like

SELECT CONCAT(SELECT 'a' UNION SELECT 'b')

Maybe I can find the time to work into the FOR XML PATH - thing and adopt my implementation accordingly.

@divega: STRING_AGG seems to do the trick CONCAT isn't able to. But it will only work for SQLServer Versions >= 2017 and since there is just one general EFCore-package for SQLServer one would break the compatibility with the majority of SQLServer instances.

@divega
Copy link
Contributor

divega commented Jul 11, 2017

since there is just one general EFCore-package for SQLServer one would break the compatibility with the majority of SQLServer instances

Until now it just happens that we haven't added enough features that are version specific for it to become really compelling but we could certainly add API to enable groups of features for a version of the database server, e.g. the most granular version:

optionsBuilder.UseSqlServer(connectionString, opt=>opt.Use2017Functions());

A bit more general:

optionsBuilder.UseSqlServer(connectionString, opt=>opt.Use2017Features());

or even shorter:

optionsBuilder.UseSqlServer2017(connectionString);

We have discussed this before, e.g. at #9000 (comment) and #3011 (comment).

@peterwurzinger
Copy link

@divega I personally like this approach, but its concrete implementation goes far beyond the borders of a normal contribution since it would (at least) touch all SQLServer specific mappers/translators/younameit.

@smitpatel
Copy link
Member

smitpatel commented Jul 11, 2017

@akamud - The thing you want to achieve is custom translation. Check #7368 which adds user defined scalar functions. Then you will have a local function which would do string.Join for you and provide a SQL translation for that (using .HasTranslation). It will not be hacky but you would still need to understand Expression tree etc to understand what should the translation look like.

This issue talks about adding translation in-built in EF. Adding another MethodCallTranslator which derives from IMethodCallTranslator inside EF is not hack-ish. Its internal but its code in same assembly.

@peterwurzinger - For implementation, you can start with very basic approach taking a translation which would work on all SqlServer editions. Then it would be localized enough to particular Translator being added.

In 2nd phase we need to flow information about SqlServer2017 functions to methodcall translators. There would be 2 approaches to that.

  • Either do normal translation as is and modify generated SQL-tree to change the methods. (e.g. Look at RowNumberPagingExpressionVisitor which changes Fetch/Offset to rownumber())
  • Or modify SqlServerCompositeMethodCallTranslator to take additional dependency on ISqlServerOptions (check SqlServerQueryGeneratorFactory class on how it takes dependency and flow the information about rownumber paging to SqlGenerator) and based on the options set to use SqlServer 2017 functions add different set of translators. So translators being added to the list will be conditional based on the options set.

@ajcvickers ajcvickers removed propose-close help wanted This issue involves technologies where we are not experts. Expert help would be appreciated. labels Feb 16, 2018
@9Rune5
Copy link

9Rune5 commented Oct 25, 2018

As much as I was happy to discover STRING_AGG(), I cannot help but feel that it would have been better had sql server added support for an array datatype instead. I suspect that would solve quite a few things.

@msmolka
Copy link

msmolka commented Dec 17, 2018

@9Rune5 isn't array types supported by memory optimized tables/types? You can create memory optimized type and pass it as parameter I believe. I didn't not checked but I think this is supported.

@9Rune5
Copy link

9Rune5 commented Dec 17, 2018

@9Rune5 You can create memory optimized type and pass it as parameter I believe.

Sure, but in this case we want to have a column in the result set contain a subtable.

context.MyTable.Select(t=>new{t.Id, mysubs = t.Subtable.Select(sub=>new{sub.Id, sub.Description}});

When I do stuff like that, EF generates queries with some mighty impressive joins and I am left with the feeling that those queries could be further optimized had SQL Server been able to return a table column type. Instead I'm left with some funky behaviour if I'm foolish enough to employ some .Skip() and .Take().

For me, string.Join() support is a desperate measure to get halfway to properly supporting the type of EF query in my example. I have a problematic join like this in some code I wrote recently, and STRING_AGG() seemed to go a long way of resolving the performance issues I witnessed. In that particular case I was able to live with the compromise of not having the data returned in an orderly fashion. A somewhat messy-looking string would have sufficed.

But many of my assumptions could turn out to be dead wrong and the only real solution is lazy-loading (or something completely different). After almost a decade of using EF, my team still suffer from "maybe hand-written sql can get us out of this performance jam"-ities and that is unfortunate. (the result isn't nice)

@Looooooka
Copy link

Not just string.join...mssql supports returning xml and json strings("FOR JSON AUTO" and "FOR JSON PATH" ).
There has been talk about basic string.join support since EF v 1 and nothing has been done about it. Even when you manage to write 99% of the SQL queries using linq, the second you realize these 3 functions return smaller and faster results than anything EF can come up with, you are stuck with using EF's raw sql query support, which kind of sucks because you're stuck with non-typed string datas representing actual T-SQL. And if you hand that magic piece of code to team members who don't know T-SQL, you're stuck with maintaining that piece of code for all eternity. I honestly cannot wait for the day we get basic functionality of any one of these functions. Even if the Expression needed to be used is complex as hell. Anything to get rid of raw T-SQL queries.

@roji
Copy link
Member

roji commented May 25, 2022

Null semantics of STRING_AGG
DROP TABLE IF EXISTS data;
CREATE TABLE data
(
    id INT PRIMARY KEY IDENTITY,
    name VARCHAR(MAX),
);

-- Simple sample
INSERT INTO data (name) VALUES ('foo'), ('bar');
SELECT STRING_AGG(name, ', ') FROM data; -- foo, bar

-- NULLs are ignored and the corresponding separator is not added
DELETE FROM data WHERE 1=1;
INSERT INTO data (name) VALUES ('foo'), (NULL), ('bar');
SELECT STRING_AGG(name, ', ') FROM data; -- foo, bar

-- NULLs only results in NULL
DELETE FROM data WHERE 1=1;
INSERT INTO data (name) VALUES (NULL);
SELECT STRING_AGG(name, ', ') FROM data; -- NULL

-- Empty table results in NULL
DELETE FROM data WHERE 1=1;
SELECT STRING_AGG(name, ', ') FROM data; -- NULL

@roji roji changed the title Enabling string.Join support Translate string.Join and string.Concat May 25, 2022
@roji roji removed this from the Backlog milestone May 25, 2022
roji added a commit to roji/efcore that referenced this issue May 26, 2022
* string.Join
* string.Concat

Closes dotnet#2981
roji added a commit to roji/efcore that referenced this issue May 26, 2022
* string.Join (SQL Server and SQLite)
* string.Concat (SQL Server and SQLite)
* Standard deviation and variance (SQL Server)

Closes dotnet#2981
Closes dotnet#28104
@roji roji added the closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. label May 27, 2022
@ajcvickers ajcvickers added this to the 7.0.0 milestone May 31, 2022
@wertzui
Copy link
Contributor

wertzui commented Jun 16, 2022

@ajcvickers as this has been added to the 7.0.0 milestone does that mean that it will be included in EF Core 7?

@roji
Copy link
Member

roji commented Jun 16, 2022

@wertzui that means that it's in the plan for 7.0, but since it's still open, work hasn't been done yet, so there's always a chance it may get pulled out of the release.

Specifically for this though, there's a PR out, so chances are it will be in.

roji added a commit to roji/efcore that referenced this issue Jun 18, 2022
* string.Join (SQL Server and SQLite)
* string.Concat (SQL Server and SQLite)
* Standard deviation and variance (SQL Server)

Closes dotnet#2981
Closes dotnet#28104
roji added a commit to roji/efcore that referenced this issue Jun 18, 2022
* string.Join (SQL Server and SQLite)
* string.Concat (SQL Server and SQLite)
* Standard deviation and variance (SQL Server)

Closes dotnet#2981
Closes dotnet#28104
@ajcvickers
Copy link
Member

@roji Is this done?

@roji
Copy link
Member

roji commented Jul 6, 2022

Not yet, #28110 still has to be merged (I think there's very little left).

roji added a commit to roji/efcore that referenced this issue Jul 9, 2022
* string.Join (SQL Server and SQLite)
* string.Concat (SQL Server and SQLite)
* Standard deviation and variance (SQL Server)

Closes dotnet#2981
Closes dotnet#28104
@ghost ghost closed this as completed in #28110 Jul 9, 2022
ghost pushed a commit that referenced this issue Jul 9, 2022
* string.Join (SQL Server and SQLite)
* string.Concat (SQL Server and SQLite)
* Standard deviation and variance (SQL Server)

Closes #2981
Closes #28104
@ajcvickers ajcvickers modified the milestones: 7.0.0, 7.0.0-preview7 Jul 10, 2022
@ajcvickers ajcvickers modified the milestones: 7.0.0-preview7, 7.0.0 Nov 5, 2022
@magicxor
Copy link

magicxor commented Jul 8, 2023

I'm using the string.Join method in an EF Core (Microsoft.EntityFrameworkCore.SqlServer 7.0.5) LINQ query and I've encountered an inconsistency that I can't quite understand.

The following code works without any issues:

var result = await this.Addresses
  .Select(address => string.Join(", ",
	  new List<string>() { address.Street1, address.City, address.Country }))
  .Take(10)
  .ToListAsync();

However, when I try to filter out empty strings from the list using Where(s => s != "") before the string.Join, the query doesn't work:

var result = await this.Addresses
  .Select(address => string.Join(", ",
	  new List<string>() { address.Street1, address.City, address.Country }.Where(s => s != "")))
  .Take(10)
  .ToListAsync();

(it throws InvalidOperationException: The LINQ expression 's => s != ""' could not be translated. Either rewrite the query in a form that can be translated, or switch to client evaluation explicitly by inserting a call to 'AsEnumerable', 'AsAsyncEnumerable', 'ToList', or 'ToListAsync'. See https://go.microsoft.com/fwlink/?linkid=2101038 for more information.)

I'd expect both queries to work. Could someone provide some insight into why the second query doesn't execute properly? Is there something I'm overlooking or is it potentially a bug or limitation of the EF Core's LINQ provider?

@roji
Copy link
Member

roji commented Jul 8, 2023

@magicxor the composition of LINQ operators over primitive collections (new List<string>() { ... }.Where(...)) isn't yet supported in EF Core 7.0. The good news is that this feature is coming in EF Core 8.0 (announcement), though this specific query still doesn't translate there because of #30922 (probably), will make a note there with your case.

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-query closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. type-enhancement
Projects
None yet
Development

Successfully merging a pull request may close this issue.