Translate non-aggregate string.Join to CONCAT_WS on SQL Server #28900

Draft
wants to merge 1 commit into
base: release/7.0

roji
Member

@roji roji commented Aug 26, 2022

As usual, this was a bit more tricky than it looks.

  • The translation is in SqlServerSqlTranslatingExpressionVisitor because there's an array parameter.
  • CONCAT_WS is interesting in that the length of the type it returns depends on its inputs, so concatenating three 4-letter words with a 1-char delimiter returns a varchar(14). We don't have column/parameter values, so I'm setting the return mapping to varchar(max) or nvarchar(max) (depending on whether we've seen an nvarchar argument). See below for some experiments on the result type.
  • There's also CONCAT, which is similar to CONCAT_WS. We already have a relational translation for string.Concat in StringMethodTranslator, but it only works for the overloads with 2-4 arguments, not for 5+ (which take an array parameter). We could translate to CONCAT just like for CONCAT_WS, but then we should override the relational translation and do it regardless of the number of arguments (we shouldn't use different translations for different argument counts). If you agree I can do this.
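
For illustration, the translation described in the list above could produce SQL of roughly the following shape. This is a sketch, not the PR's actual output: the FirstName and LastName columns and the inline VALUES row source are hypothetical stand-ins for a mapped table, and the C# query in the comment is an assumed example.

```sql
-- Assumed LINQ source (hypothetical entity and properties):
--   ctx.Customers.Select(c => string.Join(", ", new[] { c.FirstName, c.LastName }))
-- Plausible SQL Server translation, using an inline row source so the query runs on its own
-- (requires SQL Server 2017 or later for CONCAT_WS):
SELECT CONCAT_WS(N', ', [c].[FirstName], [c].[LastName]) AS [FullName]
FROM (VALUES (N'Ada', N'Lovelace')) AS [c]([FirstName], [LastName]);
```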

Closes #28899

Interesting experiments for CONCAT_WS result type
SELECT CONCAT_WS(', ', CAST('foo' AS varchar(max)), CAST('bar' AS varchar(max)));

SELECT CONCAT_WS(', ', 'foo', 'bar'); -- varchar(8), adds lengths of arguments + delimiter as necessary
SELECT CONCAT_WS(', ', CAST('f' AS varchar(1)), CAST('b' AS varchar(1))); -- varchar(4)
SELECT CONCAT_WS(', ', CAST('f' AS varchar(1)), 'bar'); -- varchar(6)
SELECT CONCAT_WS(', ', CAST('f' AS varchar(3)), 'bar'); -- varchar(8)
SELECT CONCAT_WS(', ', 'f', CAST('bar' AS varchar(max))); -- varchar(max)

SELECT CONCAT_WS(', ', 'foo', CAST('bar' AS char(3))); -- varchar(8), char expanded to varchar
SELECT CONCAT_WS(CAST(', ' AS char(2)), CAST('foo' AS char(3)), CAST('bar' AS char(3))); -- varchar(8), even though all arguments are char

SELECT CONCAT_WS(', ', N'foo', 'bar'); -- nvarchar(16), varchar treated as nvarchar

-- Look at this thing (one for the book):
SELECT CONCAT_WS('|', REPLICATE('x', 7999), 'bar'); -- returns ...xxxxxb ('ar' is truncated)

-- To find out an expression's type:
DECLARE @what sql_variant;
SELECT @what = 'some expression';
SELECT
    SQL_VARIANT_PROPERTY(@what, 'BaseType'),
    SQL_VARIANT_PROPERTY(@what, 'Precision'),
    SQL_VARIANT_PROPERTY(@what, 'Scale'),
    SQL_VARIANT_PROPERTY(@what, 'MaxLength');
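
As a usage sketch, the template above can be applied to one of the CONCAT_WS experiments by assigning the expression itself to the variable. The column aliases are added here only for readability.

```sql
-- Inspect the result type of a CONCAT_WS call via sql_variant.
DECLARE @what sql_variant;
SELECT @what = CONCAT_WS(', ', 'foo', 'bar');
SELECT
    SQL_VARIANT_PROPERTY(@what, 'BaseType')  AS BaseType,
    SQL_VARIANT_PROPERTY(@what, 'MaxLength') AS MaxLength; -- the experiments above report varchar(8) for this expression
```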

@roji roji requested a review from smitpatel August 26, 2022 15:49
@smitpatel
Member

> SELECT CONCAT_WS('|', REPLICATE('x', 7999), 'bar'); -- returns ...xxxxxb ('ar' is truncated)

Does that mean any result string crossing 8000 characters (or 4000 if Unicode) will be truncated?

@smitpatel
Member

Is this required for 7.0?

@roji
Member Author

roji commented Aug 26, 2022

> SELECT CONCAT_WS('|', REPLICATE('x', 7999), 'bar'); -- returns ...xxxxxb ('ar' is truncated)
>
> Does that mean any result string crossing 8000 characters (or 4000 if Unicode) will be truncated?

Yes 🙄

If there's a single varchar/nvarchar(max) in there, the result is also max, so no truncation occurs. So this only affects the case where all arguments (and the delimiter) are non-max.

> Is this required for 7.0?

No, not required. The reason I did this is that we've added a string.Join translation in another context (aggregate), so it's nice to be able to just say "we now support string.Join" (in both aggregate and non-aggregate contexts). It also seems very low-risk, but if we're against it we can do it for 8.0.

@smitpatel
Copy link
Member

It is not very "low-risk". I prefer not to do it. Not really a frequently asked feature.

@roji
Member Author

roji commented Aug 27, 2022

Well, it's just a new translation for something we didn't translate before, so I'm not sure in what sense the risk can be high.

If you're concerned specifically with the truncation, we can introduce a CAST to varchar/nvarchar(max), e.g. on the delimiter, which would work around that. Or we can leave it as-is as a SQL Server quirk, as we do with trailing whitespace.
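
A minimal sketch of the CAST workaround mentioned above, assuming (per the experiments earlier in this thread) that a single max-typed input, including the delimiter, makes the whole result max. The column aliases are illustrative. If the behavior is as described, the first length should be capped at 8000 and the second should not be.

```sql
-- Compare the result length with and without a max-typed delimiter.
SELECT
    LEN(CONCAT_WS('|', REPLICATE('x', 7999), 'bar')) AS DefaultLength,
    LEN(CONCAT_WS(CAST('|' AS varchar(max)), REPLICATE('x', 7999), 'bar')) AS CastDelimiterLength;
```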

@roji
Member Author

roji commented Aug 31, 2022

Making this a draft as we're not doing this in 7.0.

@roji
Member Author

roji commented Apr 4, 2023

Note: CONCAT_WS has been available since SQL Server 2017 (14.x). We can use the compatibility level (#30163) to determine whether to translate or not (or to throw).
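
For reference, the compatibility level of the current database can be read from sys.databases, as sketched below. Level 140 corresponds to SQL Server 2017. Treating the level as a proxy for CONCAT_WS availability is the assumption made in the note above, not something this query verifies.

```sql
-- Read the current database's compatibility level (140 = SQL Server 2017).
SELECT name, compatibility_level
FROM sys.databases
WHERE name = DB_NAME();
```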

Development

Successfully merging this pull request may close these issues.

Translate non-aggregate string.Join to SQL Server CONCAT_WS
3 participants