Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize codegen for collections expression of single spread of ReadOnlySpan for collection builder emit strategy #73102

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

DoctorKrolic
Copy link
Contributor

Implements suggestion from #71296 (comment). Now when collection expression if of form [.. readOnlySpan] we can just take that readOnlySpan instead of copying it since we know that it is an immutable type

…OnlySpan` in case of collection builder emit strategy
@DoctorKrolic DoctorKrolic requested a review from a team as a code owner April 19, 2024 12:36
@dotnet-issue-labeler dotnet-issue-labeler bot added Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead labels Apr 19, 2024
@dotnet-policy-service dotnet-policy-service bot added the Community The pull request was submitted by a contributor who is not a Microsoft employee. label Apr 19, 2024
@DoctorKrolic DoctorKrolic changed the title Optimize codegen for collections expression of single spread of ReadOnlySpan in case of collection builder emit strategy Optimize codegen for collections expression of single spread of ReadOnlySpan Apr 19, 2024
@CyrusNajmabadi
Copy link
Member

Can you state explicitly under what circumstances this optimization will apply? For example I want to make sure it does not apply to:

ReadOnlySpan<int> intCopy = [.. rosInts]

Thanks!

@DoctorKrolic
Copy link
Contributor Author

ReadOnlySpan<int> intCopy = [.. rosInts]

It does apply to this case, making it effectively a reassignment. Why it is bad? The ROS is an immutable type and the previous codegen didn't perform deep copy of elements anyway, so this in unobservable to the user (equalify change doesn't count since it isn't guaranteed for collection expressions). Am I worng?

@CyrusNajmabadi
Copy link
Member

Ros is not an immutable type. It's an immutable view over potentially mutable data.

Trivial example, wrap a normal array with a ros. Then assign to two other ROSs (one as a normal assignment, one as a spread copy). Now change the original array. What do you see?

@CyrusNajmabadi
Copy link
Member

Afaict, this is only safe if passed to a collection builder Create method that is scoped. That way we know the new collection must be making a copy itself. Right @RikkiGibson?

@RikkiGibson
Copy link
Contributor

It's incorrect to optimize [..ros] to ros. You have to copy it, even if target type is also ReadOnlySpan, due to semantic differences Cyrus has laid out.

Afaict, this is only safe if passed to a collection builder Create method that is scoped. That way we know the new collection must be making a copy itself. Right @RikkiGibson?

I'm not following the scenario you have in mind. If a Create method is being used to create the resulting collection, then it's not the ReadOnlySpan<T> ros1 = [..ros]; case.

I can't think of any case where we can safely "skip" copying when user code includes a spread. It would have to involve proving that the spread operand and its referents is not referenced by anything else, and treating it like a "move" of the storage instead of a copy. It's not a type of optimization we really do in Roslyn. Especially for a case where, if user wants those semantics, they could just replace [..ros] with ros.

@RikkiGibson
Copy link
Contributor

RikkiGibson commented Apr 19, 2024

Looking at the code a bit more. Yeah, I see that VisitCollectionBuilderCollectionExpression is implemented by just lowering the collection expr as ReadOnlySpan then passing it to Create. This means that if you do only the ImmutableArray part of this PR then you make codegen worse by doing that copy.

In the case of MyCollectionBuilderCollection<T> collection = [..ros], where ros is ReadOnlySpan<T>, it def feels like you could skip copying the ros.

And yes, to avoid a behavioral breaking change you need to take care to still do the copy for RefStruct<T> Create(/*non-scoped*/ ReadOnlySpan<T>).

@CyrusNajmabadi
Copy link
Member

I can't think of any case where we can safely "skip" copying when user code includes a spread. It would have to involve proving that the spread operand and its referents is not referenced by anything else, and treating it like a "move" of the storage instead of a copy.

Here's teh scenario:

ReadOnlySpan<int> x = ... backing mutable store ...;
ImmutableArray<int> y = [.. x];

This should be safe (IMO, but correct me if i'm wrong), but it should be possible to compile this to:

ImmutableArray<int> y = ImmutableArray.Create<int>(x);

Instead of:

ReadOnlySpan<int> __compilerCopy = ... copy x ...;
ImmutableArray<int> y = ImmutableArray.Create<int>(__compilerCopy);

Basically, if the ROS arg is scoped, we know that the final collection can't keep a reference to it. So it will make a copy itself, so we don't need to.

(I could def be wrong on this).

@RikkiGibson
Copy link
Contributor

I agree with your assessment.

@CyrusNajmabadi
Copy link
Member

And yes, to avoid a behavioral breaking change you need to take care to still do the copy for RefStruct Create(/non-scoped/ ReadOnlySpan).

Interestingly enough, ImmutableArray.Create doesn't seem to have a scoped argument:

https://github.com/dotnet/runtime/blob/b067ce009ab5992168d1d635a9a9beb387008608/src/libraries/System.Collections.Immutable/src/System/Collections/Immutable/ImmutableArray.cs#L90

I'm curious why that is. @stephentoub any insights here?

@RikkiGibson
Copy link
Contributor

You only ever need scoped when the signature has a ref struct output. In this case it only returns ImmutableArray so we already know it can't return references in the parameter.

@CyrusNajmabadi
Copy link
Member

In the case of IA, we'd either need to special case. Or see if the signature of it could be updated. And if there's a good reason the sig is as it is, and the non-copy is not-safe, we likely should not special case this type. :)

@RikkiGibson
Copy link
Contributor

RikkiGibson commented Apr 19, 2024

@DoctorKrolic as far as how to adjust the implementation. I would recommend to do a check in VisitCollectionBuilderCollectionExpression to see if we are in this special case:

  • collection-expr is a ROS of a single spread, with same element type as the collection-expr, and Create method cannot return references in its argument.

In which case, don't call into VisitArrayOrSpanCollectionExpression, just use the (lowered) spread-operand expression as the argument to Create.

@RikkiGibson
Copy link
Contributor

Sorry for the spam. It can probably be generalized a little further than this, e.g. if expr in [..expr] is implicitly convertible to the ReadOnlySpan that we want, in a way that doesn't imply copying (for example, if it is an array or mutable span), we could also just do that conversion and pass the result to Create (given the ref safety criteria is also met. Dealer's choice on whether you actually go for that optimization.

@DoctorKrolic
Copy link
Contributor Author

@RikkiGibson Can you clarify this part please:

Create method cannot return references in its argument

I am generally not that much familiar with scoped and such, so an example of a test, which would fail in case we reuse existing ROS where we shouldn't would be helpful

@CyrusNajmabadi
Copy link
Member

You only ever need scoped when the signature has a ref struct output. In this case it only returns ImmutableArray so we already know it can't return references in the parameter.

PErfect. That makes sense. Thanks!

@CyrusNajmabadi
Copy link
Member

@stephentoub Ignore the ping :)

@DoctorKrolic
Copy link
Contributor Author

for example, if it is an array or mutable span), we could also just do that conversion and pass the result to Create (given the ref safety criteria is also met

I would like to focus on the original case here with ROS only to keep the size of the PR small. My expirience says that making several "simple" optimizations might result in exponential explosion of test cases, which need to be checked)

@CyrusNajmabadi
Copy link
Member

I am generally not that much familiar with scoped and such, so an example of a test, which would fail in case we reuse existing ROS where we shouldn't would be helpful

Sure. Say you have the following:

[CollectionBuilder(typeof(MyRefType), "Create")]
ref struct MyRefType<T>
{
// ...
}

static class MyRefType
{
    public static MyRefType<T> Create<T>(ReadOnlySpan<T> values); // versus
    public static MyRefType<T> Create<T>(scoped ReadOnlySpan<T> values);
}

In the case of:

ReadOnlySpan<int> s = ... create ...;
MyRefType<int> t = [.. s];

Because the first MyRefType.Create overload returns a ref-struct, and is not scoped, it might capture the ROS being passed in, and thus would see the underlying values of it change if it wrapped something like a mutable array. So, to be safe, we must make a copy.

However, the arg to the second 'Create' overload is 'scoped'. This informs the compiler taht it could not be captured by the return value of create. And as such, we don't need to make a copy.

As rikki points out, this is only needed when returning a ref-struct, as a non-ref-struct could not ever capture the ROS coming in, so it would have to be making a copy to function at all.

@CyrusNajmabadi
Copy link
Member

CyrusNajmabadi commented Apr 19, 2024

Because the first MyRefType.Create overload returns a ref-struct, and is not scoped, it might capture the ROS being passed in, and thus would see the underlying values of it change if it wrapped something like a mutable array. So, to be safe, we must make a copy.

AS an fuller example of that:

[CollectionBuilder(typeof(MyRefType), "Create")]
ref struct HalfSpan<T>
{
    private readonly ReadOnlySpan<T> _ros;
    public int Length => _ros.Length / 2;
    public T this[int index] => _ros[index * 2];
}

static class MyRefType
{
    public static MyRefType<T> Create<T>(ReadOnlySpan<T> values) => new HalfSpan<T>(values);
}

// later

string[] values = ["a", "b", "c", "d";
HalfSpan<string> s = [.. values];

// This change must not be visible in 's'.  `.. values` promises you get your own 'copy' of hte values.
values[0] = "A"; values[1] = "B"; values[2] = "C"; values[3] = "D";

@RikkiGibson
Copy link
Contributor

As far as how to correctly implement the ref safety criteria. I think the following will do (in a code path where we are already narrowed down to the (MyCollectionBuilderType)[..readOnlySpan] case):

var canSkipCopyingArgument = !createMethod.ReturnType.IsRefLikeType || createMethod.Parameters[0].EffectiveScope != ScopedKind.ScopedValue;

@DoctorKrolic DoctorKrolic changed the title Optimize codegen for collections expression of single spread of ReadOnlySpan Optimize codegen for collections expression of single spread of ReadOnlySpan for collection builder emit strategy Apr 19, 2024
@DoctorKrolic
Copy link
Contributor Author

@CyrusNajmabadi @RikkiGibson I think this is conceptually ready, now would like a review of implementation code please

@RikkiGibson
Copy link
Contributor

I would welcome a follow-up PR which also handles cases where the spread operand is implicitly convertible to ReadOnlySpan<T>.

@RikkiGibson
Copy link
Contributor

@dotnet/roslyn-compiler for a second review of a smallish community PR

@DoctorKrolic
Copy link
Contributor Author

@cston PTAL

@DoctorKrolic DoctorKrolic requested a review from cston April 25, 2024 19:00
@cston
Copy link
Member

cston commented Apr 26, 2024

Thanks @DoctorKrolic, the change looks good, but we'll hold off merging while C#13 feature work is in progress.

The reason is we've spent significant time recently addressing unintended breaking changes which has taken time away from C#13. To reduce overhead and risk of regressions while we focus on C#13, we’ll limit non-essential changes for now. We'll revisit postponed PRs as we finish up feature work.

Since this change is an optimization, and not driven by a reported issue, we’ll hold off merging and revisit later. I've marked the issue as Blocked to prevent merging in the meantime. Thanks for your patience.

@DoctorKrolic
Copy link
Contributor Author

while C#13 feature work is in progress

@cston When the feature in question is expected to be done? I have a few optimizing PR ideas, but I don't want to introduce even more overhead to the team while you are working on a feature

@cston
Copy link
Member

cston commented May 2, 2024

@DoctorKrolic, we'll probably revisit postponed PRs after wrapping up 17.11.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Compilers Blocked Community The pull request was submitted by a contributor who is not a Microsoft employee. untriaged Issues and PRs which have not yet been triaged by a lead
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants