Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Precompiled query inner loop source generator #33350

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

roji
Copy link
Member

@roji roji commented Mar 18, 2024

This is a draft implementation for our NativeAOT/precompiled query inner loop source generator. The overall approach seems complete, but I'm submitting an incomplete draft mainly to get a review from @sharwell that this follows incremental source generator best practices won't cause performance issues - even with a lot of LINQ queries in the project. Of course anyone else is welcome to review as well.

This source generator identifies queryable LINQ queries which we know we can precompile, and injects a special marker node into their expression tree to make them work in the inner dev loop; any query without this marker node represents a query that couldn't be precompiled, and causes an exception. The source generator also emits warnings for such incompatible queries.

This source generator:

  1. Searches for a closed list of well-known LINQ operators which terminate IQueryable LINQ queries (i.e. cause them to execute). e.g. ToList(), ToDictionaryAsync(). Regular LINQ queries (over IEnumerable) are unaffected in any way.
  2. For those, it works down the method invocation chain of LINQ operators, and verifies that the root is an EF DbContext. Anything else (e.g. a LINQ query that's broken across multiple variables) is considered a dynamic query that can't get precompiled (a warning is generated).
  3. Queries that pass are considered safe; the generator generates an interceptor for the terminating operator that injects the safe marker node, and then calls the original operator. For example, ToList() becomes:
public static global::System.Collections.Generic.List<TSource> ToList_Safe<TSource>(
    this global::System.Collections.Generic.IEnumerable<TSource> source)
{
    var safeWrapped = global::Microsoft.EntityFrameworkCore.Query.Internal.PrecompiledQuerySafeMarker.Compose((
        global::System.Linq.IQueryable<TSource>)source);
    return global::System.Linq.Enumerable.ToList(safeWrapped);
}

Some resources for getting familir with incremental source generators (for the team):

Closes #32727

@roji roji requested review from sharwell and a team March 18, 2024 21:11
// TODO: Also allow per-source-file metadata which enables/disables the analysis?
// https://github.com/dotnet/roslyn/blob/main/docs/features/incremental-generators.cookbook.md#consume-msbuild-properties-and-metadata

// TODO: ideally, the terminatingOperators pipeline below wouldn't even run if the source generator isn't enabled (this is an opt-in
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sharwell FYI

.Select((t, _) => t.Left)
.WithTrackingName("EF" + nameof(LinqQuerySourceGenerator));

// TODO: Currently all interceptors from the entire project go into the same file, which may be a lot to redo every time something
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sharwell FYI


break;

// TODO: we currently check symbols by their name; should we switch to loading symbols from the Compilation (via
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sharwell FYI, I've been wondering about the best (and most efficient) way to go about checking symbols.

// TODO: method symbols and confirm it's an IQueryable flowing all the way through, etc.

// Work backwards through the LINQ operator chain until we reach something that isn't a method invocation
while (expression is InvocationExpressionSyntax
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where the generator recurses through the chain of LINQ operator invocations. I've made this as light as possible for now (i.e. syntax-only, no symbol checking or anything); note that this doesn't support some (contrived) stuff such as non-extension invocation syntax for LINQ operators (which I think could be OK to not support).

public readonly string? InterceptorDeclaration;
public readonly Diagnostic? Diagnostic;

// TODO: Do these need to participate in the equality/hashcode check? By definition they change if the originating syntax node
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sharwell FYI, am not 100% clear on how this works... I'm pretty sure these values need to be cached, but I wanted confirmation.


namespace Microsoft.EntityFrameworkCore;

public sealed class FakeAnalyzerConfigOptionsProvider(params (string, string)[] globalOptions) : AnalyzerConfigOptionsProvider
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 This should be in tests


namespace Microsoft.EntityFrameworkCore;

public class LinqQuerySourceGeneratorTests
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 I recommend rewriting this entire class to use the roslyn-sdk test library

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a sample you can point me at? I thought this class was already pretty simple/clean, but no objections to making it even better :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typically you start by adding the verifier helper types to the project:

https://github.com/microsoft/vs-extension-testing/tree/31112f2ed8ab9f3b65740717388553f37922b1df/src/Microsoft.VisualStudio.Extensibility.Testing.SourceGenerator.UnitTests/Verifiers

Then you can use them like this:

https://github.com/microsoft/vs-extension-testing/blob/31112f2ed8ab9f3b65740717388553f37922b1df/src/Microsoft.VisualStudio.Extensibility.Testing.SourceGenerator.UnitTests/TestServicesSourceGeneratorTests.cs

Other testing approaches are almost guaranteed to have significant unintentional test holes. Many dozens of holes discovered in real-world analyzers are handled automatically by the testing library.

// .Select((t, _) => t.Right);

var terminatingOperators = context.SyntaxProvider
.CreateSyntaxProvider(IsPossibleTerminatingOperator, ProcessLinqQuery)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 I recommend creating a ForInvocationWithMetadataName method based on ForAttributeWithMetadataName.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the implementation, if I understand correctly this means using CompilationProvider to walk over the all the compilation's syntax trees and find the "interesting" nodes, rather than use SyntaxValueProvider itself.

I'm not sure exactly how often a compilation occurs as the user is editing, but doesn't this mean that this potentially expensive operation needs to happen every time a compilation occurs (which might be quite frequent)? Could you provide a bit more context on how this approach is faster than the current one, for my understanding?

(note that here there's not a single method we're interested in, but rather a couple dozen - so the string-based switch would still need to be there rather than a single metadata name passed to ForInvocationWithMetadataName).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation of ForAttributeWithMetadataName contains numerous optimizations intended to reduce work performed at locations in code which are not the syntax of interest. Regardless of how many LINQ queries exist in the code, there will be even more code that is not a LINQ query. It's more important to avoid doing work when that other code is changed than it is to avoid work when a LINQ query changes.


// TODO: Currently all interceptors from the entire project go into the same file, which may be a lot to redo every time something
// TODO: changes; maybe generate an interceptor file per source file, to limit the regeneration scope to a single file?
// TODO: Is that possible (how)?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, just assign an output file name to each interceptor. Here's an example where I have a step to assign file names to each item, and then break all those items back apart so generation only needs to occur for the ones that changed.
https://github.com/dotnet/roslyn-analyzers/blob/22a5b5af1a402fbba34dfbbdeadeb5aa571d008e/src/Microsoft.CodeAnalysis.ResxSourceGenerator/Microsoft.CodeAnalysis.ResxSourceGenerator/AbstractResxGenerator.cs#L131-L164

@AndriySvyryd
Copy link
Member

@roji Are you tracking adding support for ExecuteUpdate somewhere?

@roji
Copy link
Member Author

roji commented Mar 24, 2024

Are you tracking adding support for ExecuteUpdate somewhere?

@AndriySvyryd ExecuteUpdate already works in the publish-time source generator (test), though there may be some trouble ahead if/when we do #32018...).

In the inner-loop sourcegen, it should be a matter of implementing the safe marker injection, just like all the other terminating operators which are currently missing.. There's not a lot of difference between ExecuteUpdate and e.g. Sum in that sense.

Definitely on my list :)

@roji
Copy link
Member Author

roji commented Apr 15, 2024

Note: switch to using dotnet/roslyn#72133 as the interception method when that's available.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Precompiled query inner loop: source generator to warn and fail dynamic queries
3 participants