
Switch pipeline trace to DiagnosticSource #1169

Merged
merged 47 commits into v6 from issue-788 on Jul 21, 2020

Conversation

@tillig (Member) commented Jul 17, 2020

This PR is part of #788 and is the first step in getting the Autofac.Diagnostics.DotGraph package out.

It updates the diagnostic mechanism to use System.Diagnostics.DiagnosticSource to emit events and moves the diagnostics classes into an Autofac.Diagnostics namespace.

Subscribing to events in this system means creating a System.Diagnostics.DiagnosticListener and attaching it to the container's DiagnosticSource. Internally this follows an observer pattern that receives a stream of events, keyed by string IDs, with which the listener may do... whatever they want. There are convenience methods for subscribing to the container's diagnostics, like:

container.SubscribeToDiagnostics<MyListener>();

The default diagnostic tracer was updated to be a diagnostic listener in this format. Rather than paste an example in here, check that out.
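
For anyone who wants to hand-roll a subscriber instead of deriving from the shipped tracer, here's a rough sketch of the underlying observer mechanism. It assumes the container-level source is a System.Diagnostics.DiagnosticListener (an IObservable<KeyValuePair<string, object>>) exposed via the DiagnosticSource property discussed later in this thread; the observer and wiring below are illustrative, not the exact Autofac API.

using System;
using System.Collections.Generic;
using System.Diagnostics;

// Illustrative observer: receives the string-keyed event stream and logs the IDs.
public sealed class ConsoleDiagnosticObserver : IObserver<KeyValuePair<string, object>>
{
    public void OnNext(KeyValuePair<string, object> evt)
    {
        // evt.Key is the event ID string; evt.Value is the payload for that event.
        Console.WriteLine($"Autofac diagnostic event: {evt.Key}");
    }

    public void OnCompleted() { }
    public void OnError(Exception error) { }
}

// Hypothetical wiring - the SubscribeToDiagnostics<T>() extension above is the
// convenience path; this just shows the raw observer subscription underneath.
// if (container.DiagnosticSource is DiagnosticListener listener)
// {
//     listener.Subscribe(new ConsoleDiagnosticObserver());
// }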

I tried to follow patterns I'd seen in areas like ASP.NET Core with respect to things like:

  • Checking to see if there's a listener subscribed prior to emitting an event.
  • Keeping some things internal, like the strings that are the list of event IDs.

That latter one was sort of interesting. I wondered why it was internal but realized a couple of things:

  1. Constants are faster because they get compiled into your code, but... they get compiled into your code. So changing the constant later doesn't actually take effect in consuming code outside the assembly. You'd have to use static readonly, but that's not as fast access-wise and this needs to be super low overhead.
  2. Adding, removing, or changing events means adding, removing, or changing these event IDs. If there's not a public constants class, we can do that pretty freely (within reason), and the most "breaking" it gets for a consumer is that they stop receiving that event. If it were part of the public API, it would break in a more fatal way, and that's not cool. (There's a rough sketch of the internal event ID pattern right after this list.)
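
For illustration, the internal event ID pattern from point 2 looks roughly like this; the class name and strings are made up for the example, not the actual identifiers in the PR.

// Hypothetical shape of the internal event ID list; names are illustrative only.
internal static class ExampleDiagnosticEventIds
{
    public const string OperationStart = "Autofac.Operation.Start";
    public const string OperationSuccess = "Autofac.Operation.Success";
    public const string OperationFailure = "Autofac.Operation.Failure";
}

// Because the class is internal, these strings can be added, renamed, or removed
// without a binary-breaking change for consumers; a listener filtering on an old
// string simply stops receiving that event.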

The history is pretty messy because I started with the DOT graph tracer in this PR and then it moved out to a separate package. I tried to squash/rebase it but that, too, got kinda dicey since I'd been pulling changes from the v6 branch and... yeah.

tillig and others added 30 commits July 6, 2020 15:07
Remove IsTopLevelOperation and ITracingIdentifier

@tillig (Member Author) commented Jul 17, 2020

A few extra things did sneak in here:

  • The .editorconfig got updated based on styles we're hoping to make uniform. They do provide some "info" level suggestions in VS Code; we'll want to look at fixing some of that up once v6 is in develop.
  • We have a .github repo now with things like the bug/issue templates, the code of conduct, and so on. We need to remove them from here to allow that central repo to work.
  • I found a couple of PSScriptAnalyzer warnings I fixed in the build.ps1 script.
  • The MetadataKeys file had a TODO that'd been in there for quite some time and was actually obsolete. I removed the associated "TODO: delete this stuff" stuff. I did that because I was trying to use TODO to track the stuff I was working on and... my OCD took over.

I didn't run the perf metrics against it to see if things changed, though I can't imagine it's significant enough to make a huge impact. Then again, I've been wrong before.

One thing that is different is that the diagnostic source is shared from the container down. You attach at the container level, not at any arbitrary lifetime scope level. I'm not sure if this reduction in functionality matters too much, but it's important to be aware of - it's different from the pipeline tracer approach we had before.

@alistairjevans (Member) left a comment

Functionally looks great, I can see the benefits of using the DiagnosticSource system.

Unfortunately, there is a performance regression from v6. I've run the DeepGraph and ChildScope benchmarks comparing v6 against issue-788:

DeepGraphResolve - ~1.2μs / 13% slower

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
v6 9.513 μs 0.0358 μs 0.0299 μs 2.0599 0.0458 - 16.92 KB
issue-788 10.78 μs 0.092 μs 0.087 μs 2.0599 0.0458 - 16.91 KB

ChildScopeResolve - ~2μs / 7% slower

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
v6 30.47 μs 0.176 μs 0.156 μs 3.9063 1.9531 - 32.09 KB
issue-788 32.70 μs 0.456 μs 0.427 μs 4.0894 2.0142 - 33.88 KB

There are some thoughts and proposed changes in the individual comments, including on the performance; let me know what you think!

Code under review (old version, then new version):

// Old:
// Optimise the path depending on whether a tracer is attached.
if (ctxt.TracingEnabled)

// New:
// Optimise the path depending on whether diagnostics are enabled.
if (ctxt.DiagnosticSource.MiddlewareDiagnosticsEnabled())

Member

I think this line may be at least partly responsible for the performance regression when there are no listeners; this line of code is possibly the hottest path in the entire pipeline.

In the original version of tracing I had to add a boolean cache of this 'Enabled' flag and store it in the Request Context. Can we do that again, and cache the result of this method when the Request begins?
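
A rough illustration of the suggested cache; the type and member names here are hypothetical stand-ins, not the actual request context in the codebase.

// Hypothetical request context: the subscriber check happens once when the request
// begins, and the hot middleware path reads a plain bool instead of re-querying
// the diagnostic source for every event.
internal sealed class RequestContextSketch
{
    public bool DiagnosticsEnabled { get; private set; }

    public void BeginRequest(System.Diagnostics.DiagnosticListener source)
    {
        // Cached up front; subscription changes made mid-request won't be seen
        // until the next request, which is the trade-off discussed below.
        DiagnosticsEnabled = source.IsEnabled();
    }
}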

Member Author

This is tough because the thing being checked is looking to see if any subscribers have this enabled, right? And there could be multiple subscribers, which could be added at any time.

I could simplify the checks - check for literally the exact event being subscribed instead of one of a logical set. Right now I check for start or success or fail to determine if any subscribers exist, but in the majority case I'm guessing that's three "false" checks every time. I could switch it to check only for the specific event being issued. It'll change the flow a bit since you could subscribe to a failure event without subscribing to start or success, but maybe that's okay.
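
Sketched out, that change would look something like this; the event name string is illustrative and MiddlewareDiagnosticsEnabled() stands in for the current aggregate helper.

// Current aggregate check: true if start, success, or failure has any subscriber.
// if (ctxt.DiagnosticSource.MiddlewareDiagnosticsEnabled()) { ... }

// Proposed: only ask about the specific event that's about to be written.
if (ctxt.DiagnosticSource.IsEnabled("Autofac.Middleware.Start"))
{
    ctxt.DiagnosticSource.Write("Autofac.Middleware.Start", payload);
}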

Member

I think it's probably justified for performance reasons to say "If you change diagnostic subscriptions to middleware events, that change may not take effect until the next resolve operation."

Especially since the 99% use case is going to be turning it on once, for all events, and never changing it.

Member Author

So evaluate all the subscriptions at the operation level and pass that all down through the stack? I think that might add some complexity. Let me see about making the checks more precise and see where that leaves us. I also have a not-totally-formed idea about using flags internally to do bitwise checks and speed up comparison.
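
The flags idea isn't implemented in this PR, but the rough shape would be something like the following (all names hypothetical):

// Hypothetical: evaluate subscriptions once, collapse the result into a bit mask,
// and test individual events with a cheap bitwise AND instead of string lookups.
[System.Flags]
internal enum EnabledDiagnostics
{
    None = 0,
    MiddlewareStart = 1 << 0,
    MiddlewareSuccess = 1 << 1,
    MiddlewareFailure = 1 << 2,
}

// Hot-path check:
// if ((enabled & EnabledDiagnostics.MiddlewareStart) != 0) { /* write the event */ }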

Member

Honestly I was just thinking that if someone changes the set of subscribers or subscribed events such that our checking method returns a different value, we might not raise some middleware events during that transition. But then, if someone did change that set during a request, they would have missed related operation and request messages anyway, which could introduce different problems.

Is this sort of thing what the DiagnosticSource Activity functionality is for?

Fundamentally it's whatever gets us the perf saving with no subscribers, so whatever approach works.

Are we anticipating encouraging the use of diagnostics in production? If so, do we need to bench the diagnostic source capture?

Member Author

I thought about using Activity but it didn't seem worth the allocations.

I'm less concerned about changing subscriptions in the middle of the request than I am about trying to be too smart in the caching mechanism such that we stop responding correctly to observables or it becomes hard to add a new event in here. I could see wanting diagnostic events for all the regular lifetime events, too.

I don't think diagnostics in prod would be recommended. More like "turn this on if you find a problem and turn it back off" or something.

Member Author

Realized now that I was basically checking enabled twice for any given write.

First in if (source.MiddlewareDiagnosticsEnabled()) and again in the actual call to write the event. Same for operation and request.

I've removed the aggregate checks and now just leave it to the call that writes the event. This should remove some of the overhead. Let's see where this puts us perf-wise and I can try to tweak some more if it looks like it still needs help.

I'm trying to not cart too much additional context around if possible, and while a single Boolean isn't that big a deal, the orchestration of keeping the cache up to date with the actual settings seems like it could be a challenge long-term, similar to how container rebuilding got to be a bit sticky and unpredictable.

I can see where there is a use case for success or failure instrumentation, like perf counters or something, to track stats. That'd mean subscribing to just success/failure events but maybe not start events. The ability to have that granularity seems interesting to me, rather than just "all on" or "all off" for a whole request.

I can also see that we may want to support (at some point) the ability to conditionally determine if you're 'enabled' based on the data itself which wouldn't be possible if we try to pre-cache things too much. Like, "I want to see events that involve resolving a certain type" to diagnose an issue would be a use case for being enabled based on certain data.
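
That data-based scenario maps onto the predicate overload that DiagnosticListener.Subscribe already provides; a sketch, reusing the listener and observer from the earlier example, with a purely illustrative filter:

// The predicate participates in the listener's IsEnabled checks, so when the
// producer passes the payload into IsEnabled(name, payload), a subscriber can opt
// in based on that data rather than just the event name.
IDisposable subscription = listener.Subscribe(
    observer,
    (eventName, payload, _) => payload?.ToString()?.Contains("MyService") == true);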

Member Author

Just pushed an update to have the event source DiagnosticListener check for IsEnabled() before checking if the specific named subscription is enabled. Benchmark the crap out of it; it should still basically boil down to a check for subscriptions != null before doing anything, which should be just as fast as what we had before. Granted, it's going to do that null check for each event rather than just one time, but that should still be pretty fast.
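
In DiagnosticSource terms, the emit path described here boils down to something like the following; the event name is illustrative and the payload is a placeholder.

// IsEnabled() with no arguments is effectively a "does anyone subscribe at all?"
// check (subscriptions != null), so the common no-listener case bails out before
// any string comparison or payload allocation happens.
if (diagnosticListener.IsEnabled() && diagnosticListener.IsEnabled("Autofac.Middleware.Start"))
{
    diagnosticListener.Write("Autofac.Middleware.Start", new { context = ctxt, middleware });
}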

Files with resolved review threads:

  • src/Autofac/Diagnostics/DiagnosticTracerBase.cs (two threads, outdated)
  • src/Autofac/Autofac.csproj
  • src/Autofac/Core/Container.cs (outdated)
  • src/Autofac/Core/Resolving/ResolveOperationBase.cs (two threads, outdated)
  • src/Autofac/ILifetimeScope.cs (outdated)

@tillig (Member Author) commented Jul 20, 2020

First push today fixes the this references. Diving into other stuff now.

@tillig (Member Author) commented Jul 20, 2020

A few more pushes address:

  • "this" usage - removed all the explicit this prefixes from files under src/**
  • documentation updates
  • switching things so only IContainer exposes DiagnosticSource
  • disposal of the DiagnosticSource when the container is disposed
  • updating the event handling methods of the base tracer to be protected

@tillig (Member Author) commented Jul 20, 2020

Pushed an update to the way things get checked for subscription enablement to try to speed that up.

I think it's ready for another review and benchmarking.

@alsami (Member) commented Jul 21, 2020

I think it's ready for another review and benchmarking.

Just ran the Benchmarks on Linux, hope that's fine.

DeepGraphResolve Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
v6 16.67 us 0.206 us 0.183 us 0.4883 - - 16.92 KB
issue-788 17.50 us 0.106 us 0.099 us 0.4883 - - 16.91 KB

Increase of around 4%

ChildScopeResolve Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
v6 57.70 us 0.163 us 0.127 us 1.0986 0.5493 - 37.62 KB
issue-788 58.22 us 0.277 us 0.259 us 1.0986 0.5493 - 37.61 KB

Increase of around 0.9%

@alistairjevans (Member) left a comment

All the functionality changes look good, API surface on the tracer is nice and compact now.

Perf-wise, some improvement, although not back to v6 levels. It's probably worth pointing out, though, that we're still slightly faster than develop even after these changes (I've included the develop results here).

DeepGraphResolve - ~0.9μs / 8% slower (before was ~1.2μs / 13% slower)

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
develop 10.82 us 0.056 us 0.052 us 2.2430 0.0458 - 18.45 KB
v6 9.569 μs 0.0553 μs 0.0517 μs 2.0599 0.0458 - 16.92 KB
issue-788 10.42 μs 0.080 μs 0.071 μs 2.0599 0.0458 - 16.91 KB

ChildScopeResolve - ~1.5μs / 5% slower (before was ~2μs / 7% slower)

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
develop 37.33 us 0.229 us 0.214 us 6.4697 0.4272 - 52.86 KB
v6 30.92 μs 0.152 μs 0.142 μs 3.9063 1.9531 - 32.09 KB
issue-788 32.48 μs 0.141 μs 0.132 μs 4.0894 2.0142 - 33.88 KB

Regardless, I'm going to approve the changes, since we're down to refining it now. If you wanted to merge and address any remaining perf investigation separately (or under #1127) that's fine with me.

@tillig merged commit 7c7c773 into v6 on Jul 21, 2020
@tillig deleted the issue-788 branch on Jul 21, 2020 at 16:39

@alistairjevans (Member)

🍻

@tillig (Member Author) commented Jul 21, 2020

I appreciate the patience on this - it was definitely a long and winding road to get here, but I think it's good overall. Now let me see about getting that DOT graph tracer updated and PR'd based on these changes. 🎉

@alistairjevans linked an issue (DiagnosticSource Tracing) on Jul 22, 2020 that may be closed by this pull request
@alistairjevans added this to the v6.0 milestone on Sep 26, 2020