Add overview of runtime logging to the Book of the Runtime #98881
Tagging subscribers to this area: @dotnet/area-extensions-logging
Issue Details: First draft. ominous peal of thunder Thanks to @davidwrighton for providing all of this information.
Tagging subscribers to this area: @dotnet/area-meta
docs/design/coreclr/botr/logging.md
Outdated
Please see the linked documentation pages for more information on how to use each of the above tools. They have an extensive set of options and some helpful features like time limits, attaching to existing processes, and configurable output formatting.

For advanced scenarios, on Windows you can also use standard ETW tools like [WPR](https://learn.microsoft.com/en-us/windows-hardware/test/wpt/windows-performance-recorder) to capture events from EventPipe, and on Linux you can use [LTTNG](https://lttng.org/).
Suggested change:
For advanced scenarios, on Windows you can also use standard ETW tools like [WPR](https://learn.microsoft.com/en-us/windows-hardware/test/wpt/windows-performance-recorder), and on Linux you can use [LTTNG](https://lttng.org/).
I think WPR is ETW only. I do not think that it works with EventPipe.
I must have misunderstood previous comments, I thought eventpipe funnels events to ETW as well? Is that only EventSource, even though the eventpipe API in C++ is in the ETW:: namespace?
EventPipe is first and foremost a cross-process communication protocol.
The EventPipe native component implements the provider side of the protocol. (The listener side of the protocol is in the https://github.com/microsoft/perfview/tree/main/src/TraceEvent library.)
The multicasting over EventPipe/ETW/LTTNG/custom is done in EventSource (for events originating in managed code) and C/C++ macros (for events originating in native code). A few examples:
runtime/src/coreclr/inc/eventtracebase.h
Line 100 in ac94075

    ((Context.EtwProvider->IsEnabled && McGenEventXplatEnabled(Context.EtwProvider, &EventDescriptor)) || EventPipeHelper::IsEnabled(Context, EventDescriptor.Level, EventDescriptor.Keyword))

runtime/src/coreclr/inc/eventtracebase.h
Lines 123 to 124 in ac94075

    #define ETW_EVENT_ENABLED(Context, EventDescriptor) (EventPipeHelper::IsEnabled(Context, EventDescriptor.Level, EventDescriptor.Keyword) || \
            (XplatEventLogger::IsKeywordEnabled(Context, EventDescriptor.Level, EventDescriptor.Keyword)))

And here is similar logic for EventSource:

runtime/src/libraries/System.Private.CoreLib/src/System/Diagnostics/Tracing/EventSource.cs
Lines 2421 to 2425 in ac94075

    public bool EnabledForAnyListener; // true if any dispatcher has this event turned on
    public bool EnabledForETW;         // is this event on for ETW?
    #if FEATURE_PERFTRACING
    public bool EnabledForEventPipe;   // is this event on for EventPipe?
    #endif
> ETW:: namespace
A lot of tracing-related code is in ETW namespace for historic reasons, but it does not mean that it is ETW specific.
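To make the multicasting point concrete, here is a minimal C++ sketch of the "enabled if any sink wants it" pattern that the snippets above follow. `Sink`, `EventDescriptor`, `EventEnabled` and `FireEvent` are hypothetical names invented for this illustration; they are not the runtime's macros or types, and the level/keyword matching rule is an assumption.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not the runtime's types.
struct EventDescriptor {
    std::uint8_t level;     // lower value = more severe (assumed convention)
    std::uint64_t keyword;  // bitmask of categories the event belongs to
};

struct Sink {
    std::string name;                   // e.g. "ETW", "EventPipe", "LTTng"
    bool enabled;                       // has a session turned this sink on?
    std::uint8_t maxLevel;              // most verbose level the session asked for
    std::uint64_t keywordMask;          // keywords the session asked for
    std::vector<std::string> received;  // events delivered to this sink

    bool Wants(const EventDescriptor& e) const {
        // In this sketch, keyword 0 is treated as "matches everything".
        return enabled && e.level <= maxLevel &&
               (e.keyword == 0 || (e.keyword & keywordMask) != 0);
    }
};

// An event is worth constructing if at least one sink wants it.
inline bool EventEnabled(const std::vector<Sink>& sinks, const EventDescriptor& e) {
    for (const Sink& s : sinks) {
        if (s.Wants(e)) return true;
    }
    return false;
}

// Deliver the event to every sink that wants it; returns the number of deliveries.
inline int FireEvent(std::vector<Sink>& sinks, const EventDescriptor& e,
                     const std::string& payload) {
    int delivered = 0;
    for (Sink& s : sinks) {
        if (s.Wants(e)) {
            s.received.push_back(payload);
            ++delivered;
        }
    }
    return delivered;
}
```

The point of the sketch is that the event source does the fan-out: each sink is checked independently, so turning on EventPipe does not imply that ETW or LTTng sees anything.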
I see, that makes a lot of sense. I'll revise this whole portion of it to more clearly separate EventPipe from ETW and EventSource conceptually.
I've verified that WPR works with the .NET runtime (and even has a scenario preset for it), so I'm going to use that as the one example tool to call out in this documentation, alongside perfcollect for Linux. I'm moving the ETW/LTTNG material into its own section to clearly separate it from EventPipe.
docs/design/coreclr/botr/logging.md
Outdated

StressLog
------------
The StressLog is a circular buffer inside the runtime process that usually does not escape, and most StressLog messages are enabled in retail builds, which makes it useful for troubleshooting issues with GC or other subsystems in production scenarios. To enable it, you can set the `DOTNET_StressLog` environment variable to `1`, and you can configure it by setting the `DOTNET_LogFacility`, `DOTNET_LogFacility2` and `DOTNET_LogLevel` environment variables. If you are in a situation where you can't attach a debugger to the process, the `DOTNET_StressLogFilename` environment variable can be used to write the StressLog's contents to a file instead.
There are more env variables to control stresslog. The two definitely worth mentioning are DOTNET_StressLogSize and DOTNET_TotalStressLogSize, which can be used to configure the size of the per-thread circular buffer and to cap the total size of all circular buffers in the process.
For example, here is my default combo to diagnose GC hole type crashes: #45557 (comment)
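For readers unfamiliar with the term, a circular buffer keeps only the most recent entries and overwrites the oldest once it is full, which is why the buffer size determines how far back the log reaches. The C++ sketch below illustrates that behaviour only; `CircularLog` is a hypothetical class written for this example and is not the runtime's StressLog implementation (which, among other differences, is sized in bytes per thread rather than in message slots).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical fixed-capacity circular log; not the runtime's StressLog.
class CircularLog {
public:
    explicit CircularLog(std::size_t capacity) : slots_(capacity) {
        assert(capacity > 0);
    }

    // Append a message, overwriting the oldest one once the buffer is full.
    void Write(std::string msg) {
        slots_[next_] = std::move(msg);
        next_ = (next_ + 1) % slots_.size();
        if (count_ < slots_.size()) ++count_;
    }

    // Return the retained messages, oldest first.
    std::vector<std::string> Snapshot() const {
        std::vector<std::string> out;
        out.reserve(count_);
        // Once the buffer has wrapped, the oldest entry sits at next_.
        const std::size_t start = (count_ == slots_.size()) ? next_ : 0;
        for (std::size_t i = 0; i < count_; ++i) {
            out.push_back(slots_[(start + i) % slots_.size()]);
        }
        return out;
    }

private:
    std::vector<std::string> slots_;
    std::size_t next_ = 0;   // index of the slot the next Write will use
    std::size_t count_ = 0;  // number of valid entries, at most slots_.size()
};
```

With a capacity of 3, writing "a", "b", "c", "d" leaves "b", "c", "d" in the snapshot: the oldest message is gone, which is the trade-off a larger buffer size is meant to soften.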
it'd be great to list all the env vars you can set first (and give it some value like what's in Jan's comment) and then explain what kind of values they can take.
I added more detailed information on some of the specific stress log and traditional log environment variables in each of their sections, along with a section that lists all the log levels and facilities. Do you think this is sufficient? I can expand more on any section you'd like to see include more detail or if I've missed an environment variable you think is important.
what I meant was the following, first list these env vars
set DOTNET_StressLog=1
set DOTNET_LogLevel=6
set DOTNET_LogFacility=0x00080001
set DOTNET_StressLogSize=0x2000000
set DOTNET_TotalStressLogSize=0x40000000
...
optional -
set DOTNET_LogFlushFile=1
set DOTNET_LogFileAppend=1
...
some of these are very self explanatory like DOTNET_StressLog, and this is something you can just copy and paste (instead of having to pick each one from the text that follows and copy) and either don't need to change at all or only change very little. then you can explain the non self explanatory ones like -
- DOTNET_StressLogSize configures how large the per-thread buffer for log messages is
- DOTNET_TotalStressLogSize configures a process-wide cap to avoid exhausting memory when you have many threads
- DOTNET_LogLevel see [put a link to the "Log Facilities and Levels" section here]
...
personally I find this format a lot easier to read than putting all of them in one or a few block of text but of course that's a bit subjective.
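As a worked example of what the hex values in the list above stand for, the following C++ sketch decodes them. It assumes the two size settings are byte counts; `ParseHex`, `ToMiB` and `SetBits` are hypothetical helpers written for this illustration, and which facility each bit of `DOTNET_LogFacility` selects is defined by the runtime and deliberately not assumed here.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Parse a hex string such as "0x2000000" (the form used in the settings above).
inline std::uint64_t ParseHex(const std::string& text) {
    return std::stoull(text, nullptr, 16);
}

// Whole mebibytes contained in a byte count.
inline std::uint64_t ToMiB(std::uint64_t bytes) {
    return bytes / (1024ull * 1024ull);
}

// Indices of the bits set in a facility mask, lowest first.
inline std::vector<int> SetBits(std::uint64_t mask) {
    std::vector<int> bits;
    for (int i = 0; i < 64; ++i) {
        if (mask & (1ull << i)) bits.push_back(i);
    }
    return bits;
}
```

Under that assumption, `0x2000000` is 32 MiB per thread and `0x40000000` is 1024 MiB (1 GiB) in total, so the process-wide cap equals 32 full per-thread buffers. The mask `0x00080001` has bits 0 and 19 set, i.e. two facilities are enabled.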
How's the updated version of the stresslog and traditional logging sections? Thanks for the suggestion to rework it this way, I think it's more readable.

WebAssembly logging
-------------------
For WebAssembly builds of the runtime there is an additional interop layer written in TypeScript that has its own logging facilities, defined in [`logging.ts`](https://github.com/dotnet/runtime/blob/main/src/mono/browser/runtime/logging.ts).
fyi @thaystg you might want to take a look at this part?
I'm working on updating the sample invocations for dotnet-trace and dotnet-collect. I'm testing using a simple NET7.0 application compiled in debug mode using VS2022, and while it works from inside VS and from the command line, passing the same invocation to
It looks like based on the I tried using EDIT: Updating to the latest version of VS2022 and retargeting the test app to net8.0 didn't fix it. So I'm not actually sure how to use these tools.
I'm suspicious that the WasmStrip.exe is crashing when you are running it under the profiling tools, perhaps because something is different in the environment like running it from a different directory where it doesn't find a dependency. I'd suggest try running:
The addition of that
Thanks! It sounds like I should mention
Unfortunately, the mystery deepens: --show-child-io makes it work and successfully record a trace, but it suppresses all the output from the dotnet-counters tool. This at least means I can verify that the commands I'm suggesting work, but I guess I need to file a report about this problem somewhere...
That is quite mysterious and not behavior I've heard of before! If you wanted to continue troubleshooting two potential suggestions:

    static void Main(string[] args)
    {
        while (!Debugger.IsAttached)
        {
            Thread.Sleep(100);
        }
        ...

This way you could debug it and hopefully observe any failures without having to enable --show-child-io.
Looks like we should probably document that these tools break advanced console APIs. Maybe this is true for Process.Start redirection in general? This explains why At the very least this is niche enough that I don't think it needs a mention in our logging docs, and I can verify that my sample commands work for real apps.
Sorry to spam people's mentions, can someone verify that this is a valid example of how to use dotnet-counters? I'm still getting an empty counter.csv and I've tried various options. I think from reading the docs it's correct. I've tested with both a regular net8.0 binary from bin along with a published self-contained singlefile exe.
The failures are probably broader than that. For example, if I write an app like this:

    static void Main(string[] args)
    {
        Console.CursorTop = 5;
        Console.WriteLine("Hello, World!");
    }

Running it directly at a command-line shell works fine, but invoking it with output redirected to a file will fail:
Console has some APIs to detect when redirection is occurring, such as IsOutputRedirected; catching the IOException works as well. I agree with you: I didn't notice anything in the docs for the Console APIs (or the dotnet-* tools) that calls attention to the different ways that Console APIs respond depending on how the process was launched. It certainly could be improved, but it probably hasn't come up often enough to be prioritized.
The syntax to enumerate specific counters is a little different than what you were using. Try this:
Aha, the official docs explain it like this:
Thanks for clarifying. I guess maybe I should figure out how to update that part of the docs too.
Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.
Third draft.
Thanks to @davidwrighton and the many reviewers for providing all of this information.