Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add overview of runtime logging to the Book of the Runtime #98881

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

kg
Copy link
Contributor

@kg kg commented Feb 23, 2024

Third draft.

Thanks to @davidwrighton and the many reviewers for providing all of this information.

@kg kg added documentation Documentation bug or enhancement, does not impact product or test code area-Extensions-Logging labels Feb 23, 2024
@ghost
Copy link

ghost commented Feb 23, 2024

Tagging subscribers to this area: @dotnet/area-extensions-logging
See info in area-owners.md if you want to be subscribed.

Issue Details

First draft. ominous peal of thunder

Thanks to @davidwrighton for providing all of this information.

Author: kg
Assignees: -
Labels:

documentation, area-Extensions-Logging

Milestone: -

@ghost
Copy link

ghost commented Feb 24, 2024

Tagging subscribers to this area: @dotnet/area-meta
See info in area-owners.md if you want to be subscribed.

Issue Details

First draft. ominous peal of thunder

Thanks to @davidwrighton for providing all of this information.

Author: kg
Assignees: kg
Labels:

documentation, area-Meta

Milestone: -

@tarekgh
Copy link
Member

tarekgh commented Feb 24, 2024

@tarekgh tarekgh added the runtime-coreclr specific to the CoreCLR runtime label Feb 24, 2024

Please see the linked documentation pages for more information on how to use each of the above tools. They have an extensive set of options and some helpful features like time limits, attaching to existing processes, and configurable output formatting.

For advanced scenarios, on Windows you can also use standard ETW tools like [WPR](https://learn.microsoft.com/en-us/windows-hardware/test/wpt/windows-performance-recorder) to capture events from EventPipe, and on Linux you can use [LTTNG](https://lttng.org/).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
For advanced scenarios, on Windows you can also use standard ETW tools like [WPR](https://learn.microsoft.com/en-us/windows-hardware/test/wpt/windows-performance-recorder) to capture events from EventPipe, and on Linux you can use [LTTNG](https://lttng.org/).
For advanced scenarios, on Windows you can also use standard ETW tools like [WPR](https://learn.microsoft.com/en-us/windows-hardware/test/wpt/windows-performance-recorder), and on Linux you can use [LTTNG](https://lttng.org/).

I think WPR is ETW only. I do not think that it works with EventPipe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must have misunderstood previous comments, I thought eventpipe funnels events to ETW as well? Is that only EventSource, even though the eventpipe API in C++ is in the ETW:: namespace?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EventPipe is first and foremost a cross-process communication protocol.

EventPipe native component implements the provider side of the protocol. (The listener side of the protocol is in https://github.com/microsoft/perfview/tree/main/src/TraceEvent library.)

The multicasting over EventPipe/ETW/LTTNG/custom is done in EventSource (for events originating in managed code) and C/C++ macros (for events originating in native code). A few examples:

  • ((Context.EtwProvider->IsEnabled && McGenEventXplatEnabled(Context.EtwProvider, &EventDescriptor)) || EventPipeHelper::IsEnabled(Context, EventDescriptor.Level, EventDescriptor.Keyword))
    is example of multiplexing on Windows. This code says "return true if ETW or EventPipe is subscribed to this event"
  • #define ETW_EVENT_ENABLED(Context, EventDescriptor) (EventPipeHelper::IsEnabled(Context, EventDescriptor.Level, EventDescriptor.Keyword) || \
    (XplatEventLogger::IsKeywordEnabled(Context, EventDescriptor.Level, EventDescriptor.Keyword)))
    is example of multiplexing on non-Windows. This code says "return true if this event EventPipe or LTTNG is subscribed to this event"
  • And here is similar logic for EventSource:
    public bool EnabledForAnyListener; // true if any dispatcher has this event turned on
    public bool EnabledForETW; // is this event on for ETW?
    #if FEATURE_PERFTRACING
    public bool EnabledForEventPipe; // is this event on for EventPipe?
    #endif

ETW:: namespace

A lot of tracing-related code is in ETW namespace for historic reasons, but it does not mean that it is ETW specific.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, that makes a lot of sense. I'll revise this whole portion of it to more clearly separate EventPipe from ETW and EventSource conceptually.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've verified that WPR works with the .NET runtime (and even has a scenario preset for it) so I'm going to use that as the one example tool to call out in this documentation, alongside perfcollect for linux. I'm moving the ETW/LTTNG stuff into a separate section to clearly separate it from EventPipe.


StressLog
------------
The StressLog is a circular buffer inside the runtime process that usually does not escape, and most StressLog messages are enabled in retail builds, which makes it useful for troubleshooting issues with GC or other subsystems in production scenarios. To enable it, you can set the `DOTNET_StressLog` environment variable to `1`, and you can configure it by setting the `DOTNET_LogFacility`, `DOTNET_LogFacility2` and `DOTNET_LogLevel` environment variables. If you are in a situation where you can't attach a debugger to the process, the `DOTNET_StressLogFilename` environment variable can be used to write the StressLog's contents to a file instead.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are more env variables to control stresslog. The two definitely worth mentioning are DOTNET_StressLogSize and DOTNET_TotalStressLogSize can be used to configure the size of per-thread circular buffer and cap the total size of all circular buffers in the process.

For example, here is my default combo to diagnose GC hole type crashes: #45557 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it'd be great to list all the env vars you can set first (and give it some value like what's in Jan's comment) and then explain what kind of values they can take.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added more detailed information on some of the specific stress log and traditional log environment variables in each of their sections, along with a section that lists all the log levels and facilities. Do you think this is sufficient? I can expand more on any section you'd like to see include more detail or if I've missed an environment variable you think is important.

Copy link
Member

@Maoni0 Maoni0 Mar 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what I meant was the following, first list these env vars

set DOTNET_StressLog=1
set DOTNET_LogLevel=6
set DOTNET_LogFacility=0x00080001
set DOTNET_StressLogSize=0x2000000
set DOTNET_TotalStressLogSize=0x40000000
...

optional -
set DOTNET_LogFlushFile=1
set DOTNET_LogFileAppend=1
...

some of these are very self explanatory like DOTNET_StressLog, and this is something you can just copy and paste (instead of having to pick each one from the text that follows and copy) and either don't need to change at all or only change very little. then you can explain what the non self explanatory ones like -

DOTNET_StressLogSize
configures how large the per-thread buffer for log messages is

DOTNET_TotalStressLogSize 
configures a process-wide cap to avoid exhausting memory when you have many threads

DOTNET_LogLevel
see [put a link to the "Log Facilities and Levels" section here]

...

personally I find this format a lot easier to read than putting all of them in one or a few block of text but of course that's a bit subjective.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How's the updated version of the stresslog and traditional logging sections? Thanks for the suggestion to rework it this way, I think it's more readable.

docs/design/coreclr/botr/logging.md Outdated Show resolved Hide resolved
docs/design/coreclr/botr/logging.md Outdated Show resolved Hide resolved
docs/design/coreclr/botr/logging.md Outdated Show resolved Hide resolved
docs/design/coreclr/botr/logging.md Outdated Show resolved Hide resolved
docs/design/coreclr/botr/logging.md Outdated Show resolved Hide resolved

WebAssembly logging
-------------------
For WebAssembly builds of the runtime there is an additional interop layer written in TypeScript that has its own logging facilities, defined in [`logging.ts`](https://github.com/dotnet/runtime/blob/main/src/mono/browser/runtime/logging.ts).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fyi @thaystg you might want to take a look at this part?

docs/design/coreclr/botr/logging.md Show resolved Hide resolved
@kg
Copy link
Contributor Author

kg commented Feb 27, 2024

I'm working on updating the sample invocations for dotnet-trace and dotnet-collect. I'm testing using a simple NET7.0 application compiled in debug mode using VS2022, and while it works from inside VS and from the command line, passing the same invocation to dotnet-trace or dotnet-counters does not work and seems to not run the application.

dotnet-counters collect -- bin\Debug\net7.0\WasmStrip.exe "E:\Desktop\dotnet.native.wasm" --report:"E:\Desktop\wasmstrip\report.xml"
doesn't seem to capture any data, and the application exits instantly without writing anything to stdout/stderr.

dotnet-trace collect -- etc does at least generate a trace file with something in it, but it's not very big, and it's not possible to generate a report from it:

dotnet-trace report .\WasmStrip.exe_20240227_144857.nettrace topN
[ERROR] System.Exception: Read past end of stream.
   at FastSerialization.IOStreamStreamReader.Fill(Int32 minimum)
   at FastSerialization.MemoryStreamReader.ReadByte()
   at FastSerialization.Deserializer.ReadTag()
   at FastSerialization.Deserializer.ReadObject()
   at Microsoft.Diagnostics.Tracing.EventPipeEventSource.Process()
   at Microsoft.Diagnostics.Tracing.Etlx.TraceLog.CopyRawEvents(TraceEventDispatcher rawEvents, IStreamWriter writer)
   at Microsoft.Diagnostics.Tracing.Etlx.TraceLog.<>c__DisplayClass120_0.<FastSerialization.IFastSerializable.ToStream>b__0()
   at Microsoft.Diagnostics.Tracing.Etlx.TraceLog.FastSerialization.IFastSerializable.ToStream(Serializer serializer)
   at FastSerialization.Serializer.WriteObjectData(IFastSerializable obj, Tags beginTag)
   at FastSerialization.Serializer.WriteObjectRef(IFastSerializable obj, Boolean defered)
   at FastSerialization.Serializer..ctor(IStreamWriter writer, IFastSerializable entryObject)
   at Microsoft.Diagnostics.Tracing.Etlx.TraceLog.CreateFromEventPipeEventSources(TraceEventDispatcher source, String etlxFilePath, TraceLogOptions options)
   at Microsoft.Diagnostics.Tracing.Etlx.TraceLog.CreateFromEventPipeDataFile(String filePath, String etlxFilePath, TraceLogOptions options)
   at Microsoft.Diagnostics.Tools.Trace.ReportCommandHandler.TopNReport(CancellationToken ct, IConsole console, String traceFile, Int32 number, Boolean inclusive, Boolean verbose) in /_/src/Tools/dotnet-trace/CommandLine/Commands/ReportCommand.cs:line 46

It looks like based on the --version output, when I did a dotnet tool install --global for trace and counters, I got the NET8 versions. Is there a requirement for a version match that I should specify in this documentation? I'm using the Developer PowerShell for VS2022, so I would expect to get the 'right' version of tools here considering I compiled the application with VS2022.

I tried using dotnet exec WasmStrip.exe instead of just WasmStrip.exe as the command to invoke but that causes a timeout/failure to attach.

EDIT: Updating to the latest version of VS2022 and retargeting the test app to net8.0 didn't fix it. So I'm not actually sure how to use these tools.

@noahfalk
Copy link
Member

I'm testing using a simple NET7.0 application compiled in debug mode using VS2022, and while it works from inside VS and from the command line, passing the same invocation to dotnet-trace or dotnet-counters does not work and seems to not run the application.

I'm suspicious that the WasmStrip.exe is crashing when you are running it under the profiling tools, perhaps because something is different in the environment like running it from a different directory where it doesn't find a dependency.

I'd suggest try running:

dotnet-trace collect --show-child-io -- bin\Debug\net7.0\WasmStrip.exe "E:\Desktop\dotnet.native.wasm" --report:"E:\Desktop\wasmstrip\report.xml"

The addition of that --show-child-io argument will cause any console output from WasmStrip.exe to get revealed and we may see that it is having an error and not exiting normally. Hopefully this will shed some light on what is happening.

@kg
Copy link
Contributor Author

kg commented Feb 29, 2024

I'm testing using a simple NET7.0 application compiled in debug mode using VS2022, and while it works from inside VS and from the command line, passing the same invocation to dotnet-trace or dotnet-counters does not work and seems to not run the application.

I'm suspicious that the WasmStrip.exe is crashing when you are running it under the profiling tools, perhaps because something is different in the environment like running it from a different directory where it doesn't find a dependency.

I'd suggest try running:

dotnet-trace collect --show-child-io -- bin\Debug\net7.0\WasmStrip.exe "E:\Desktop\dotnet.native.wasm" --report:"E:\Desktop\wasmstrip\report.xml"

The addition of that --show-child-io argument will cause any console output from WasmStrip.exe to get revealed and we may see that it is having an error and not exiting normally. Hopefully this will shed some light on what is happening.

Thanks! It sounds like I should mention --show-child-io in the docs as a troubleshooting tip or even suggest it as a starting point.

@kg
Copy link
Contributor Author

kg commented Feb 29, 2024

Unfortunately, the mystery deepens: --show-child-io makes it work and successfully record a trace, but it suppresses all the output from the dotnet-counters tool. This at least means I can verify that the commands I'm suggesting work, but I guess I need to file a report about this problem somewhere...

@noahfalk
Copy link
Member

noahfalk commented Mar 3, 2024

Unfortunately, the mystery deepens: --show-child-io makes it work and successfully record a trace, but it suppresses all the output from the dotnet-counters tool. This at least means I can verify that the commands I'm suggesting work, but I guess I need to file a report about this problem somewhere...

That is quite mysterious and not behavior I've heard of before! If you wanted to continue troubleshooting two potential suggestions:

  1. You could insert code like this in your app allowing you time to attach a debugger after running dotnet-trace or dotnet-counters:
static void Main(string[] args)
{
    while(!Debugger.IsAttached)
    {
        Thread.Sleep(100);
    }
    ...

This way you could debug it and hopefully observe any failures without having to enable --show-child-io.

  1. You can use the DIAGNOSTIC_PORTS environment variable to start your app in one shell and have it wait, then launch dotnet-trace or dotnet-counters in a 2nd shell. This allows seeing the output of dotnet-* tools and your app's output in separate console windows.
    https://learn.microsoft.com/en-us/dotnet/core/diagnostics/dotnet-trace#use-diagnostic-port-to-collect-a-trace-from-app-startup

@kg
Copy link
Contributor Author

kg commented Mar 4, 2024

Unfortunately, the mystery deepens: --show-child-io makes it work and successfully record a trace, but it suppresses all the output from the dotnet-counters tool. This at least means I can verify that the commands I'm suggesting work, but I guess I need to file a report about this problem somewhere...

That is quite mysterious and not behavior I've heard of before! If you wanted to continue troubleshooting two potential suggestions:

1. You could insert code like this in your app allowing you time to attach a debugger after running dotnet-trace or dotnet-counters:
static void Main(string[] args)
{
    while(!Debugger.IsAttached)
    {
        Thread.Sleep(100);
    }
    ...

This way you could debug it and hopefully observe any failures without having to enable --show-child-io.

2. You can use the DIAGNOSTIC_PORTS environment variable to start your app in one shell and have it wait, then launch dotnet-trace or dotnet-counters in a 2nd shell. This allows seeing the output of dotnet-* tools and your app's output in separate console windows.
   https://learn.microsoft.com/en-us/dotnet/core/diagnostics/dotnet-trace#use-diagnostic-port-to-collect-a-trace-from-app-startup

image

Looks like we should probably document that these tools break advanced console APIs. Maybe this is true for Process.Start redirection in general? This explains why show-child-io fixes it. The sample for Console.SetCursorPosition does wrap the call in a try-block, but it doesn't explain why and the catch for it also accesses the console, so it's not really obvious that this kind of code pattern could be a problem.

At the very least this is niche enough that I don't think it needs a mention in our logging docs, and I can verify that my sample commands work for real apps.

@kg
Copy link
Contributor Author

kg commented Mar 4, 2024

Sorry to spam people's mentions, can someone verify that this is a valid example of how to use dotnet-counters? I'm still getting an empty counter.csv and I've tried various options. I think from reading the docs it's correct. I've tested with both a regular net8.0 binary from bin along with a published self-contained singlefile exe.

dotnet-counters collect --counters System.Runtime:gen-0-gc-count -- publish\WasmStrip.exe "E:\Desktop\dotnet.native.wasm" --report:"E:\Desktop\wasmstrip\report.xml"

@noahfalk
Copy link
Member

noahfalk commented Mar 5, 2024

Looks like we should probably document that these tools break advanced console APIs. Maybe this is true for Process.Start redirection in general?

The failures are probably broader than that. For example if I write an app like this:

static void Main(string[] args)
{
    Console.CursorTop = 5;
    Console.WriteLine("Hello, World!");
}

Running it directly at a command-line shell works fine, but invoking it with output redirected to a file will fail:

C:\Users\noahfalk\source\repos\ConsoleAppDemo\ConsoleAppDemo\bin\Debug\net8.0>ConsoleAppDemo.exe > out.txt
Unhandled exception. System.IO.IOException: The handle is invalid.
   at System.ConsolePal.SetCursorPosition(Int32 left, Int32 top)
   at ConsoleAppDemo.Program.Main(String[] args) in C:\Users\noahfalk\source\repos\ConsoleAppDemo\ConsoleAppDemo\Program.cs:line 7

Console has some APIs to detect when redirection is occuring such as IsOutputRedirected or catching the IOException works as well.

I agree with you that I didn't notice anything in the docs for the Console APIs (or the dotnet-* tools) that calls attention to the different ways that Console APIs respond depending on how the process was launched. It certainly could be improved but probably hasn't come up often enough to be prioritized.

@noahfalk
Copy link
Member

noahfalk commented Mar 5, 2024

Sorry to spam people's mentions, can someone verify that this is a valid example of how to use dotnet-counters?

The syntax to enumerate specific counters is a little different than what you were using. Try this:

dotnet-counters collect --counters System.Runtime[gen-0-gc-count] -- publish\WasmStrip.exe "E:\Desktop\dotnet.native.wasm" --report:"E:\Desktop\wasmstrip\report.xml"

@kg
Copy link
Contributor Author

kg commented Mar 5, 2024

Sorry to spam people's mentions, can someone verify that this is a valid example of how to use dotnet-counters?

The syntax to enumerate specific counters is a little different than what you were using. Try this:

dotnet-counters collect --counters System.Runtime[gen-0-gc-count] -- publish\WasmStrip.exe "E:\Desktop\dotnet.native.wasm" --report:"E:\Desktop\wasmstrip\report.xml"

Aha, the official docs explain it like this:
A comma-separated list of counters. Counters can be specified provider_name[:counter_name].
which made me think that the [ ] were decorative, as in 'this is optional'. but the --help text from dotnet-counters is unambiguous and easy to understand, so I should've checked it:

Options:
  --counters <counters>                          A comma-separated list of counter providers. Counter providers can be
                                                 specified as <provider_name> or <provider_name>[comma_separated_count
                                                 er_names]. If the provider_name is used without qualifying
                                                 counter_names then all counters will be shown. For example
                                                 "System.Runtime[cpu-usage,working-set],Microsoft.AspNetCore.Hosting"
                                                 includes the cpu-usage and working-set counters from the
                                                 System.Runtime provider and all the counters from the
                                                 Microsoft.AspNetCore.Hosting provider. To discover provider and
                                                 counter names, use the list command.

Thanks for clarifying. I guess maybe I should figure out how to update that part of the docs too.

Copy link
Contributor

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

@kg kg reopened this Apr 7, 2024
@kg kg marked this pull request as ready for review May 7, 2024 03:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-Meta documentation Documentation bug or enhancement, does not impact product or test code runtime-coreclr specific to the CoreCLR runtime
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants