Create CLR metrics semantic convention #956

Open
gregkalapos opened this issue Apr 24, 2024 · 17 comments · May be fixed by #1035
Labels: area:system, enhancement (New feature or request), experts needed (This issue or pull request is outside an area where general approvers feel they can approve)

Comments

@gregkalapos
Member

Area(s)

area:system

Is your change request related to a problem? Please describe.

The idea is to create semantic conventions for CLR metrics in .NET, similar to the JVM metrics.

The System and Process metrics already cover a lot that .NET can use, but, as with Java, there are relevant runtime metrics not covered by the currently defined general metrics. Examples:

  • GC metrics
  • Thread pool metrics
  • Metrics around loaded/unloaded assemblies
  • Exception related metrics

Describe the solution you'd like

Add a clr.* namespace and define metrics for the list above. Everything defined there should be very general, applicable to most .NET applications, and collectable by instrumentations in most environments.
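
For illustration, here is a minimal sketch of what collecting a few of these under a clr.* namespace could look like using the System.Diagnostics.Metrics API. All instrument and attribute names below are made up for the example (the real names would be whatever the semantic conventions end up defining), and in practice an OTel SDK or a MeterListener would need to observe these instruments:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;
using System.Threading;

// Illustrative only: instrument/attribute names are placeholders, not proposed conventions.
var meter = new Meter("Example.ClrRuntime");

// GC collection count per generation (the "generation" attribute is hypothetical).
meter.CreateObservableCounter<long>(
    "clr.gc.collections",
    () => new[]
    {
        new Measurement<long>(GC.CollectionCount(0), new KeyValuePair<string, object?>("generation", "gen0")),
        new Measurement<long>(GC.CollectionCount(1), new KeyValuePair<string, object?>("generation", "gen1")),
        new Measurement<long>(GC.CollectionCount(2), new KeyValuePair<string, object?>("generation", "gen2")),
    },
    unit: "{collection}");

// Current number of thread pool threads.
meter.CreateObservableUpDownCounter<long>(
    "clr.thread_pool.thread.count",
    () => ThreadPool.ThreadCount,
    unit: "{thread}");

// Assemblies currently loaded into the process.
meter.CreateObservableUpDownCounter<long>(
    "clr.assembly.count",
    () => AppDomain.CurrentDomain.GetAssemblies().Length,
    unit: "{assembly}");
```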

Describe alternatives you've considered

An alternative could be to look at the already defined .NET metrics and try to fit the CLR metrics there. I believe that would not be ideal, since those metrics are specific to higher-level frameworks. The CLR metrics I propose here are purely runtime metrics, not tied to any framework. In this sense they are more like the JVM metrics for Java.

Additional context

There are already .NET metrics defined. Those are not really runtime metrics from the CLR (the runtime itself); rather, they are metrics for higher-level components (like ASP.NET Core, SignalR, etc.). I think these two can coexist: whatever higher-level frameworks emit can still live in their specific namespaces (like aspnetcore.*, signalr.*, http.client.*), while pure runtime-level metrics (GC, etc.; see the list above) would live in a clr.* namespace. So I suggest making no change to the current .NET metrics and adding a clr.* namespace for pure CLR runtime metrics.

gregkalapos added the enhancement, experts needed, and triage:needs-triage labels on Apr 24, 2024
@stevejgordon
Contributor

I'd be interested in starting work on adding these.

@joaopgrassi
Member

@stevejgordon Awesome! I will assign this to you then, if that's fine. Let me know if you need any help working with the semconv tooling. :)

@lmolkova
Contributor

lmolkova commented May 1, 2024

/cc @open-telemetry/semconv-dotnet-approver

@antonfirsov

antonfirsov commented May 1, 2024

IMO standardization should happen in parallel with the implementation. This is what we did with the HTTP metrics, and we found many dragons. The runtime work is tracked by dotnet/runtime#85372, which seems to be on the 9.0 roadmap, but I don't see an owner assigned and there has been no activity on the issue for almost a year.

@pyohannes
Contributor

Be aware that .NET runtime metrics are already defined and shipped via one of the few stable .NET instrumentation packages, OpenTelemetry.Instrumentation.Runtime.

Any related work should be coordinated with @open-telemetry/dotnet-maintainers, who decided to make the specification of runtime metrics part of the instrumentation library instead of semantic conventions.

@stevejgordon
Contributor

@pyohannes If I've followed the Java story correctly, their instrumentation defined many of the metrics that were later added to the semantic conventions, and I saw a comment from @trask in the Go issue planning to handle the switch by using the OTEL_SEMCONV_STABILITY_OPT_IN=jvm configuration. Would that be the way forward here?

@stevejgordon
Contributor

My thought was to first review which metrics already exist (not many, I believe, as most are event counters), both those produced by the contrib instrumentation and some we collect in Elastic APM. That research would be a starting point for identifying what, if anything, could/should be introduced in the semantic conventions.

@pyohannes
Contributor

@stevejgordon A viable first step would be to sync with the .NET SIG and align with their vision for runtime metrics.

@stevejgordon
Contributor

@pyohannes Yep, I agree. Unfortunately, the timing of those SIG meetings makes it very difficult for me to attend, but I'll try to get to one soon and put this on the agenda. The way I see it, we have a couple of challenges here.

a) The runtime isn't yet natively instrumented with metrics, and that needs some discussion, including whether any metrics that are added will comply with the semantic conventions format. It sounds like, right now, the intention is that they will not.
b) The stable contrib runtime instrumentation package already established naming that differs from the semantic convention recommendations that have come since.

Regarding naming, a potentially naive solution is to introduce an env var, as Java has, that any .NET runtime metrics and/or OTel contrib instrumentation honour when setting up the metric names. That way, users could opt in to semantic convention naming, while the existing names (or new .NET runtime names) remain the default.
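
As a rough sketch (purely illustrative, not an agreed design), the switch could look something like the snippet below: the existing process.runtime.dotnet.* style name from the contrib package stays the default, and the env var opts users into a hypothetical semconv-aligned name. Both metric names and the "dotnet" opt-in value are assumptions for the example.

```csharp
using System;

// Sketch of the opt-in idea, modelled on the OTEL_SEMCONV_STABILITY_OPT_IN mechanism
// used for the HTTP semantic conventions. Names and the "dotnet" value are assumptions.
string SelectGcCollectionsMetricName()
{
    var optIn = Environment.GetEnvironmentVariable("OTEL_SEMCONV_STABILITY_OPT_IN") ?? string.Empty;

    return optIn.Contains("dotnet", StringComparison.OrdinalIgnoreCase)
        ? "clr.gc.collections"                            // hypothetical semconv-aligned name
        : "process.runtime.dotnet.gc.collections.count";  // illustrative contrib-package-style name
}

Console.WriteLine(SelectGcCollectionsMetricName());
```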

@lmolkova
Contributor

lmolkova commented May 3, 2024

Looking into https://github.com/open-telemetry/opentelemetry-dotnet-contrib/tree/main/src/OpenTelemetry.Instrumentation.Runtime, we'd have to make some breaking changes if we attempted to make them consistent with the general semantic convention requirements:

  • metrics would be renamed (process.runtime.dotnet.* -> dotnet.*)
  • attribute names would need to be namespaced

I.e. I don't think there is a benefit in documenting the existing metrics of this package as semantic conventions, and if the runtime is instrumented natively, this package would either need to be deprecated or would need a v2 compatible with the new metrics.

@stevejgordon
Contributor

@lmolkova Over on the runtime issue, I have the go-ahead to implement the metrics inside the runtime itself. This will likely be a port of the existing metrics from the instrumentation package, renamed to align with whatever we agree on here for the naming. I like @gregkalapos's proposal to prefix these as clr.* since that describes the actual runtime; dotnet could be too general, as code could arguably be running on the CLR or on something like Mono for Blazor, etc.

My plan going forward is to create an initial PR with experimental metrics and attributes based on those from the runtime instrumentation library while also working to see if we can implement it in the runtime before .NET 9 is locked down.

I agree that once this all happens, the runtime library could then likely be deprecated.

@AlexanderWert
Member

Looking into https://github.com/open-telemetry/opentelemetry-dotnet-contrib/tree/main/src/OpenTelemetry.Instrumentation.Runtime, we'd have to make some breaking changes if we attempted to make them consistent with the general semantic convention requirements:

  • metrics would be renamed (process.runtime.dotnet.* -> dotnet.*)
  • attribute names would need to be namespaced

I.e. I don't think there is a benefit in documenting the existing metrics of this package as semantic conventions, and if the runtime is instrumented natively, this package would either need to be deprecated or would need a v2 compatible with the new metrics.

@lmolkova Not sure I follow your point here. Are you saying we should not define semantic conventions when there is already existing instrumentation / implementation that emits data that is inconsistent with semantic convention guidelines because we would break the implementation?

In semantic conventions we have the JVM metrics. IMHO, for consistency and usability reasons, we should follow that pattern and define meaningful runtime metrics for other runtimes/languages in semantic conventions as well. SemConv should be the single point of specification/documentation that users and vendors can rely on (rather than docs spread over different implementations).

Or am I misunderstanding your comment?

@pyohannes
Contributor

Are you saying we should not define semantic conventions when there is already existing instrumentation / implementation that emits data that is inconsistent with semantic convention guidelines because we would break the implementation?

@AlexanderWert The main point here is that the OpenTelemetry.Instrumentation.Runtime package emits runtime metrics and is declared stable. If we now added semantic conventions for .NET runtime metrics that differ from the metrics emitted by this package, .NET maintainers would have to deal with a stable instrumentation library that doesn't conform to the semantic conventions.

@stevejgordon
Contributor

I think it's reasonable to implement the metrics in the runtime to align with the conventions established for Java. Consumers can opt into those from the runtime or stick with the instrumentation package. At least then, we have semantic convention consistency for those being added.

@lmolkova
Contributor

lmolkova commented May 7, 2024

My plan going forward is to create an initial PR with experimental metrics and attributes based on those from the runtime instrumentation library while also working to see if we can implement it in the runtime before .NET 9 is locked down.

@stevejgordon I think it would be a great way to start!

I think it's reasonable to implement the metrics in the runtime to align with the conventions established for Java.

I believe there were a lot of discussions about consistency between runtimes, and the decision was to put metrics into individual namespaces and not target consistency. So JVM metrics can be used as inspiration, but not as guidance.

//cc @trask if I got it wrong.

@lmolkova
Contributor

lmolkova commented May 7, 2024

Are you saying we should not define semantic conventions when there is already existing instrumentation / implementation that emits data that is inconsistent with semantic convention guidelines because we would break the implementation?

@pyohannes explained my concern much better than I could: #956 (comment)

The only thing to add is that, if we defined proper CLR semconv right now, we would need to keep them experimental until the .NET runtime (hopefully) emits them natively, since a lot of nuances and changes usually come out of the native implementation process.

@stevejgordon
Contributor

@lmolkova Starting with these as experimental makes sense. I'm hoping to also contribute to the implementation in the runtime so we can move to stable once those changes are merged and we know the release timeframe. Ideally, we'll get these in for .NET 9.

stevejgordon linked a pull request on May 14, 2024 that will close this issue