exception.stacktrace attribute isn't included in exception events #5139

Open
mikk2168 opened this issue Feb 23, 2024 · 8 comments · May be fixed by #5251
Labels: area: instrumentation (Related to an instrumentation package), bug (Something isn't working), instrumentation: otelmux

Comments

@mikk2168

Description

When a panic occurs while handling a request, an exception event is created, but the exception.stacktrace attribute isn't set.
According to the otel spec on exceptions, the exception.stacktrace attribute SHOULD be set. I haven't been able to find a way to have it included.

Environment

  • OS: macOS
  • Architecture: arm64
  • Go Version: 1.22.0
  • otelmux version: 0.48.0

Steps To Reproduce

  1. Use the otelmux middleware
  2. Call an end-point that panics
  3. See that the trace created contains an exception event, with no stacktrace attribute

Expected behavior

I would expect the exception event to include the stacktrace. I can see how this may not always be desired, so an alternative to always including it could be a way to explicitly tell the middleware to include it.

mikk2168 added the area: instrumentation, bug, and instrumentation: otelmux labels on Feb 23, 2024
@dmathieu (Member) commented Mar 6, 2024

The call to RecordError probably needs to add the WithStackTrace option.
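
For illustration, a minimal sketch of what that could look like (the recordPanic helper and its arguments are assumptions for the example, not existing otelmux code):

package example

import (
	"fmt"

	"go.opentelemetry.io/otel/codes"
	"go.opentelemetry.io/otel/trace"
)

// recordPanic records a recovered panic on the given span. Passing
// trace.WithStackTrace(true) makes RecordError attach the
// exception.stacktrace attribute to the exception event it creates.
func recordPanic(span trace.Span, recovered any) {
	err := fmt.Errorf("panic: %v", recovered)
	span.RecordError(err, trace.WithStackTrace(true))
	span.SetStatus(codes.Error, err.Error())
}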

@makeavish

The call to RecordError probably needs to add the WithStackTrace option.

So this is an issue with otelmux, and the otelmux instrumentation needs to be updated?

Calling RecordError manually attaches a second, duplicate exception event to the span, in addition to the exception event automatically added by otelmux.

@dmathieu (Member) commented Mar 6, 2024

I'm not seeing the otelmux instrumentation calling RecordError: https://github.com/open-telemetry/opentelemetry-go-contrib/blob/main/instrumentation/github.com/gorilla/mux/otelmux/mux.go

So no, I'm not sure this is even otelmux related.
A panic is mentioned in the description, but the only instrumentation with a recover() is otelgin, for template errors.
Could you provide a small reproduction case, or even a test on the otelmux instrumentation?

@mikk2168 (Author) commented Mar 6, 2024

Thanks for looking into this. I hope I understand your question correctly - if not, please let me know.
I've created a simple example showcasing the behaviour I described. If you run the code below and make a request to http://127.0.0.1:8000/Hello, you will trigger a panic. A span will be created and an exception event will be added to it.
Since I'm not using any other libraries than mux, net/http, the official otel libraries and otelmux, and since I'm not manually adding any spans or events, I would assume that the event is added by otelmux.
Do you agree that this seems to be the case, or could the added event come from somewhere else?

package main

import (
	"github.com/gorilla/mux"
	"go.opentelemetry.io/contrib/instrumentation/github.com/gorilla/mux/otelmux"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/exporters/stdout/stdouttrace"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.24.0"
	"net/http"
)

func main() {
	// Export spans to stdout so the exception event is easy to inspect.
	exp, _ := stdouttrace.New(stdouttrace.WithPrettyPrint())
	r, _ := resource.Merge(
		resource.Default(),
		resource.NewWithAttributes(
			semconv.SchemaURL,
			semconv.ServiceName("MyService"),
			semconv.ServiceVersion("1.0.0"),
			attribute.String("library.language", "go"),
		),
	)

	// Sample everything and export via the stdout exporter.
	tp := sdktrace.NewTracerProvider(
		sdktrace.WithSampler(sdktrace.AlwaysSample()),
		sdktrace.WithBatcher(exp),
		sdktrace.WithResource(r),
	)
	otel.SetTracerProvider(tp)

	router := mux.NewRouter()

	router.HandleFunc("/Hello", hello)
	// otelmux starts a server span for every request handled by the router.
	router.Use(otelmux.Middleware("MyService"))

	srv := &http.Server{
		Handler: router,
		Addr:    "127.0.0.1:8000",
	}

	_ = srv.ListenAndServe()
}

// hello always panics, which triggers the exception event on the request span.
func hello(w http.ResponseWriter, req *http.Request) {
	panic("Something went wrong!")
}

The created span will contain the following:

...
        "Events": [
                {
                        "Name": "exception",
                        "Attributes": [
                                {
                                        "Key": "exception.type",
                                        "Value": {
                                                "Type": "STRING",
                                                "Value": ".string"
                                        }
                                },
                                {
                                        "Key": "exception.message",
                                        "Value": {
                                                "Type": "STRING",
                                                "Value": "Something went wrong!"
                                        }
                                }
                        ],
                        "DroppedAttributeCount": 0,
                        "Time": "2024-03-06T13:26:00.039023+01:00"
                }
        ],
...

@dmathieu (Member) commented Mar 6, 2024

Thank you.
So, the recover() comes from the SDK itself, in the recover inside span.End():

https://github.com/open-telemetry/opentelemetry-go/blob/fe9bab54b7a1852da0001b91ec37a417e96d8e7c/sdk/trace/span.go#L389

The solution here would be to close the span with a trace.WithStackTrace() option.
That is not currently doable for the gin instrumentation, as it doesn't take any options for End.

#5250 should be the fix you need though.
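
A minimal sketch of the idea (not the actual instrumentation code; the tracer name and middleware here are illustrative):

package example

import (
	"net/http"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/trace"
)

// withPanicStackTrace ends the request span with trace.WithStackTrace(true),
// so that when the handler panics, the SDK's recover inside span.End records
// the exception event with an exception.stacktrace attribute.
func withPanicStackTrace(next http.Handler) http.Handler {
	tracer := otel.Tracer("example")
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ctx, span := tracer.Start(r.Context(), r.URL.Path)
		defer span.End(trace.WithStackTrace(true))
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}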

@mikk2168 (Author) commented Mar 6, 2024

That makes sense - thanks for the explanation.
Thanks for creating a fix. There's just one issue though: we're not using gin, but gorilla/mux and the otelmux middleware. Could something analogous to #5250 be added to otelmux?

@dmathieu (Member) commented Mar 6, 2024

Urgh my bad, I did it on the wrong instrumentation 🤦
I won't be able to do another PR right away. Feel free to do so yourself.

mikk2168 linked a pull request on Mar 7, 2024 that will close this issue
@mikk2168 (Author) commented Mar 7, 2024

Thanks for all the help - it's greatly appreciated.
I added a PR heavily inspired by yours: #5251.
Any feedback on it would be much appreciated.
