apiserver tracing: port forwarding broke #113791
Comments
/kind regression
/assign @dashpole
Since the error is context-timeout related, it's also possible #113591 is related, or intersects with the tracing enablement (though the context change merged earlier, not in the noted range)
I had that thought as well. But the fact that disabling the APIServerTracing feature gate fixes the problem also points in the direction of it being related to enabling APIServerTracing.
/sig instrumentation
I was able to reproduce it. Narrowed it down to the trace filter in staging/src/k8s.io/apiserver/pkg/endpoints/filters/traces.go
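For context, the filter in traces.go wraps the apiserver handler chain with otelhttp, which is how every request body ends up being wrapped (see the diff below). A rough sketch of that wiring, with names other than otelhttp.NewHandler being illustrative rather than the exact code:

package filters

import (
	"net/http"

	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
	"go.opentelemetry.io/otel/trace"
)

// withTracing wraps the delegate handler so every request is traced by
// otelhttp; the operation name and option set here are illustrative.
func withTracing(handler http.Handler, tp trace.TracerProvider) http.Handler {
	return otelhttp.NewHandler(handler, "KubernetesAPI",
		otelhttp.WithTracerProvider(tp))
}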
The PR which enabled APIServerTracing by default broke port forwarding between e2e.test and the kind cluster (kubernetes#113791). Disabling the feature works around that problem.
@dashpole can you check if this failure reported by @alculquicondor https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/113794/pull-kubernetes-integration/1590410434526056448 can be related?
That doesn't look related. From my testing thus far, it impacts requests with
There is something wrong with how the request body is wrapped in otelhttp. This diff fixes it:
--- a/vendor/go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp/handler.go
+++ b/vendor/go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp/handler.go
@@ -20,7 +20,6 @@ import (
"time"
"github.com/felixge/httpsnoop"
-
"go.opentelemetry.io/otel"
"go.opentelemetry.io/otel/attribute"
"go.opentelemetry.io/otel/metric"
@@ -156,22 +155,22 @@ func (h *Handler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
ctx, span := tracer.Start(ctx, h.spanNameFormatter(h.operation, r), opts...)
defer span.End()
- readRecordFunc := func(int64) {}
- if h.readEvent {
- readRecordFunc = func(n int64) {
- span.AddEvent("read", trace.WithAttributes(ReadBytesKey.Int64(n)))
- }
- }
+ // readRecordFunc := func(int64) {}
+ // if h.readEvent {
+ // readRecordFunc = func(n int64) {
+ // span.AddEvent("read", trace.WithAttributes(ReadBytesKey.Int64(n)))
+ // }
+ // }
var bw bodyWrapper
- // if request body is nil we don't want to mutate the body as it will affect
- // the identity of it in an unforeseeable way because we assert ReadCloser
- // fulfills a certain interface and it is indeed nil.
- if r.Body != nil {
- bw.ReadCloser = r.Body
- bw.record = readRecordFunc
- r.Body = &bw
- }
+ // // if request body is nil we don't want to mutate the body as it will affect
+ // // the identity of it in an unforeseeable way because we assert ReadCloser
+ // // fulfills a certain interface and it is indeed nil.
+ // if r.Body != nil {
+ // bw.ReadCloser = r.Body
+ // bw.record = readRecordFunc
+ // r.Body = &bw
+ // }
writeRecordFunc := func(int64) {}
if h.writeEvent {
bodyWrapper is defined here:
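As a rough sketch (field and method shapes inferred from the diff above, not the vendored otelhttp source), the wrapper that replaces r.Body is essentially a byte-counting ReadCloser:

package otelhttpsketch

import "io"

// bodyWrapper-style wrapper: the original request body is replaced by a
// ReadCloser that counts bytes and invokes a record callback on every Read.
type bodyWrapper struct {
	io.ReadCloser        // original request body
	record func(n int64) // callback that emits the "read" span event
	read   int64         // total bytes read so far
	err    error         // last error returned by Read
}

func (w *bodyWrapper) Read(b []byte) (int, error) {
	n, err := w.ReadCloser.Read(b)
	w.read += int64(n)
	w.err = err
	w.record(int64(n))
	return n, err
}

Only Read and Close are promoted from the embedded ReadCloser, so callers that type-assert r.Body for other interfaces will no longer find them once the body is wrapped.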
I'll keep digging
What I find more concerning is that the promotion to beta apparently put tracing in the request path even when it wasn't configured... I thought modifying the request/filter path was only supposed to happen when the apiserver had --tracing-config-file set.
Looks like a few more tests are broken. I think a revert is the best course for now.
Thanks. Once the issue is resolved in otel, it would be good to figure out what presubmit signal we were missing that could have caught the issue… it sounded like some of the folks hitting this did so quite reliably.
I've opened the upstream fix for this, and verified that it fixes this reproduction case: open-telemetry/opentelemetry-go-contrib#2983
But I think it is still safer to leave it out of the release.
Weirdly, there was a post-submit periodic job that turned super red because of this issue - https://testgrid.k8s.io/sig-node-containerd#containerd-e2e-ubuntu&width=20 - it would be great to have had that signal before merge; not sure what is different in that job 😕
Yep. For future reference (for myself), an easy way to check would be to run the
What happened?
This is a regression caused by #113693. It worked with master a14601a (last merge commit before that PR) and is broken in b2c72fe.
Disabling the APIServerTracing feature gate avoids the problem.
The problem is that port forwarding in https://github.com/kubernetes/kubernetes/blob/master/test/e2e/storage/drivers/proxy/portproxy.go fails when used against a kind cluster built from Kubernetes master. #111023 depends on that for testing in pull-kubernetes-kind-dra.
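For context, a hedged sketch of typical client-go SPDY port forwarding (not the exact portproxy.go code): the client POSTs to the pod's portforward subresource and upgrades the connection to a stream, so the request goes through the apiserver filter chain, including the tracing filter discussed above.

package main

import (
	"net/http"
	"net/url"
	"os"

	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/portforward"
	"k8s.io/client-go/transport/spdy"
)

// forwardPorts dials the pod's portforward subresource over SPDY and keeps
// the streams open until stopCh is closed. podURL is typically built with
// restClient.Post().Resource("pods").Namespace(ns).Name(pod).SubResource("portforward").URL().
func forwardPorts(config *rest.Config, podURL *url.URL, ports []string, stopCh chan struct{}, readyCh chan struct{}) error {
	transport, upgrader, err := spdy.RoundTripperFor(config)
	if err != nil {
		return err
	}
	dialer := spdy.NewDialer(upgrader, &http.Client{Transport: transport}, http.MethodPost, podURL)
	fw, err := portforward.New(dialer, ports, stopCh, readyCh, os.Stdout, os.Stderr)
	if err != nil {
		return err
	}
	return fw.ForwardPorts()
}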
The symptom is:
What did you expect to happen?
Should work as before...
How can we reproduce it (as minimally and precisely as possible)?
Anything else we need to know?
CSI mock tests should use the same port forwarding, but don't seem to be affected.
Kubernetes version
Cloud provider
OS version
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)