[BUG] agent tries to connect over http when unix socket endpoint is enabled #2658

Open
pdeva opened this issue Apr 13, 2024 · 4 comments

pdeva commented Apr 13, 2024

Version of dd-trace-go

1.62.0
Describe what happened:

In Kubernetes, we are constantly seeing these errors for our Golang services when tracing is enabled:
[Image: Screenshot 2024-04-12 at 7 33 33 PM]

Describe what you expected:
While the APM functionality is still working for these services, this error shouldn't be present. It seems that while traces are still being sent over the Unix socket, the tracer is also trying to talk to the Datadog agent over the HTTP port for some reason.

Steps to reproduce the issue:

This is how we init our tracer:

import (
	"go.opentelemetry.io/otel"
	ddotel "gopkg.in/DataDog/dd-trace-go.v1/ddtrace/opentelemetry"
	"gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer"
	"gopkg.in/DataDog/dd-trace-go.v1/profiler"
)

func initTracer(cfg config.ServiceConfig) *ddotel.TracerProvider {
	if !cfg.IsLocalProfile() && !cfg.TracingDisabled {
		traceProvider := ddotel.NewTracerProvider(tracer.WithRuntimeMetrics())
		otel.SetTracerProvider(traceProvider)

		if config.ShouldProfile(cfg) {
			err := profiler.Start(
				profiler.WithService(cfg.ServiceName),
				profiler.WithVersion(cfg.ServiceVersion),
			)
			if err != nil {
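				// Assuming log is zerolog: an event is only written (and Fatal only exits) once Msg or Send is called.
				log.Fatal().Err(err).Msg("failed to start profiler")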
			}
		}
		datastreams.Start()

		return traceProvider
	}

	return nil
}

Here is the relevant config of the k8s deployment of each of the services:

      volumeMounts:
        - mountPath: /var/run/datadog
          name: apmsocketpath

      volumes:
      - hostPath:
          path: /var/run/datadog/
        name: apmsocketpath


      - env:
        - name: DD_TRACE_AGENT_URL
          value: unix:///var/run/datadog/apm.socket
        - name: DD_TRACE_SAMPLE_RATE
          value: "1.0"
        - name: DD_SERVICE
          valueFrom:
            fieldRef:
              fieldPath: metadata.labels['tags.datadoghq.com/service']
        - name: DD_VERSION
          valueFrom:
            fieldRef:
              fieldPath: metadata.labels['tags.datadoghq.com/version']

This is the relevant config of the Datadog Helm chart:

[Image: Screenshot 2024-04-12 at 7 37 58 PM]

Additional environment details (Version of Go, Operating System, etc.):
EKS 1.29
Go 1.22.1

@pdeva pdeva added the bug unintended behavior that has to be fixed label Apr 13, 2024
@github-actions github-actions bot added the needs-triage New issues that have not yet been triaged label Apr 13, 2024
Contributor

darccio commented Apr 18, 2024

Thanks for reporting this @pdeva. We'll take a look in the next two weeks.

@darccio darccio removed the needs-triage New issues that have not yet been triaged label Apr 18, 2024
Contributor

darccio commented Apr 24, 2024

@pdeva I just wrote a unit test to check whether I could reproduce it, but the communication occurs through the UNIX socket, as expected:

	t.Run("unix socket", func(t *testing.T) {
		if runtime.GOOS == "windows" {
			t.Skip("Unix domain sockets are non-functional on windows.")
		}
		srv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
			w.Write([]byte(`{"endpoints":["/v0.6/stats"],"client_drop_p0s":true,"statsd_port":9999}`))
		}))
		udsPath := "/tmp/com.datadoghq.dd-trace-go.test.sock"
		l, err := net.Listen("unix", udsPath)
		if err != nil {
			t.Fatal(err)
		}
		defer l.Close()

		srv.Listener = l
		srv.Start()
		defer srv.Close()

		t.Setenv("DD_TRACE_AGENT_URL", "unix://"+udsPath)
		cfg := newConfig()
		assert.Equal(t, "UDS__tmp_com.datadoghq.dd-trace-go.test.sock", cfg.agentURL.Host)

		assert.True(t, cfg.agent.DropP0s)
		assert.True(t, cfg.agent.Stats)
		assert.Equal(t, 9999, cfg.agent.StatsdPort)
	})
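
For reference, a minimal standard-library sketch of the mechanism the test above relies on: an HTTP server listening on a Unix domain socket, and a client whose dialer always connects to that socket. The names `udsClient` and `fetchOverUDS`, the `/info` path, and the temporary socket location are assumptions made for illustration; this is not how dd-trace-go builds its own client.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"os"
	"path/filepath"
)

// udsClient returns an HTTP client whose connections all go to socketPath,
// regardless of the host named in the request URL. (Hypothetical helper.)
func udsClient(socketPath string) *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				var d net.Dialer
				return d.DialContext(ctx, "unix", socketPath)
			},
		},
	}
}

// fetchOverUDS serves a fixed response on a temporary socket and reads it
// back through udsClient, returning the response body.
func fetchOverUDS() (string, error) {
	dir, err := os.MkdirTemp("", "uds")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	socketPath := filepath.Join(dir, "apm.socket")
	l, err := net.Listen("unix", socketPath)
	if err != nil {
		return "", err
	}

	srv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		io.WriteString(w, "ok")
	}))
	srv.Listener.Close() // drop the default TCP listener before swapping in the socket
	srv.Listener = l
	srv.Start()
	defer srv.Close()

	// The host part of the URL is a placeholder; the dialer ignores it.
	resp, err := udsClient(socketPath).Get("http://unix/info")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	body, err := fetchOverUDS()
	if err != nil {
		fmt.Fprintln(os.Stderr, "request over unix socket failed:", err)
		os.Exit(1)
	}
	fmt.Println(body)
}
```

Because the dialer ignores the request's host, any request made through this client reaches the socket; a component that uses a different client or a different address setting would not.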

The most likely scenario is that some service is not seeing the environment variable and is therefore defaulting to the HTTP connection. WDYT?
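
One way to check the hypothesis that a service is not seeing the environment variable is to run a small program inside the affected container. The sketch below uses only the standard library; `agentTransport` is a hypothetical helper, and its fallback to `localhost:8126` is a simplification that ignores every other setting the tracer may consult.

```go
package main

import (
	"fmt"
	"net/url"
	"os"
)

// agentTransport reports which network and address a DD_TRACE_AGENT_URL value
// implies. An empty value is treated as HTTP on localhost:8126; this is a
// simplified assumption, not the tracer's full resolution logic.
func agentTransport(raw string) (network, address string, err error) {
	if raw == "" {
		return "tcp", "localhost:8126", nil
	}
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", fmt.Errorf("parse %q: %w", raw, err)
	}
	switch u.Scheme {
	case "unix":
		return "unix", u.Path, nil
	case "http", "https":
		return "tcp", u.Host, nil
	default:
		return "", "", fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
}

func main() {
	raw := os.Getenv("DD_TRACE_AGENT_URL")
	network, address, err := agentTransport(raw)
	if err != nil {
		fmt.Fprintln(os.Stderr, "invalid DD_TRACE_AGENT_URL:", err)
		os.Exit(1)
	}
	fmt.Printf("DD_TRACE_AGENT_URL=%q -> %s %s\n", raw, network, address)

	// For a unix URL, also confirm the socket is visible from this container.
	if network == "unix" {
		if _, err := os.Stat(address); err != nil {
			fmt.Println("socket not visible from this container:", err)
		}
	}
}
```

If this prints a `tcp` target in a pod that is supposed to use the socket, the variable is not reaching the process. If it prints the `unix` target and the socket is visible, the HTTP attempts are presumably coming from somewhere else.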

Author

pdeva commented Apr 24, 2024

I cannot tell you what the bug is; it's not our agent. I can only tell you what we are observing, and we are seeing this issue for every single Golang service using the Datadog agent.

Contributor

darccio commented Apr 25, 2024

@pdeva I wasn't asking you to identify the bug; sorry for the confusion. What you are observing doesn't match my initial findings, so I'll try to reproduce it with a realistic setup.

@darccio darccio self-assigned this Apr 30, 2024