Guide: Routing traces from the Datadog Agent to Vector within Kubernetes

neuronull edited this page Dec 5, 2022 · 5 revisions

Requirements

  1. Install minikube: https://minikube.sigs.k8s.io/docs/start/

  2. Install kubectl: https://kubernetes.io/docs/tasks/tools/

  3. Install helm: https://helm.sh/docs/intro/install/

  4. Install docker: https://docs.docker.com/engine/install/
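
Before continuing, it can help to confirm that all four tools are on your PATH. The following is a minimal sketch, not part of the original guide: the function name `find_missing_tools` is hypothetical, and it only checks that each executable can be found, not that its version is suitable.

```python
import shutil

# Tools listed in the Requirements section above.
REQUIRED_TOOLS = ("minikube", "kubectl", "helm", "docker")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

An empty result means every tool was located; anything listed should be installed using the links above.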

Set up helm charts and values

  1. Add the Datadog and Vector helm chart repositories to your local helm configuration:

    helm repo add datadog https://helm.datadoghq.com && helm repo add vector https://helm.vector.dev
    
  2. Sync the local repo:

    helm repo update
    
  3. Verify the repositories are listed:

    helm repo list
    

Start minikube

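  1. Important: point your shell's docker CLI at the Docker daemon inside minikube, so that images you build later are visible to the cluster. This command needs a running cluster; if minikube is not running yet, do step 2 first and then run this.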

    eval $(minikube -p minikube docker-env)
    
  2. Start the minikube cluster:

    minikube start
    
  3. Verify it is running:

    minikube status
    

Install the Datadog Agent into the minikube cluster

  1. Create helm values file for the Datadog Agent

Note: For this example we will set up the agent to send traces to Vector, and configure the k8s namespace labels to be added as tags to the events.

Save the following in a file "agent.values.yaml", filling out <your_datadog_api_key> :

datadog:
  apiKey: <your_datadog_api_key>
  containerExclude: "name:vector"
  logs:
    enabled: true
    containerCollectAll: true
  apm:
    enabled: true
    ## datadog.apm.portEnabled -- Enable APM over TCP communication (port 8126 by default)
    ## ref: https://docs.datadoghq.com/agent/kubernetes/apm/
    portEnabled: true
clusterAgent:
  enabled: false
agents:
  useConfigMap: true
  customAgentConfig:
    kubelet_tls_verify: false
    vector:
    apm_config:
      apm_dd_url: "http://vector.default:8282"
      max_traces_per_second: 0
      errors_per_second: 0
    dogstatsd_non_local_traffic: true
    kubernetes_namespace_labels_as_tags:
      kubernetes.io/metadata.name: "kube_namespace"
  2. Install the Datadog Agent container into the cluster

     helm install datadog-agent datadog/datadog -f agent.values.yaml
    
  3. Verify it is running

     kubectl get pods
    

    It should look something like this:

NAME                                                READY   STATUS      RESTARTS   AGE
datadog-agent-cluster-agent-767d89c9c5-tbngs        1/1     Running     0          4m31s
datadog-agent-kube-state-metrics-658d989649-j7jt8   1/1     Running     0          4m31s
datadog-agent-rjqxg                                 3/3     Running     0          4m31s
  4. Check logs for errors

     kubectl logs datadog-agent-rjqxg
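
The "verify it is running" check can also be scripted. The sketch below parses the default table printed by `kubectl get pods` and reports pods that are neither fully ready nor completed. The function names are hypothetical, and the parser assumes the default column layout shown above (NAME, READY, STATUS, ...).

```python
# Sample table copied from the expected output shown above.
SAMPLE = """\
NAME                                                READY   STATUS      RESTARTS   AGE
datadog-agent-cluster-agent-767d89c9c5-tbngs        1/1     Running     0          4m31s
datadog-agent-kube-state-metrics-658d989649-j7jt8   1/1     Running     0          4m31s
datadog-agent-rjqxg                                 3/3     Running     0          4m31s
"""


def parse_pods(table_text):
    """Parse the default `kubectl get pods` table into a list of dicts."""
    lines = [line for line in table_text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    # zip() stops at the shorter sequence, so extra trailing fields are ignored.
    return [dict(zip(header, line.split())) for line in lines[1:]]


def not_ready(pods):
    """Return names of pods that are neither fully ready nor Completed."""
    bad = []
    for pod in pods:
        if pod["STATUS"] == "Completed":
            continue  # finished Jobs legitimately show 0/1 ready
        ready, total = pod["READY"].split("/")
        if pod["STATUS"] != "Running" or ready != total:
            bad.append(pod["NAME"])
    return bad


if __name__ == "__main__":
    print(not_ready(parse_pods(SAMPLE)))
```

To use it against a live cluster, replace `SAMPLE` with the text captured from `kubectl get pods`. An empty list means every pod is either running with all containers ready or has completed.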
    

Install Vector into the minikube cluster

  1. Create helm values file for Vector

Save the following in a file "vector.values.yaml", filling out <your_datadog_api_key>:

## See Vector helm documentation to learn more:
## https://vector.dev/docs/setup/installation/package-managers/helm/

# fullnameOverride -- Override the full name of the app
fullnameOverride: vector

image:
  tag: 0.23.3-debian

# resources -- Set Vector resource requests and limits.
resources:
  ## Required for HPA to function
  requests:
    cpu: 1000m
    memory: 512Mi
  # limits:
  #   cpu: 200m
  #   memory: 256Mi

# customConfig -- Override Vector's default configs, if used **all** options need to be specified
## This section supports using helm templates to populate dynamic values
## Ref: https://vector.dev/docs/reference/configuration/
customConfig:
  data_dir: /vector-data-dir
  api:
    enabled: true
    address: 0.0.0.0:8686
    playground: false
  sources:
    datadog_agent:
      address: 0.0.0.0:8282
      type: datadog_agent
      multiple_outputs: true
      trace_proto: v1v2
    internal_metrics:
      type: internal_metrics
  sinks:
    datadog_logs:
      type: datadog_logs
      inputs:
        - datadog_agent.logs
      default_api_key: <your_datadog_api_key>
      compression: gzip
    datadog_metrics:
      type: datadog_metrics
      inputs:
        - datadog_agent.metrics
        - internal_metrics
      default_api_key: <your_datadog_api_key>
    datadog_traces:
      type: datadog_traces
      inputs:
        - datadog_agent.traces
      default_api_key: <your_datadog_api_key>
    dbg:
      type: console
      encoding:
        codec: json
      inputs:
      - datadog_agent.traces

# livenessProbe -- Override default liveness probe settings, if customConfig is used requires customConfig.api.enabled true
## Requires Vector's API to be enabled
livenessProbe:
  httpGet:
    path: /health
    port: api

# readinessProbe -- Override default readiness probe settings, if customConfig is used requires customConfig.api.enabled true
## Requires Vector's API to be enabled
readinessProbe:
  httpGet:
    path: /health
    port: api
  2. Install Vector into the cluster:

     helm install vector vector/vector -f ./vector.values.yaml
    
  3. Verify vector is running:

     kubectl get pods
    

    It should look something like this:

$ kubectl get pods
NAME                                                READY   STATUS      RESTARTS   AGE
vector-0                                            1/1     Running     0          4m26s
  4. Check logs for errors

     kubectl logs vector-0
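
Besides reading the logs, you can probe the same `/health` endpoint that the liveness and readiness probes use. The sketch below is an illustration under assumptions: it expects `kubectl port-forward vector-0 8686:8686` to be running in another terminal so that the API port from "vector.values.yaml" is reachable on localhost, and the function name `vector_is_healthy` is hypothetical. It only checks for an HTTP 200 status and does not inspect the response body.

```python
import urllib.request


def vector_is_healthy(url="http://127.0.0.1:8686/health", timeout=2.0):
    """Return True if a GET to `url` answers with HTTP 200, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP error statuses:
        # urllib.error.URLError and HTTPError both derive from OSError.
        return False
```

Calling `vector_is_healthy()` should return True once the pod is up and the port-forward is active, and False if nothing is listening on the forwarded port.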
    

(Optional) Selecting a private/local vector image.

In the "vector.values.yaml" file above, we specified the image tag "0.23.3-debian". This corresponds to a vector release containing a published image.

If you are developing changes locally that you would like to test, you can build a vector container image containing your local changes with the following steps:

  1. Build vector with make package-<...> for your appropriate architecture. This creates ./target/artifacts .

     make package-deb-x86_64-unknown-linux-gnu
    
  2. Important: point your shell's docker CLI at the Docker daemon inside minikube, so that the image you build is visible to the cluster.

     eval $(minikube -p minikube docker-env)
    
  3. Create the container image. Replace <your_tag> with a meaningful name, for example the name of your git branch.

     docker build --tag "timber.io/vector:<your_tag>" target/artifacts -f ./distribution/docker/debian/Dockerfile
    
  4. Import the image into minikube:

     minikube image load timber.io/vector:<your_tag>
    
  5. Verify the image presence:

      minikube image ls
    

    Should contain a line with your image: "timber.io/vector:<your_tag>"

  6. Edit the helm values to use the local image:

image:
  repository: timber.io/vector
  tag: <your_tag>
  7. Install the private vector image into the cluster:

     helm install vector vector/vector -f ./vector.values.yaml
    

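    Note: If you already installed vector into the cluster via the prior steps, this command fails because the release name is still in use. In that case, either run "helm uninstall vector" first, or apply the new values with "helm upgrade vector vector/vector -f ./vector.values.yaml". The --replace option of helm install only re-uses the name of a release that has already been deleted.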
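
When verifying image presence, note that `minikube image ls` prints fully qualified names, so a short name such as `tracer:latest` appears with a `docker.io/library/` prefix while `timber.io/vector:<your_tag>` appears unchanged. The sketch below imitates Docker's default name expansion so that a short reference can be compared against the listing. The function names are hypothetical, digest references (`@sha256:...`) are not handled, and the listing is assumed to contain one image name per line.

```python
def normalize_image(ref):
    """Expand a short image reference the way Docker does by default."""
    name, sep, tag = ref.rpartition(":")
    if not sep or "/" in tag:
        # No tag present (a colon, if any, belonged to a registry port).
        name, tag = ref, "latest"
    first, slash, _rest = name.partition("/")
    # The first path component is a registry host if it looks like one.
    has_registry = bool(slash) and ("." in first or ":" in first or first == "localhost")
    if not has_registry:
        if not slash:
            name = "library/" + name  # official images live under library/
        name = "docker.io/" + name
    return f"{name}:{tag}"


def image_listed(listing, ref):
    """Return True if `ref` appears in the text printed by `minikube image ls`."""
    images = {line.strip() for line in listing.splitlines()}
    return normalize_image(ref) in images


if __name__ == "__main__":
    print(normalize_image("tracer:latest"))
    print(normalize_image("timber.io/vector:my-branch"))
```

Here `my-branch` is only a placeholder tag for illustration; pass the captured output of `minikube image ls` as `listing` to check for your own image.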

Sending a trace event through

At this point the Datadog Agent and Vector should both be running in the minikube cluster.

For this example we will create a Python process that generates a trace event and runs inside the minikube cluster.

  1. Create the python script. Add the following to "trace.py":
#!/usr/bin/python3
# trace.py: generates a dummy trace; requires `pip install ddtrace`
import time

from ddtrace import tracer

# Send traces to the Datadog Agent service inside the cluster.
tracer.configure(
    hostname="datadog-agent",
    port=8126,
)

# Tags applied to every span created by this tracer.
top_level_tags = {
    "foo": "bar",
    "env": "my-dev-env",
}
tracer.set_tags(top_level_tags)

# Top level span
span = tracer.trace("operations.of.interest", service="trace-test-app")
span.set_tag("env", "my-dev-env")
span.set_tag("numeric", 1.234)
span.context.sampling_priority = 10


def nest_span(tracer, n):
    """Create n nested child spans, each sleeping 0.1s before nesting."""
    if n == 0:
        return
    with tracer.trace("child_%d" % n, service="trace-test-app"):
        time.sleep(0.1)
        nest_span(tracer, n - 1)


# Create two chains of 10 nested child spans under the top level span.
for i in range(2):
    nest_span(tracer, 10)
span.finish()

print("traces sent")
  2. Create the Dockerfile. Add the following to "Dockerfile":
FROM python:3

RUN pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir ddtrace
COPY trace.py ./trace.py
RUN chmod +x ./trace.py

ENTRYPOINT ["python3", "trace.py"]
  3. Important: point your shell's docker CLI at the Docker daemon inside minikube, so that the image you build is visible to the cluster.

    eval $(minikube -p minikube docker-env)
    
  4. Build the image:

     docker build -t tracer .
    
  5. Add the image to minikube:

     minikube image load tracer:latest
    
  6. Verify the image presence:

     minikube image ls
    

    Should contain an entry: "docker.io/library/tracer:latest"

  7. Create a Kubernetes Job manifest for the tracer program (despite the file name, this is not a helm values file). Add the following to "tracer.values.yaml":

apiVersion: batch/v1
kind: Job
metadata:
  name: tracer
spec:
  template:
    metadata:
      name: tracer-pod
    spec:
      containers:
      - name: tracer
        image: tracer:latest
        imagePullPolicy: Never
      restartPolicy: Never
  8. In another terminal, tail the vector logs:

     kubectl logs -f vector-0
    

    This will display the trace event after the next step is completed.

  9. Install the tracer program into the cluster:

     kubectl create -f ./tracer.values.yaml
    
  10. Verify the container ran:

     kubectl get pods
    

    Should output something like this, tracer should show "Completed":

NAME                                                READY   STATUS      RESTARTS   AGE
datadog-agent-cluster-agent-767d89c9c5-tbngs        1/1     Running     0          4m31s
datadog-agent-kube-state-metrics-658d989649-j7jt8   1/1     Running     0          4m31s
datadog-agent-rjqxg                                 3/3     Running     0          4m31s
tracer-pn4b4                                        0/1     Completed   0          4m18s  <<<<<<<<<<
vector-0                                            1/1     Running     0          4m26s
    If it does not, check the pod's logs with kubectl logs as shown above, and confirm that step #3 (the docker-env eval) was performed.

  11. Check the vector logs in the terminal from step #8 to confirm the trace event was received.

    Should show the healthcheck passed, and the trace event should look something like this:

{"agent_version":"7.38.2","app_version":"","container_id":"7415d81d7f4554f677ac989b85556b73e162c02fc9f6c8f7f339767acd8c2351","dropped":false,"env":"none","error_tps":0.0,"host":"minikube","language_name":"python","language_version":"3.10.6","origin":"","payload_version":"v2","priority":10,"runtime_id":"","source_type":"datadog_agent","spans":[{"duration":2003930490,"error":0,"meta":{"_dd.p.dm":"-0","env":"my-dev-env","foo":"bar","runtime-id":"1d0c8d97b50e45cca78d6a99ac8cfb18"},"meta_struct":{},"metrics":{"_dd.agent_psr":1.0,"_dd.top_level":1.0,"_dd.tracer_kr":1.0,"_sampling_priority_v1":10.0,"_top_level":1.0,"numeric":1.234,"system.pid":1.0},"name":"operations.of.interest","parent_id":0,"resource":"operations.of.interest","service":"trace-test-app","span_id":8013505727887431227,"start":"2022-08-24T17:08:28.722038709Z","trace_id":-2120733611673413870,"type":""},{"duration":1002066383,"error":0,"meta":{"env":"my-dev-env","foo":"bar"},"meta_struct":{},"metrics":{},"name":"child_10","parent_id":8013505727887431227,"resource":"child_10","service":"trace-test-app","span_id":-6014030435588199007,"start":"2022-08-24T17:08:28.722095920Z","trace_id":-2120733611673413870,"type":""},{"duration":901902559,"error":0,"meta":{"env":"my-dev-env","foo":"bar"},"meta_struct":{},"metrics":{},"name":"child_9","parent_id":-6014030435588199007,"resource":"child_9","service":"trace-test-app","span_id":4182555350259562811,"start":"2022-08-24T17:08:28.822257232Z","trace_id":-2120733611673413870,"type":""},{"duration":801700905,"error":0,"meta":{"env":"my-dev-env","foo":"bar"},"meta_struct":{},"metrics":{},"name":"child_8","parent_id":4182555350259562811,"resource":"child_8","service":"trace-test-app","span_id":-1529159581253984277,"start":"2022-08-24T17:08:28.922456136Z","trace_id":-2120733611673413870,"type":""},{"duration":701496138,"error":0,"meta":{"env":"my-dev-env","foo":"bar"},"meta_struct":{},"metrics":{},"name":"child_7","parent_id":-1529159581253984277,"resource":"child_7","service":"trace-test-app","span_id":2564879243480693181,"start":"2022-08-24T17:08:29.022658071Z","trace_id":-2120733611673413870,"type":""},{"duration":601291015,"error":0,"meta":{"env":"my-dev-env","foo":"bar"},"meta_struct":{},"metrics":{},"name":"child_6","parent_id":2564879243480693181,"resource":"child_6","service":"trace-test-app","span_id":3923357355904970171,"start":"2022-08-24T17:08:29.122860246Z","trace_id":-2120733611673413870,"type":""},{"duration":501093972,"error":0,"meta":{"env":"my-dev-env","foo":"bar"},"meta_struct":{},"metrics":{},"name":"child_5","parent_id":3923357355904970171,"resource":"child_5","service":"trace-test-app","span_id":1032045299762712103,"start":"2022-08-24T17:08:29.223054136Z","trace_id":-2120733611673413870,"type":""},{"duration":400881571,"error":0,"meta":{"env":"my-dev-env","foo":"bar"},"meta_struct":{},"metrics":{},"name":"child_4","parent_id":1032045299762712103,"resource":"child_4","service":"trace-test-app","span_id":2152095997608724519,"start":"2022-08-24T17:08:29.323262885Z","trace_id":-2120733611673413870,"type":""},{"duration":300684508,"error":0,"meta":{"env":"my-dev-env","foo":"bar"},"meta_struct":{},"metrics":{},"name":"child_3","parent_id":2152095997608724519,"resource":"child_3","service":"trace-test-app","span_id":-1785016045258492890,"start":"2022-08-24T17:08:29.423456109Z","trace_id":-2120733611673413870,"type":""},{"duration":200487939,"error":0,"meta":{"env":"my-dev-env","foo":"bar"},"meta_struct":{},"metrics":{},"name":"child_2","parent_id":-1785016045258492890,"resource":"child_2","service":"trace-test-app","span_id":6657987248375815165,"start":"2022-08-24T17:08:29.523646774Z","trace_id":-2120733611673413870,"type":""},{"duration":100097826,"error":0,"meta":{"env":"my-dev-env","foo":"bar"},"meta_struct":{},"metrics":{},"name":"child_1","parent_id":6657987248375815165,"resource":"child_1","service":"trace-test-app","span_id":-416479733790656545,"start":"2022-08-24T17:08:29.623978831Z","trace_id":-2120733611673413870,"type":""},{"duration":1001783781,"error":0,"meta":{"env":"my-dev-env","foo":"bar"},"meta_struct":{},"metrics":{},"name":"child_10","parent_id":8013505727887431227,"resource":"child_10","service":"trace-test-app","span_id":4700929287043722318,"start":"2022-08-24T17:08:29.724181743Z","trace_id":-2120733611673413870,"type":""},{"duration":901693355,"error":0,"meta":{"env":"my-dev-env","foo":"bar"},"meta_struct":{},"metrics":{},"name":"child_9","parent_id":4700929287043722318,"resource":"child_9","service":"trace-test-app","span_id":7091980570785734600,"start":"2022-08-24T17:08:29.824269273Z","trace_id":-2120733611673413870,"type":""},{"duration":801558600,"error":0,"meta":{"env":"my-dev-env","foo":"bar"},"meta_struct":{},"metrics":{},"name":"child_8","parent_id":7091980570785734600,"resource":"child_8","service":"trace-test-app","span_id":-3441724011161494891,"start":"2022-08-24T17:08:29.924401068Z","trace_id":-2120733611673413870,"type":""},{"duration":701433174,"error":0,"meta":{"env":"my-dev-env","foo":"bar"},"meta_struct":{},"metrics":{},"name":"child_7","parent_id":-3441724011161494891,"resource":"child_7","service":"trace-test-app","span_id":-4978423517783410612,"start":"2022-08-24T17:08:30.024523413Z","trace_id":-2120733611673413870,"type":""},{"duration":601224912,"error":0,"meta":{"env":"my-dev-env","foo":"bar"},"meta_struct":{},"metrics":{},"name":"child_6","parent_id":-4978423517783410612,"resource":"child_6","service":"trace-test-app","span_id":-6058157343109761121,"start":"2022-08-24T17:08:30.124728708Z","trace_id":-2120733611673413870,"type":""},{"duration":500996288,"error":0,"meta":{"env":"my-dev-env","foo":"bar"},"meta_struct":{},"metrics":{},"name":"child_5","parent_id":-6058157343109761121,"resource":"child_5","service":"trace-test-app","span_id":-4014924905939389512,"start":"2022-08-24T17:08:30.224954400Z","trace_id":-2120733611673413870,"type":""},{"duration":400766078,"error":0,"meta":{"env":"my-dev-env","foo":"bar"},"meta_struct":{},"metrics":{},"name":"child_4","parent_id":-4014924905939389512,"resource":"child_4","service":"trace-test-app","span_id":-7259823233518481964,"start":"2022-08-24T17:08:30.325181315Z","trace_id":-2120733611673413870,"type":""},{"duration":300571391,"error":0,"meta":{"env":"my-dev-env","foo":"bar"},"meta_struct":{},"metrics":{},"name":"child_3","parent_id":-7259823233518481964,"resource":"child_3","service":"trace-test-app","span_id":-2411037428787107394,"start":"2022-08-24T17:08:30.425372457Z","trace_id":-2120733611673413870,"type":""},{"duration":200361771,"error":0,"meta":{"env":"my-dev-env","foo":"bar"},"meta_struct":{},"metrics":{},"name":"child_2","parent_id":-2411037428787107394,"resource":"child_2","service":"trace-test-app","span_id":1000856630072972936,"start":"2022-08-24T17:08:30.525575359Z","trace_id":-2120733611673413870,"type":""},{"duration":100166767,"error":0,"meta":{"env":"my-dev-env","foo":"bar"},"meta_struct":{},"metrics":{},"name":"child_1","parent_id":1000856630072972936,"resource":"child_1","service":"trace-test-app","span_id":-6545389230048719278,"start":"2022-08-24T17:08:30.625729113Z","trace_id":-2120733611673413870,"type":""}],"tags":{"_dd.tags.container":"pod_phase:running,kube_qos:BestEffort,kube_container_name:tracer,image_name:tracer,short_image:tracer,kube_ownerref_kind:job,kube_job:tracer,kube_namespace:default,image_tag:latest,image_id:docker://sha256:b82f19b98a77d3570162713e3d1e2909d66325ba5bfe01eff24cf6336dc0a969,docker_image:tracer:latest,pod_name:tracer-gpxcc,kube_ownerref_name:tracer,container_id:7415d81d7f4554f677ac989b85556b73e162c02fc9f6c8f7f339767acd8c2351,display_container_name:tracer_tracer-gpxcc,container_name:tracer"},"target_tps":0.0,"tracer_version":"1.4.1"}
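
The raw event is hard to read in a terminal. The sketch below reduces each event printed by the "dbg" console sink to one (name, parent_id, duration in ms) tuple per span. It is an illustration under assumptions: the function names are hypothetical, only the field names are taken from the sample output above, and durations are treated as nanoseconds because each `child_1` span (a 0.1 s sleep) reports roughly 100,000,000.

```python
import json

# Trimmed-down event using two spans from the sample output above.
SAMPLE = json.dumps({
    "spans": [
        {"name": "operations.of.interest", "parent_id": 0, "duration": 2003930490},
        {"name": "child_10", "parent_id": 8013505727887431227, "duration": 1002066383},
    ]
})


def summarize_spans(event_line):
    """Return (name, parent_id, duration_ms) for each span in one event."""
    event = json.loads(event_line)
    summary = []
    for span in event.get("spans", []):
        duration_ms = span["duration"] / 1_000_000  # assumed to be nanoseconds
        summary.append((span["name"], span["parent_id"], round(duration_ms, 1)))
    return summary


def iter_event_lines(log_lines):
    """Yield only the lines that look like JSON events, skipping Vector's own log lines."""
    for line in log_lines:
        line = line.strip()
        if line.startswith("{") and line.endswith("}"):
            yield line


if __name__ == "__main__":
    for line in iter_event_lines(["some non-JSON log line", SAMPLE]):
        for row in summarize_spans(line):
            print(row)
```

With the real output, the top level span should last about two seconds, since the script runs two chains of ten nested spans and each level sleeps for 0.1 s before nesting.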

NOTE: As an alternative to steps 5-7, you can run the tracer container directly with:

kubectl run tracer --image=tracer:latest --image-pull-policy=Never

Other useful commands

  • View all resources:
kubectl get -A all
  • Uninstall a helm release:
helm uninstall <vector/datadog-agent/...>