error calling MarshalJSON / unsupported value: NaN #479

Open
AnthonMS opened this issue Jun 29, 2022 · 7 comments

@AnthonMS

Hello, I am facing an issue trying to set up the prometheus-to-sd container.

This is the container config in the deployment:

        - name: prometheus-to-sd
          image: gcr.io/google-containers/prometheus-to-sd:v0.9.2
          ports:
            - name: profiler
              containerPort: 6060
          command:
            - /monitor
            - --stackdriver-prefix=custom.googleapis.com
            - --monitored-resource-type-prefix=k8s_
            - --source=:http://localhost:9253
            - --pod-id=$(POD_NAME)
            - --namespace-id=$(POD_NAMESPACE)
            - --cluster-location=REDACTED
          resources:
            requests:
              cpu: 10m
            limits:
              cpu: 10m
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace

And this is the php-fpm exporter container:

        - name: fpm-metrics
          image: hipages/php-fpm_exporter
          imagePullPolicy: IfNotPresent
          ports:
            - name: exporter
              containerPort: 9253
          env:
              # FastCGI address where FPM listens on, we're connecting over TCP
            - name: PHP_FPM_SCRAPE_URI
              value: "tcp://localhost:9000/fpm-status"
              # Enabled to calculate process numbers via php-fpm_exporter since PHP-FPM sporadically reports wrong active/idle/total process numbers.
            - name: PHP_FPM_FIX_PROCESS_COUNT
              value: "true"
              # Only log messages with the given severity or above. Valid levels: [debug, info, warn, error, fatal] (default "error")
            - name: PHP_FPM_LOG_LEVEL
              value: info
          resources:
            requests:
              cpu: 20m
              memory: 32Mi
            limits:
              cpu: 20m
              memory: 32Mi

I can tell the fpm-metrics container is working: as a test, I set up a proxy_pass in the nginx container that forwards /metrics to the fpm-metrics container on port 9253, and I can see data when I access it.

The logs from the prometheus-to-sd container after running for 15-30 minutes:

I0629 12:00:06.326609       1 main.go:121] GCE config: &{REDACTED}
I0629 12:00:06.326701       1 main.go:182] Taking source configs from flags
I0629 12:00:06.326714       1 main.go:184] Taking source configs from kubernetes api server
I0629 12:00:06.326722       1 main.go:124] Built the following source configs: [0xc000390270]
I0629 12:00:06.425637       1 main.go:193] Running prometheus-to-sd, monitored target is  http://localhost:9253
E0629 12:02:10.025888       1 stackdriver.go:60] Error while sending request to Stackdriver json: error calling MarshalJSON for type *monitoring.CreateTimeSeriesRequest: json: error calling MarshalJSON for type *monitoring.TimeSeries: json: error calling MarshalJSON for type *monitoring.Point: json: error calling MarshalJSON for type *monitoring.TypedValue: json: error calling MarshalJSON for type *monitoring.Distribution: json: unsupported value: NaN
E0629 12:03:06.926664       1 stackdriver.go:60] Error while sending request to Stackdriver json: error calling MarshalJSON for type *monitoring.CreateTimeSeriesRequest: json: error calling MarshalJSON for type *monitoring.TimeSeries: json: error calling MarshalJSON for type *monitoring.Point: json: error calling MarshalJSON for type *monitoring.TypedValue: json: error calling MarshalJSON for type *monitoring.Distribution: json: unsupported value: NaN
E0629 12:03:07.125572       1 stackdriver.go:60] Error while sending request to Stackdriver json: error calling MarshalJSON for type *monitoring.CreateTimeSeriesRequest: json: error calling MarshalJSON for type *monitoring.TimeSeries: json: error calling MarshalJSON for type *monitoring.Point: json: error calling MarshalJSON for type *monitoring.TypedValue: json: error calling MarshalJSON for type *monitoring.Distribution: json: unsupported value: NaN
E0629 12:04:07.426957       1 stackdriver.go:60] Error while sending request to Stackdriver json: error calling MarshalJSON for type *monitoring.CreateTimeSeriesRequest: json: error calling MarshalJSON for type *monitoring.TimeSeries: json: error calling MarshalJSON for type *monitoring.Point: json: error calling MarshalJSON for type *monitoring.TypedValue: json: error calling MarshalJSON for type *monitoring.Distribution: json: unsupported value: NaN
E0629 12:05:10.625747       1 stackdriver.go:60] Error while sending request to Stackdriver json: error calling MarshalJSON for type *monitoring.CreateTimeSeriesRequest: json: error calling MarshalJSON for type *monitoring.TimeSeries: json: error calling MarshalJSON for type *monitoring.Point: json: error calling MarshalJSON for type *monitoring.TypedValue: json: error calling MarshalJSON for type *monitoring.Distribution: json: unsupported value: NaN
E0629 12:05:10.626850       1 stackdriver.go:60] Error while sending request to Stackdriver json: error calling MarshalJSON for type *monitoring.CreateTimeSeriesRequest: json: error calling MarshalJSON for type *monitoring.TimeSeries: json: error calling MarshalJSON for type *monitoring.Point: json: error calling MarshalJSON for type *monitoring.TypedValue: json: error calling MarshalJSON for type *monitoring.Distribution: json: unsupported value: NaN
E0629 12:05:21.354563       1 stackdriver.go:60] Error while sending request to Stackdriver googleapi: Error 503: Deadline expired before operation could complete., backendError
E0629 12:06:07.325596       1 stackdriver.go:60] Error while sending request to Stackdriver json: error calling MarshalJSON for type *monitoring.CreateTimeSeriesRequest: json: error calling MarshalJSON for type *monitoring.TimeSeries: json: error calling MarshalJSON for type *monitoring.Point: json: error calling MarshalJSON for type *monitoring.TypedValue: json: error calling MarshalJSON for type *monitoring.Distribution: json: unsupported value: NaN
E0629 12:06:09.525891       1 stackdriver.go:60] Error while sending request to Stackdriver json: error calling MarshalJSON for type *monitoring.CreateTimeSeriesRequest: json: error calling MarshalJSON for type *monitoring.TimeSeries: json: error calling MarshalJSON for type *monitoring.Point: json: error calling MarshalJSON for type *monitoring.TypedValue: json: error calling MarshalJSON for type *monitoring.Distribution: json: unsupported value: NaN
E0629 12:07:06.926569       1 stackdriver.go:60] Error while sending request to Stackdriver json: error calling MarshalJSON for type *monitoring.CreateTimeSeriesRequest: json: error calling MarshalJSON for type *monitoring.TimeSeries: json: error calling MarshalJSON for type *monitoring.Point: json: error calling MarshalJSON for type *monitoring.TypedValue: json: error calling MarshalJSON for type *monitoring.Distribution: json: unsupported value: NaN
E0629 12:08:07.226769       1 stackdriver.go:60] Error while sending request to Stackdriver json: error calling MarshalJSON for type *monitoring.CreateTimeSeriesRequest: json: error calling MarshalJSON for type *monitoring.TimeSeries: json: error calling MarshalJSON for type *monitoring.Point: json: error calling MarshalJSON for type *monitoring.TypedValue: json: error calling MarshalJSON for type *monitoring.Distribution: json: unsupported value: NaN
E0629 12:09:07.225912       1 stackdriver.go:60] Error while sending request to Stackdriver json: error calling MarshalJSON for type *monitoring.CreateTimeSeriesRequest: json: error calling MarshalJSON for type *monitoring.TimeSeries: json: error calling MarshalJSON for type *monitoring.Point: json: error calling MarshalJSON for type *monitoring.TypedValue: json: error calling MarshalJSON for type *monitoring.Distribution: json: unsupported value: NaN
E0629 12:10:07.326342       1 stackdriver.go:60] Error while sending request to Stackdriver json: error calling MarshalJSON for type *monitoring.CreateTimeSeriesRequest: json: error calling MarshalJSON for type *monitoring.TimeSeries: json: error calling MarshalJSON for type *monitoring.Point: json: error calling MarshalJSON for type *monitoring.TypedValue: json: error calling MarshalJSON for type *monitoring.Distribution: json: unsupported value: NaN
E0629 12:10:20.652022       1 stackdriver.go:60] Error while sending request to Stackdriver googleapi: Error 503: Deadline expired before operation could complete., backendError
E0629 12:11:07.626358       1 stackdriver.go:60] Error while sending request to Stackdriver json: error calling MarshalJSON for type *monitoring.CreateTimeSeriesRequest: json: error calling MarshalJSON for type *monitoring.TimeSeries: json: error calling MarshalJSON for type *monitoring.Point: json: error calling MarshalJSON for type *monitoring.TypedValue: json: error calling MarshalJSON for type *monitoring.Distribution: json: unsupported value: NaN
E0629 12:11:07.726183       1 stackdriver.go:60] Error while sending request to Stackdriver json: error calling MarshalJSON for type *monitoring.CreateTimeSeriesRequest: json: error calling MarshalJSON for type *monitoring.TimeSeries: json: error calling MarshalJSON for type *monitoring.Point: json: error calling MarshalJSON for type *monitoring.TypedValue: json: error calling MarshalJSON for type *monitoring.Distribution: json: unsupported value: NaN
E0629 12:12:07.726443       1 stackdriver.go:60] Error while sending request to Stackdriver json: error calling MarshalJSON for type *monitoring.CreateTimeSeriesRequest: json: error calling MarshalJSON for type *monitoring.TimeSeries: json: error calling MarshalJSON for type *monitoring.Point: json: error calling MarshalJSON for type *monitoring.TypedValue: json: error calling MarshalJSON for type *monitoring.Distribution: json: unsupported value: NaN
E0629 12:12:08.825681       1 stackdriver.go:60] Error while sending request to Stackdriver json: error calling MarshalJSON for type *monitoring.CreateTimeSeriesRequest: json: error calling MarshalJSON for type *monitoring.TimeSeries: json: error calling MarshalJSON for type *monitoring.Point: json: error calling MarshalJSON for type *monitoring.TypedValue: json: error calling MarshalJSON for type *monitoring.Distribution: json: unsupported value: NaN
E0629 12:13:07.425616       1 stackdriver.go:60] Error while sending request to Stackdriver json: error calling MarshalJSON for type *monitoring.CreateTimeSeriesRequest: json: error calling MarshalJSON for type *monitoring.TimeSeries: json: error calling MarshalJSON for type *monitoring.Point: json: error calling MarshalJSON for type *monitoring.TypedValue: json: error calling MarshalJSON for type *monitoring.Distribution: json: unsupported value: NaN
E0629 12:13:08.926036       1 stackdriver.go:60] Error while sending request to Stackdriver json: error calling MarshalJSON for type *monitoring.CreateTimeSeriesRequest: json: error calling MarshalJSON for type *monitoring.TimeSeries: json: error calling MarshalJSON for type *monitoring.Point: json: error calling MarshalJSON for type *monitoring.TypedValue: json: error calling MarshalJSON for type *monitoring.Distribution: json: unsupported value: NaN
E0629 12:14:07.825910       1 stackdriver.go:60] Error while sending request to Stackdriver json: error calling MarshalJSON for type *monitoring.CreateTimeSeriesRequest: json: error calling MarshalJSON for type *monitoring.TimeSeries: json: error calling MarshalJSON for type *monitoring.Point: json: error calling MarshalJSON for type *monitoring.TypedValue: json: error calling MarshalJSON for type *monitoring.Distribution: json: unsupported value: NaN
E0629 12:14:08.925774       1 stackdriver.go:60] Error while sending request to Stackdriver json: error calling MarshalJSON for type *monitoring.CreateTimeSeriesRequest: json: error calling MarshalJSON for type *monitoring.TimeSeries: json: error calling MarshalJSON for type *monitoring.Point: json: error calling MarshalJSON for type *monitoring.TypedValue: json: error calling MarshalJSON for type *monitoring.Distribution: json: unsupported value: NaN
E0629 12:15:09.026367       1 stackdriver.go:60] Error while sending request to Stackdriver json: error calling MarshalJSON for type *monitoring.CreateTimeSeriesRequest: json: error calling MarshalJSON for type *monitoring.TimeSeries: json: error calling MarshalJSON for type *monitoring.Point: json: error calling MarshalJSON for type *monitoring.TypedValue: json: error calling MarshalJSON for type *monitoring.Distribution: json: unsupported value: NaN
E0629 12:15:19.644443       1 stackdriver.go:60] Error while sending request to Stackdriver googleapi: Error 503: Deadline expired before operation could complete., backendError
E0629 12:16:07.226192       1 stackdriver.go:60] Error while sending request to Stackdriver json: error calling MarshalJSON for type *monitoring.CreateTimeSeriesRequest: json: error calling MarshalJSON for type *monitoring.TimeSeries: json: error calling MarshalJSON for type *monitoring.Point: json: error calling MarshalJSON for type *monitoring.TypedValue: json: error calling MarshalJSON for type *monitoring.Distribution: json: unsupported value: NaN
E0629 12:16:09.526655       1 stackdriver.go:60] Error while sending request to Stackdriver json: error calling MarshalJSON for type *monitoring.CreateTimeSeriesRequest: json: error calling MarshalJSON for type *monitoring.TimeSeries: json: error calling MarshalJSON for type *monitoring.Point: json: error calling MarshalJSON for type *monitoring.TypedValue: json: error calling MarshalJSON for type *monitoring.Distribution: json: unsupported value: NaN
E0629 12:17:08.025625       1 stackdriver.go:60] Error while sending request to Stackdriver json: error calling MarshalJSON for type *monitoring.CreateTimeSeriesRequest: json: error calling MarshalJSON for type *monitoring.TimeSeries: json: error calling MarshalJSON for type *monitoring.Point: json: error calling MarshalJSON for type *monitoring.TypedValue: json: error calling MarshalJSON for type *monitoring.Distribution: json: unsupported value: NaN
E0629 12:18:07.826717       1 stackdriver.go:60] Error while sending request to Stackdriver json: error calling MarshalJSON for type *monitoring.CreateTimeSeriesRequest: json: error calling MarshalJSON for type *monitoring.TimeSeries: json: error calling MarshalJSON for type *monitoring.Point: json: error calling MarshalJSON for type *monitoring.TypedValue: json: error calling MarshalJSON for type *monitoring.Distribution: json: unsupported value: NaN
E0629 12:18:09.726555       1 stackdriver.go:60] Error while sending request to Stackdriver json: error calling MarshalJSON for type *monitoring.CreateTimeSeriesRequest: json: error calling MarshalJSON for type *monitoring.TimeSeries: json: error calling MarshalJSON for type *monitoring.Point: json: error calling MarshalJSON for type *monitoring.TypedValue: json: error calling MarshalJSON for type *monitoring.Distribution: json: unsupported value: NaN
E0629 12:19:06.926358       1 stackdriver.go:60] Error while sending request to Stackdriver json: error calling MarshalJSON for type *monitoring.CreateTimeSeriesRequest: json: error calling MarshalJSON for type *monitoring.TimeSeries: json: error calling MarshalJSON for type *monitoring.Point: json: error calling MarshalJSON for type *monitoring.TypedValue: json: error calling MarshalJSON for type *monitoring.Distribution: json: unsupported value: NaN
E0629 12:20:08.826345       1 stackdriver.go:60] Error while sending request to Stackdriver json: error calling MarshalJSON for type *monitoring.CreateTimeSeriesRequest: json: error calling MarshalJSON for type *monitoring.TimeSeries: json: error calling MarshalJSON for type *monitoring.Point: json: error calling MarshalJSON for type *monitoring.TypedValue: json: error calling MarshalJSON for type *monitoring.Distribution: json: unsupported value: NaN
E0629 12:21:07.326439       1 stackdriver.go:60] Error while sending request to Stackdriver json: error calling MarshalJSON for type *monitoring.CreateTimeSeriesRequest: json: error calling MarshalJSON for type *monitoring.TimeSeries: json: error calling MarshalJSON for type *monitoring.Point: json: error calling MarshalJSON for type *monitoring.TypedValue: json: error calling MarshalJSON for type *monitoring.Distribution: json: unsupported value: NaN
E0629 12:21:07.425685       1 stackdriver.go:60] Error while sending request to Stackdriver json: error calling MarshalJSON for type *monitoring.CreateTimeSeriesRequest: json: error calling MarshalJSON for type *monitoring.TimeSeries: json: error calling MarshalJSON for type *monitoring.Point: json: error calling MarshalJSON for type *monitoring.TypedValue: json: error calling MarshalJSON for type *monitoring.Distribution: json: unsupported value: NaN
E0629 12:22:06.927493       1 stackdriver.go:60] Error while sending request to Stackdriver json: error calling MarshalJSON for type *monitoring.CreateTimeSeriesRequest: json: error calling MarshalJSON for type *monitoring.TimeSeries: json: error calling MarshalJSON for type *monitoring.Point: json: error calling MarshalJSON for type *monitoring.TypedValue: json: error calling MarshalJSON for type *monitoring.Distribution: json: unsupported value: NaN
E0629 12:22:08.126282       1 stackdriver.go:60] Error while sending request to Stackdriver json: error calling MarshalJSON for type *monitoring.CreateTimeSeriesRequest: json: error calling MarshalJSON for type *monitoring.TimeSeries: json: error calling MarshalJSON for type *monitoring.Point: json: error calling MarshalJSON for type *monitoring.TypedValue: json: error calling MarshalJSON for type *monitoring.Distribution: json: unsupported value: NaN
E0629 12:23:09.226728       1 stackdriver.go:60] Error while sending request to Stackdriver json: error calling MarshalJSON for type *monitoring.CreateTimeSeriesRequest: json: error calling MarshalJSON for type *monitoring.TimeSeries: json: error calling MarshalJSON for type *monitoring.Point: json: error calling MarshalJSON for type *monitoring.TypedValue: json: error calling MarshalJSON for type *monitoring.Distribution: json: unsupported value: NaN
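
For context, the last part of that error chain comes from Go's encoding/json package, which refuses to encode NaN (JSON has no representation for it). A minimal sketch that reproduces the same message; the `distribution` type here is a hypothetical stand-in with a single field, not the real monitoring.Distribution:

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
)

// distribution is a hypothetical stand-in for the Distribution type named in
// the log; only one float field is needed to show the behaviour.
type distribution struct {
	Mean float64 `json:"mean"`
}

// marshalMean encodes a distribution with the given mean and returns the
// error from json.Marshal, if any.
func marshalMean(mean float64) error {
	_, err := json.Marshal(distribution{Mean: mean})
	return err
}

func main() {
	fmt.Println(marshalMean(1.5))        // <nil>
	fmt.Println(marshalMean(math.NaN())) // json: unsupported value: NaN
}
```

So a single NaN float anywhere inside the request is enough to make the whole marshal call fail. Where the NaN comes from in prometheus-to-sd is not shown by the log itself.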

I have googled but come up empty-handed. The closest issue I could find was from 2018-2019 and hasn't had any activity in a long time, so I thought I would try here.

I apologize if this is the wrong place to ask, or if it has already been resolved and I just haven't found it.

Hope someone is able to help. Thanks in advance.

@AnthonMS
Author

AnthonMS commented Jul 7, 2022

I have tried adding these two HPA configurations:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-prod-phpfpm
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: valinor-api-prod
  minReplicas: 4
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metricName: phpfpm_active_processes
        targetAverageValue: 6
---
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-prod-external
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: valinor-api-prod
  minReplicas: 4
  maxReplicas: 10
  metrics:
    - type: External
      external:
        metric: 
          name: custom.googleapis.com|phpfpm_active_processes
        target:
          averageValue: 6
          type: AverageValue

Neither of them seems to work as it should. The one using the External type does, however, look a little weirder than the other. (Edit: Doesn't it actually look like it's correctly finding the custom metrics?) Here is the result:
[image]

I know some kind of statistics are reaching Google, since I can see phpfpm_active_processes in the Metrics Explorer as a custom metric. It does look a bit sketchy, though, since the active processes don't seem to be split across the different pods, at least according to Metrics Explorer. But when I look directly at /fpm-status and keep refreshing, I can see that the active process count differs between pods in most cases.
This might be caused by the phpfpm exporter container or by the Google prometheus-to-sd container, as the latter is still throwing the error:

1 stackdriver.go:60] Error while sending request to Stackdriver json: error calling MarshalJSON for type *monitoring.CreateTimeSeriesRequest: json: error calling MarshalJSON for type *monitoring.TimeSeries: json: error calling MarshalJSON for type *monitoring.Point: json: error calling MarshalJSON for type *monitoring.TypedValue: json: error calling MarshalJSON for type *monitoring.Distribution: json: unsupported value: NaN

I was hoping I could scale based on active fpm processes. If that is not possible, would it be possible to scale based on requests per second to each pod? Something like the first example in this custom metrics adapter?

Edit: But then the HPA using the External type sometimes looks like this, and that is what I find weird.
[image]

@igoooor

igoooor commented Jul 10, 2022

I have the same logs from gcr.io/google-containers/prometheus-to-sd, and as a result I could indeed not use phpfpm_active_processes as an HPA metric.
You can also see that if you call: kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/YOUR_NAMESPACE/pods/*/phpfpm_active_processes" the items will be empty.
The HPA uses that same endpoint, which is why it's not working: there are no items.
I could solve this issue by using the version v0.9.0 instead of v0.9.2.
The ...MarshalJSON... log lines are still there, but the metric is now properly retrieved by the HPA (and by the kubectl command).
I can now use phpfpm_active_processes for my HPA.
I hope this helps.

@AnthonMS
Author

AnthonMS commented Jul 12, 2022

I could solve this issue by using the version v0.9.0 instead of v0.9.2

If you are talking about the prometheus-to-sd image version, then as you can see in my issue post, I am already using
gcr.io/google-containers/prometheus-to-sd:v0.9.2

Edit: Ahh shit sorry, I read it in the wrong order. I have been looking at yaml configs for too long. I will try it out, thank you.

And I cannot use phpfpm_active_processes as an HPA metric. I also tried to get the raw output by running a command like the one you suggest. As you say, the items are empty, and I figured that was the reason the metric is not working.

Can you give me any insight into what you might have done differently in your setup?

@igoooor

igoooor commented Jul 12, 2022

Here is my full yaml for the prometheus-to-sd sidecar:

- name: prometheus-to-sd
  image: gcr.io/google-containers/prometheus-to-sd:v0.9.0
  ports:
    - name: profiler
      containerPort: 6060
  command:
    - /monitor
    - --stackdriver-prefix=custom.googleapis.com
    - --source=:http://localhost:9253
    - --pod-id=$(POD_NAME)
    - --namespace-id=$(POD_NAMESPACE)
    - --cluster-location=$(CLUSTER_REGION)
    - --monitored-resource-type-prefix=k8s_
    - --scrape-interval=10s
    - --export-interval=10s
  resources:
    requests:
      cpu: 10m
    limits:
      cpu: 10m
  env:
    - name: POD_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    - name: POD_NAMESPACE
      valueFrom:
        fieldRef:
          fieldPath: metadata.namespace
    - name: CLUSTER_REGION
      value: REDACTED

I think it looks pretty similar to yours.
But as I said previously, I was using v0.9.2 initially, which resulted in HPA not working.
Then I switched to v0.9.0 and then HPA was working.
My HPA yaml is like so:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: REDACTED
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: REDACTED
  minReplicas: 3
  maxReplicas: 12
  metrics:
    - type: Pods
      pods:
        metric:
          name: phpfpm_active_processes
        target:
          type: AverageValue
          averageValue: 80 # or whatever fits your case

And I'm using kubernetes v1.24.1
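
As a rough illustration of what an AverageValue target like the one above means for the replica count, here is a sketch of the replica rule from the Kubernetes HPA documentation, desired = ceil(current * currentAverage / targetAverage), clamped to the min/max. The function name and the numbers in `main` are made up for the example, and the real controller also applies a tolerance band and stabilization behaviour that this ignores:

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas applies ceil(current * currentAverage / targetAverage)
// and clamps the result to [minReplicas, maxReplicas].
// Simplification: no tolerance, no stabilization window, no missing-pod handling.
func desiredReplicas(current int, currentAverage, targetAverage float64, minReplicas, maxReplicas int) int {
	desired := int(math.Ceil(float64(current) * currentAverage / targetAverage))
	if desired < minReplicas {
		return minReplicas
	}
	if desired > maxReplicas {
		return maxReplicas
	}
	return desired
}

func main() {
	// Hypothetical load: 3 pods averaging 120 active processes, target 80.
	fmt.Println(desiredReplicas(3, 120, 80, 3, 12)) // 5
}
```

If the custom metrics API returns no items for the metric, there is no current average to plug in, which is consistent with the HPA not scaling in that case.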

@AnthonMS
Author

It does look very similar. I am afraid though, that it is not only this container/adapter that is causing me trouble. I don't know if you are using the Google Custom Metrics stackdriver adapter?

I had some trouble setting it up to begin with, but had more success after installing it like this, following this issue:

gcloud iam service-accounts create custom-metrics-sd-adapter --project "$GCP_PROJECT_ID"

gcloud projects add-iam-policy-binding "$GCP_PROJECT_ID" \
  --member "serviceAccount:custom-metrics-sd-adapter@$GCP_PROJECT_ID.iam.gserviceaccount.com" \
  --role "roles/monitoring.editor"

gcloud iam service-accounts add-iam-policy-binding \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:$GCP_PROJECT_ID.svc.id.goog[custom-metrics/custom-metrics-stackdriver-adapter]" \
  "custom-metrics-sd-adapter@$GCP_PROJECT_ID.iam.gserviceaccount.com"

kubectl create -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter.yaml

kubectl annotate serviceaccount custom-metrics-stackdriver-adapter \
  "iam.gke.io/gcp-service-account=custom-metrics-sd-adapter@$GCP_PROJECT_ID.iam.gserviceaccount.com" \
  --namespace custom-metrics

But I am getting errors like this:

E0712 08:46:48.948553       1 writers.go:117] apiserver was unable to write a JSON response: http2: stream closed
E0712 08:46:48.948614       1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http2: stream closed"}: http2: stream closed
E0712 08:46:48.949679       1 writers.go:130] apiserver was unable to write a fallback JSON response: http2: stream closed
E0712 08:46:48.951290       1 wrap.go:54] timeout or abort while handling: method=GET URI="/apis/custom.metrics.k8s.io/v1beta2" audit-ID="5b1dd951-6bba-4174-afd6-b4dccf87180d"
E0712 08:46:48.951344       1 timeout.go:135] post-timeout activity - time-elapsed: 3.869µs, GET "/apis/custom.metrics.k8s.io/v1beta2" result: <nil>
E0712 08:46:49.148354       1 writers.go:117] apiserver was unable to write a JSON response: http2: stream closed
E0712 08:46:49.148403       1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http2: stream closed"}: http2: stream closed
E0712 08:46:49.148460       1 wrap.go:54] timeout or abort while handling: method=GET URI="/apis/custom.metrics.k8s.io/v1beta1" audit-ID="7a3c4de3-18f0-4846-94e2-104be13e4ff1"
E0712 08:46:49.149401       1 writers.go:117] apiserver was unable to write a JSON response: http2: stream closed
E0712 08:46:49.149460       1 writers.go:130] apiserver was unable to write a fallback JSON response: http2: stream closed
E0712 08:46:49.149801       1 writers.go:111] apiserver was unable to close cleanly the response writer: http2: stream closed
E0712 08:46:49.149857       1 wrap.go:54] timeout or abort while handling: method=GET URI="/apis/custom.metrics.k8s.io/v1beta1" audit-ID="7d8e9fdb-3f23-4fa8-952a-8f6d3daeb258"
E0712 08:46:49.150521       1 wrap.go:54] timeout or abort while handling: method=GET URI="/apis/custom.metrics.k8s.io/v1beta2" audit-ID="ed232e12-b6e4-4f1e-a407-73c65c7d23ab"
E0712 08:46:49.152893       1 timeout.go:135] post-timeout activity - time-elapsed: 4.392273ms, GET "/apis/custom.metrics.k8s.io/v1beta1" result: <nil>
E0712 08:46:49.156903       1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http2: stream closed"}: http2: stream closed
E0712 08:46:49.157252       1 timeout.go:135] post-timeout activity - time-elapsed: 7.365381ms, GET "/apis/custom.metrics.k8s.io/v1beta1" result: <nil>
E0712 08:46:49.163731       1 writers.go:117] apiserver was unable to write a JSON response: http2: stream closed
E0712 08:46:49.163798       1 wrap.go:54] timeout or abort while handling: method=GET URI="/apis/custom.metrics.k8s.io/v1beta1" audit-ID="3ff962e1-5437-4282-a35a-5c5d7e5e5c5a"
E0712 08:46:49.164040       1 writers.go:117] apiserver was unable to write a JSON response: http2: stream closed
E0712 08:46:49.164435       1 writers.go:117] apiserver was unable to write a JSON response: http2: stream closed
E0712 08:46:49.164811       1 wrap.go:54] timeout or abort while handling: method=GET URI="/apis/custom.metrics.k8s.io/v1beta2" audit-ID="b69d7b5d-fef9-4ed0-9de7-1474505b516e"
E0712 08:46:49.165787       1 writers.go:117] apiserver was unable to write a JSON response: http2: stream closed
E0712 08:46:49.166040       1 wrap.go:54] timeout or abort while handling: method=GET URI="/apis/custom.metrics.k8s.io/v1beta1" audit-ID="7afb8bb4-c92e-4004-9a9e-d5573060011e"
E0712 08:46:49.166264       1 writers.go:130] apiserver was unable to write a fallback JSON response: http2: stream closed
E0712 08:46:49.166924       1 wrap.go:54] timeout or abort while handling: method=GET URI="/apis/custom.metrics.k8s.io/v1beta1" audit-ID="fd644be2-e351-4ad3-8c09-7018cda0c978"
E0712 08:46:49.167584       1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http2: stream closed"}: http2: stream closed
E0712 08:46:49.171427       1 wrap.go:54] timeout or abort while handling: method=GET URI="/apis/custom.metrics.k8s.io/v1beta2" audit-ID="6494d9df-d5bf-402e-ae48-58b130b11625"
E0712 08:46:49.171478       1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http2: stream closed"}: http2: stream closed
E0712 08:46:49.243937       1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http2: stream closed"}: http2: stream closed
E0712 08:46:49.243949       1 writers.go:117] apiserver was unable to write a JSON response: http: Handler timeout
E0712 08:46:49.246406       1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http2: stream closed"}: http2: stream closed
E0712 08:46:49.248929       1 timeout.go:135] post-timeout activity - time-elapsed: 98.366807ms, GET "/apis/custom.metrics.k8s.io/v1beta2" result: <nil>
E0712 08:46:49.251310       1 writers.go:130] apiserver was unable to write a fallback JSON response: http2: stream closed
E0712 08:46:49.253716       1 writers.go:130] apiserver was unable to write a fallback JSON response: http2: stream closed
E0712 08:46:49.254860       1 writers.go:130] apiserver was unable to write a fallback JSON response: http2: stream closed
E0712 08:46:49.255997       1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http: Handler timeout"}: http: Handler timeout
E0712 08:46:49.257172       1 writers.go:130] apiserver was unable to write a fallback JSON response: http2: stream closed
E0712 08:46:49.259616       1 timeout.go:135] post-timeout activity - time-elapsed: 95.781565ms, GET "/apis/custom.metrics.k8s.io/v1beta1" result: <nil>
E0712 08:46:49.260765       1 timeout.go:135] post-timeout activity - time-elapsed: 94.56363ms, GET "/apis/custom.metrics.k8s.io/v1beta1" result: <nil>
E0712 08:46:49.261896       1 timeout.go:135] post-timeout activity - time-elapsed: 95.660411ms, GET "/apis/custom.metrics.k8s.io/v1beta2" result: <nil>
E0712 08:46:49.262878       1 writers.go:130] apiserver was unable to write a fallback JSON response: http: Handler timeout
E0712 08:46:49.264115       1 timeout.go:135] post-timeout activity - time-elapsed: 96.887285ms, GET "/apis/custom.metrics.k8s.io/v1beta1" result: <nil>
E0712 08:46:49.268715       1 timeout.go:135] post-timeout activity - time-elapsed: 97.200859ms, GET "/apis/custom.metrics.k8s.io/v1beta2" result: <nil>

Are you using this custom metrics adapter or are you using another one? If you are using this one, are you also getting these errors? If not, how did you set that up?

@AnthonMS
Author

AnthonMS commented Jul 12, 2022

I have just tried setting it up again with the different version number, and my items are still empty, unfortunately.

As I mentioned above, I'm pretty sure it's the Google Stackdriver adapter that's causing me trouble now. I have set it up as described and I am no longer getting 403 Forbidden errors, like I did in the beginning. But as the errors above suggest, there is still something wrong with the adapter. It is getting <nil> when fetching the custom metrics. I have no idea what is going on.

When running the command:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/phpfpm_active_processes"
The result I get back is:
{"kind":"MetricValueList","apiVersion":"custom.metrics.k8s.io/v1beta1","metadata":{"selfLink":"/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/%2A/phpfpm_active_processes"},"items":[]}
And my containers are running in the default namespace.
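
A small sketch of how that check could be automated: decode the raw response and count the entries in "items". The struct and function names are hypothetical, and only the fields visible in the response above are assumed:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// metricValueList mirrors only the parts of the response needed here.
type metricValueList struct {
	Kind  string            `json:"kind"`
	Items []json.RawMessage `json:"items"`
}

// countItems returns how many metric values a raw custom-metrics response holds.
func countItems(raw []byte) (int, error) {
	var list metricValueList
	if err := json.Unmarshal(raw, &list); err != nil {
		return 0, err
	}
	return len(list.Items), nil
}

func main() {
	// The exact response quoted above.
	raw := []byte(`{"kind":"MetricValueList","apiVersion":"custom.metrics.k8s.io/v1beta1","metadata":{"selfLink":"/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/%2A/phpfpm_active_processes"},"items":[]}`)
	n, err := countItems(raw)
	fmt.Println(n, err) // 0 <nil>
}
```

A count of zero means the adapter answered but found no values for the metric on any pod in that namespace.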

Is there a command to check all namespaces for that metric? Just for fun. I'm still new to k8s and trying to learn as much as I can. I have no idea at this point what can be causing issues other than the stackdriver. And I'm kinda over it at this point.

@JamesMarino

I have a suspicion that there might be some issues when the phpfpm data is being sent to Metrics Explorer. I was getting an error message similar to the one below:

stackdriver.go:60] Error while sending request to Stackdriver json: error calling MarshalJSON for type *monitoring.CreateTimeSeriesRequest: json: error calling MarshalJSON for type *monitoring.TimeSeries: json: error calling MarshalJSON for type *monitoring.Point: json: error calling MarshalJSON for type *monitoring.TypedValue: json: error calling MarshalJSON for type *monitoring.Distribution: json: unsupported value: NaN

My PHP-FPM metrics were not being sent to Metrics Explorer at all, but after stepping through the code here:

defer wg.Done()
req := &v3.CreateTimeSeriesRequest{TimeSeries: ts[begin:end]}

and removing the elements of the ts slice that carry the problematic Distribution values, I could get the phpfpm metrics to work:

defer wg.Done()

// Collect only the time series that carry no Distribution points.
var timeSeries []*v3.TimeSeries
for _, singleTimeSeries := range ts[begin:end] {
	anyDistributionValuesFound := false

	for _, point := range singleTimeSeries.Points {
		if point.Value.DistributionValue != nil {
			anyDistributionValuesFound = true
			break // one Distribution point is enough to skip this series
		}
	}

	if !anyDistributionValuesFound {
		timeSeries = append(timeSeries, singleTimeSeries)
	}
}

// Send the filtered slice instead of ts[begin:end].
req := &v3.CreateTimeSeriesRequest{TimeSeries: timeSeries}

What I assume is happening: further down the line, this bulk ts []*v3.TimeSeries is sent, and the request errors out once a bad Distribution value is to be sent, so the remaining metrics are never sent. Depending on where in the slice the phpfpm metrics sit, they will or will not go through.

This is by no means a fix for the underlying problem, just an observation / quick fix I was able to put in place. I assume these issues with the Distribution metrics are happening upstream somewhere.
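
A possible narrower variant of the workaround above would drop only the series whose Distribution actually contains a non-finite number, rather than every series with a Distribution. This is a sketch, not tested against prometheus-to-sd: the types below are hypothetical, trimmed-down stand-ins for the v3 types, and which Distribution fields can end up as NaN is an assumption.

```go
package main

import (
	"fmt"
	"math"
)

// Hypothetical stand-ins for the v3 types, reduced to the fields used below.
type Distribution struct {
	Mean                  float64
	SumOfSquaredDeviation float64
}

type TypedValue struct{ DistributionValue *Distribution }
type Point struct{ Value *TypedValue }
type TimeSeries struct{ Points []*Point }

// hasNonFinite reports whether any Distribution point holds NaN or +/-Inf.
func hasNonFinite(ts *TimeSeries) bool {
	for _, p := range ts.Points {
		if p == nil || p.Value == nil || p.Value.DistributionValue == nil {
			continue
		}
		d := p.Value.DistributionValue
		for _, f := range []float64{d.Mean, d.SumOfSquaredDeviation} {
			if math.IsNaN(f) || math.IsInf(f, 0) {
				return true
			}
		}
	}
	return false
}

// dropNonFinite keeps every series that has no NaN/Inf Distribution values.
func dropNonFinite(in []*TimeSeries) []*TimeSeries {
	out := make([]*TimeSeries, 0, len(in))
	for _, ts := range in {
		if ts != nil && !hasNonFinite(ts) {
			out = append(out, ts)
		}
	}
	return out
}

func main() {
	good := &TimeSeries{Points: []*Point{{Value: &TypedValue{DistributionValue: &Distribution{Mean: 2}}}}}
	bad := &TimeSeries{Points: []*Point{{Value: &TypedValue{DistributionValue: &Distribution{Mean: math.NaN()}}}}}
	fmt.Println(len(dropNonFinite([]*TimeSeries{good, bad}))) // 1
}
```

Like the original workaround, this only hides the symptom: the affected Distribution series are still not exported.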
