
GCP Pub/Sub Scaler reports negative metric values #5774

Closed
rc-bryanlinebaugh opened this issue May 2, 2024 · 7 comments
Labels
bug (Something isn't working) · good first issue (Good for newcomers) · help wanted (Looking for support from community)

Comments

@rc-bryanlinebaugh

Report

We have configured the GCP Pub/Sub Scaler to scale our deployments based on the reported "SubscriptionSize" of the configured Pub/Sub Subscription. We have experienced frequent reporting of negative metric values by the underlying HorizontalPodAutoscaler resource for each ScaledObject.

The reported negative values seem to be having the adverse effect of incorrectly scaling down our deployments.

Examples:

NAME↑                                                  REFERENCE                                                TARGETS                              MINPODS                  MAXPODS                   REPLICAS                   AGE
keda-hpa-gcp-log-ingest-go-foo1234567                  Deployment/gcp-log-ingest-go-foo1234567                  -3795005m/50 (avg)                   50                       2500                      1194                       12d

Scaler Config:

spec:
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          policies:
          - periodSeconds: 600
            type: Percent
            value: 5
          stabilizationWindowSeconds: 600
        scaleUp:
          policies:
          - periodSeconds: 60
            type: Percent
            value: 100
          stabilizationWindowSeconds: 100
  cooldownPeriod: 300
  fallback:
    failureThreshold: 5
    replicas: 200
  maxReplicaCount: 2500
  minReplicaCount: 50
  pollingInterval: 10
  scaleTargetRef:
    name: ...
  triggers:
  - authenticationRef:
      name: ...
    metadata:
      activationValue: "0"
      aggregation: sum
      mode: SubscriptionSize
      subscriptionName: ...
      value: "50"
    type: gcp-pubsub

Expected Behavior

There should not be any negative values reported for the HorizontalPodAutoscaler metric.

Actual Behavior

Observed the HorizontalPodAutoscaler consistently reporting negative values.

Steps to Reproduce the Problem

  1. Create a GCP Pub/Sub Topic and Subscription.
  2. Register GCP Pub/Sub Scaler with a similar configuration.
  3. Consistently publish messages to the Topic so the Subscription builds a backlog (example commands below).
  4. Observe metric values reported by the configured HPA resource.
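
For example, with hypothetical topic/subscription names (the real names are elided in the config above):

# create a topic and an attached subscription (placeholder names)
gcloud pubsub topics create demo-topic
gcloud pubsub subscriptions create demo-sub --topic=demo-topic
# publish a steady stream of messages so the subscription builds a backlog
for i in $(seq 1 1000); do gcloud pubsub topics publish demo-topic --message="msg-$i"; done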

Logs from KEDA operator

No response

KEDA Version

2.13.0

Kubernetes Version

1.28

Platform

Amazon Web Services

Scaler Details

GCP Pub/Sub

Anything else?

No response

@rc-bryanlinebaugh added the bug label May 2, 2024
@JorTurFer
Member

Hello,
Are you scraping Prometheus metrics from KEDA by chance? What is the value of the metric keda_scaler_metrics_value for those ScaledObjects?

@rc-bryanlinebaugh
Author

Hello,

I had to do a port-forward for the KEDA Operator in the same cluster, but I was able to retrieve what would have been scraped by Prometheus. For example, here's what a collection of our ScaledObject resources is reporting:

[Screenshot (2024-05-02): keda_scaler_metrics_value values reported by the KEDA Operator for our ScaledObjects]
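
In case it's useful, this is roughly what I ran to pull those values (the operator deployment name, namespace, and metrics port are from our install and may differ in yours):

# forward the KEDA Operator's Prometheus metrics port
kubectl -n keda port-forward deploy/keda-operator 8080:8080
# in another shell, scrape the endpoint and filter for the scaler metric
curl -s http://localhost:8080/metrics | grep keda_scaler_metrics_value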

@JorTurFer
Member

Interesting information, so somehow we are getting negative values back from the Pub/Sub API. Maybe the aggregation window (or something else we use in the query) is not correct. Could that be possible? Are you willing to take a look?
This is the scaler code: https://github.com/kedacore/keda/blob/main/pkg/scalers/gcp_pubsub_scaler.go

This is the relevant part of the scaler code (and the calls executed within it):

func (s *pubsubScaler) GetMetricsAndActivity(ctx context.Context, metricName string) ([]external_metrics.ExternalMetricValue, bool, error) {
	mode := s.metadata.mode
	// SubscriptionSize is actually NumUndeliveredMessages in GCP PubSub.
	// Considering backward compatibility, fallback "SubscriptionSize" to "NumUndeliveredMessages"
	if mode == pubSubModeSubscriptionSize {
		mode = "NumUndeliveredMessages"
	}
	prefix := prefixPubSubResource + s.metadata.resourceType + "/"
	metricType := prefix + snakeCase(mode)
	value, err := s.getMetrics(ctx, metricType)
	if err != nil {
		s.logger.Error(err, "error getting metric", "metricType", metricType)
		return []external_metrics.ExternalMetricValue{}, false, err
	}
	metric := GenerateMetricInMili(metricName, value)
	return []external_metrics.ExternalMetricValue{metric}, value > s.metadata.activationValue, nil
}
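
If I'm reading it right, GenerateMetricInMili just wraps the float64 coming back from getMetrics in a milli-precision Kubernetes quantity, so a negative value from the Cloud Monitoring query flows straight through to the HPA. A minimal sketch of that conversion (illustration only, not the actual KEDA helper):

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/resource"
)

// toMilliQuantity mimics the conversion: wrap a float64 metric value in a
// milli-precision resource.Quantity (value * 1000 millis).
func toMilliQuantity(value float64) *resource.Quantity {
	return resource.NewMilliQuantity(int64(value*1000), resource.DecimalSI)
}

func main() {
	// A negative query result becomes a negative milli-quantity, which is
	// exactly what shows up in the HPA TARGETS column (e.g. -3795005m).
	fmt.Println(toMilliQuantity(-3795.5)) // -3795500m
}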

@rc-bryanlinebaugh
Author

Not a Golang expert by any means, but I can take a look. It would be great if someone with more expertise could weigh in as well.

@JorTurFer added the help wanted and good first issue labels May 7, 2024
@JorTurFer
Member

Let's see if there are other folks willing to help here too :)

@rc-bryanlinebaugh
Author

@JorTurFer Thanks for the initial triage of this issue!

It looks like we were receiving negative values because we were mistakenly passing the aggregation function (sum) in our Scaler configuration. This is a problem because the SubscriptionSize metric we configured is not a Distribution-type metric, so it is not supported by the available aggregation methods; this is mentioned in a comment in the documented Scaler example. With the aggregation included, the extra clauses the scaler adds to the MQL query for this Gauge metric resulted in negative values (I don't fully understand why, but I was able to replicate it in Metrics Explorer).

Once we removed the aggregation parameter from our Scaler config, values were reported as expected.
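
For reference, our trigger block now looks roughly like this (subscription and authentication names elided as above):

  triggers:
  - authenticationRef:
      name: ...
    metadata:
      activationValue: "0"
      mode: SubscriptionSize
      subscriptionName: ...
      value: "50"
    type: gcp-pubsub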

@JorTurFer
Member

Nice to read that it's working well 😄 Thanks a lot for the feedback!
