Prometheus exporter undefined behaviour when not all labels are set #3391

Open
soundofspace opened this issue Jul 24, 2023 · 7 comments · May be fixed by #3416
Labels
bug Something isn't working

Comments

@soundofspace
Contributor

Describe your environment

  • opentelemetry-exporter-prometheus==1.12.0rc1
  • opentelemetry-sdk==1.19.0
  • opentelemetry-api==1.19.0
  • Python 3.11.0

Steps to reproduce

from prometheus_client import start_http_server
from opentelemetry import metrics
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from opentelemetry.sdk.metrics import MeterProvider

# Assumed setup (not shown in the original report): expose /metrics and wire in the Prometheus reader
start_http_server(port=8000)
metrics.set_meter_provider(MeterProvider(metric_readers=[PrometheusMetricReader()]))

test_counter = metrics.get_meter("module").create_counter('test_counter', description='counter used for testing')
test_counter.add(1, {'location': 'here', 'source': 'external'})
test_counter.add(1, {'source': 'internal', 'reason': 'error'})

What is the expected behavior?
I expect labels to be the ones specified. But it seems like it only looks at the last call to counter.add to figure out label names, and then just looks at the position of the labels. I also expect these to be the same metric (no duplicate HELP, TYPE). This also has other weird behaviour when the dict size is different.

The OpenTelemetry API spec for Counter says this: "Users can provide attributes to associate with the increment value, but it is up to their discretion. Therefore, this API MUST be structured to accept a variable number of attributes, including none." This suggests to me that it should be possible to set only certain labels (as far as I understand prometheus also supports missing labels). If this is not the case, it should probably be made possible to define label names when creating a counter so this can at least be statically type checked. That would also help prevent typos when specifying labels if missing labels are supported, but that should probably be a separate issue?
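
For reference, a sketch of the exposition I would expect from the calls above (a single HELP/TYPE block, labels exactly as passed on each add call, absent labels simply omitted):

# HELP test_counter_total counter used for testing
# TYPE test_counter_total counter
test_counter_total{location="here",source="external"} 1.0
test_counter_total{reason="error",source="internal"} 1.0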

What is the actual behavior?

# HELP test_counter_total counter used for testing
# TYPE test_counter_total counter
test_counter_total{reason="external",source="here"} 1.0
# HELP test_counter_total counter used for testing
# TYPE test_counter_total counter
test_counter_total{reason="error",source="internal"} 1.0
soundofspace added the bug label Jul 24, 2023
@aabmass
Member

aabmass commented Aug 24, 2023

Thanks for reporting. There is definitely a bug here, but I'm not sure of the best fix. We are actually wrapping the Prometheus client with a custom collector to avoid reimplementing the text format.

The *MetricFamily constructors expect a sequence of label names up front, followed by calls to add_metric() with the label values. So AFAICT the Prometheus client library doesn't really support this, probably because OpenMetrics recommends against it.
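
For context, the collector-side API looks roughly like this (a minimal sketch, not the exporter's actual code):

from prometheus_client.core import CounterMetricFamily

# Label names are fixed when the family is created...
family = CounterMetricFamily("test_counter", "counter used for testing", labels=["location", "source"])
# ...and add_metric() takes an ordered list of label *values*, so every
# point has to provide a value for every declared name, in that order.
family.add_metric(["here", "external"], 1.0)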

I expect labels to be the ones specified. But it seems like it only looks at the last call to counter.add to figure out label names, and then just looks at the position of the labels.

OpenTelemetry fully supports this. It looks like a bug in the Prometheus exporter where the label names are being rewritten in a for loop. Draft fix here #3416

I also expect these to be the same metric (no duplicate HELP, TYPE). This also has other weird behaviour when the dict size is different.

Unfortunately that PR does create duplicates (see the added test case). I don't see a way around it with the Prometheus client unless we build a superset of label names across all points. However that has slightly different semantics.
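
For illustration, the superset approach would look something like this (a hypothetical sketch, not the draft PR): build the union of label names across all points and fill the gaps with empty strings, which is where the semantics differ from simply omitting a label.

from prometheus_client.core import CounterMetricFamily

points = [
    ({"location": "here", "source": "external"}, 1.0),
    ({"source": "internal", "reason": "error"}, 1.0),
]

# Union of every label name seen across the points for this metric
label_names = sorted({name for attrs, _ in points for name in attrs})
family = CounterMetricFamily("test_counter", "counter used for testing", labels=label_names)
for attrs, value in points:
    # Attributes missing from a point become empty-string label values
    family.add_metric([attrs.get(name, "") for name in label_names], value)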

aabmass linked a pull request Aug 24, 2023 that will close this issue
@aabmass
Member

aabmass commented Aug 25, 2023

@soundofspace

(as far as I understand prometheus also supports missing labels).

I don't think the official prometheus client does. It requires specifying all label names at instrument creation and then passing all of them on every call. For example:

from prometheus_client import Counter

c = Counter(
    name="promclient_counter",
    documentation="Inconsistent label names with prometheus client",
    labelnames=("a", "b", "newa", "newb"),
)
# Each labels() call must supply a value for every declared label name
c.labels(a="a", b="b").inc(10)
c.labels(newa="newa", newb="newb").inc(10)

fails with

Traceback (most recent call last):
  File "repro.py", line 63, in <module>
    c.labels(a="a", b="b").inc(10)
    ^^^^^^^^^^^^^^^^^^^^^^
  File "venv/lib/python3.11/site-packages/prometheus_client/metrics.py", line 182, in labels
    raise ValueError('Incorrect label names')
ValueError: Incorrect label names

@dashpole

From https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/compatibility/prometheus_and_openmetrics.md#metric-metadata-1:

Prometheus SDK exporters MUST NOT allow duplicate UNIT, HELP, or TYPE comments for the same metric name to be returned in a single scrape of the Prometheus endpoint. Exporters MUST drop entire metrics to prevent conflicting TYPE comments, but SHOULD NOT drop metric points as a result of conflicting UNIT or HELP comments. Instead, all but one of the conflicting UNIT and HELP comments (but not metric points) SHOULD be dropped. If dropping a comment or metric points, the exporter SHOULD warn the user through error logging.

This is something we should be able to fix in the exporter. If it's helpful, this is how it was fixed in the Go exporter: open-telemetry/opentelemetry-go#3469

@aabmass
Member

aabmass commented Aug 25, 2023

Ok, I think we will have to move away from using the prometheus client. I don't see a way to prevent it from generating UNIT, HELP, or TYPE for each Metric family. The Golang client seems to be a bit smarter and dedups them for you.
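
For the record, one way to avoid the duplicate comments without leaving the client entirely would be to merge families that share a name before they reach the text encoder, along these lines (a rough hypothetical sketch, not the Go fix or the current collector code):

from collections import defaultdict
from prometheus_client.core import CounterMetricFamily

def merge_counter_families(families):
    # Group samples by metric name so each name yields exactly one
    # family, and therefore one HELP/TYPE block in the scrape output.
    grouped = defaultdict(list)
    documentation = {}
    for family in families:
        documentation.setdefault(family.name, family.documentation)
        for sample in family.samples:
            grouped[family.name].append((sample.labels, sample.value))
    for name, samples in grouped.items():
        label_names = sorted({key for labels, _ in samples for key in labels})
        merged = CounterMetricFamily(name, documentation[name], labels=label_names)
        for labels, value in samples:
            merged.add_metric([labels.get(key, "") for key in label_names], value)
        yield merged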

@sproberts92

I wonder if, in the interests of pragmatism, we should merge @aabmass's fix regardless of the spec-compliance issue mentioned by @dashpole?

The Prometheus output breaks the spec with or without the fix, but we can at least benefit from having the labels behave correctly until we can find a way to improve the spec compliance as well.

@LeonChadwick

I am getting this bug also. I came to try OpenTelemetry BECAUSE the official prometheus_client is inadequate for this scenario, but found that OpenTelemetry is similarly crippled until this is fixed.

@soundofspace
Contributor Author

It's even worse than I first reported: it's not only missing labels that cause issues, but also empty-string labels when the order is not the same. Passing empty strings was already something I did to work around this issue.

Subtle bugs like this really break the user experience. I'm hoping a workaround or permanent fix can be deployed fairly quickly, as this issue has been open for over 6 months and is quite annoying and difficult to work around.

test_counter.add(1, {'cause': '', 'reason': 'blbal'})
test_counter.add(1, {'reason': 'something', 'cause': 'blbal'})

# HELP test_counter_total counter used for testing
# TYPE test_counter_total counter
test_counter_total{cause="blbal",reason=""} 1.0
# HELP test_counter_total counter used for testing
# TYPE test_counter_total counter
test_counter_total{cause="blbal",reason="something"} 1.0
