Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move required OpenTelemetry packages to prometheus/prometheus #13842

Closed
thampiotr opened this issue Mar 26, 2024 · 10 comments
Closed

Move required OpenTelemetry packages to prometheus/prometheus #13842

thampiotr opened this issue Mar 26, 2024 · 10 comments

Comments

@thampiotr
Copy link

thampiotr commented Mar 26, 2024

Problem

Currently, Prometheus has a copy/paste code of two OpenTelemetry translators. The rationale for this is described in README.md in the otlptranslator package:

This is done instead of adding a go.mod dependency because OpenTelemetry depends on prometheus/prometheus and a cyclic dependency will be created.

This becomes a blocker for projects that depend on both open-telemetry/opentelemetry-collector-contrib and prometheus/prometheus, because the code contains registrations of feature gates against a global registry (example), which will panic on duplicate registrations. The symptom would be a panic like this:

panic: failed to register "pkg.translator.prometheus.PermissiveLabelSanitization": gate is already registered

Proposed solution

As mentioned in the README.md in the otlptranslator package:

This is just a temporary solution and the long-term solution is to move the required packages from OpenTelemetry into prometheus/prometheus.

To address the problem, Prometheus should become the new home for required packages and OpenTelemetry should import them instead from their Prometheus dependency.

@aknuds1
Copy link
Contributor

aknuds1 commented Mar 26, 2024

@jesusvazquez WDYT of this problem? You have more history on OTLP in Prometheus than I do.

In particular, I'd like to understand if there are talks about moving the storage/remote/otlptranslator/ directory contents into Prometheus and out of opentelemetry-collector-contrib (so the latter depends on the former).

@bwplotka
Copy link
Member

I don't have opinion on moving packages, I recall one discussion if owners of OTel contrib would like to "donate" this lib to us and how maintainership would work for it etc. I am sure @gouthamve and @dashpole know more.

In the mean time, since we don't expect people to import or use this library we vendor in Otel collector or SDKs, why not removing code that uses any global variables in our otlptranslator?

@jesusvazquez
Copy link
Member

I'll bring this topic to today's prometheus - otel workgroup to see if we can expedite owning the lib now that its been months of updating it, having the endpoint live in our release etc

Knowing that this is causing issues to some users is even more reason to pursue it and then we'll be more in control to make the changes we need.

@dashpole
Copy link
Contributor

dashpole commented Mar 28, 2024

@dashpole
Copy link
Contributor

Reading back through, the primary concerns with the move were:

  1. Dependency tree: OTel contrib -> Prometheus -> OTel core dependency will make breaking changes to OTel core modules extremely difficult to handle updates for.
  2. Stability guarantees. OpenTelemetry components would potentially have breaking changes pushed on them via dependency updates, which are harder to review.

The dependency tree for both translator packages currently contains these collector dependencies:

  • go.opentelemetry.io/collector/pdata @ v1.4.1
  • go.opentelemetry.io/collector/semconv @ v0.97.1
  • go.opentelemetry.io/collector/featuregate @ v1.4.1
  • github.com/open-telemetry/opentelemetry-collector-contrib/internal/coreinternal @ v0.97.0

Semconv is used for AttributeServiceName, AttributeServiceInstanceID, and AttributeServiceNamespace constants. Those could be copied, or we could use constants from opentelemetry-go.

coreinternal is used for test data generation, and could be copied.

Given other dependencies are stable, I think it eliminates (1) as a concern.

For Stability guarantees, we had discussed these as potential parts of the agreement:

  • The package will follow the OTel --> Prometheus specification
  • The package will release with 0.* versioning, but will not make breaking changes to the Go API it offers except for breaking changes to the protobuf it inherits from prometheus/prometheus/prompb.
  • The package will still be owned by members of the OpenTelemetry Prometheus-wg to start. Proposed: @dashpole and @Aneurysm9. Over time, we expect members of the prometheus to take a more active role in maintaining the package.
  • If changes to the package break collector functionality and block prometheus dependency updates, the Prometheus community will help with any needed rollbacks, cherrypicks, or patch releases needed to address them.
  • [New] Significant behavior changes will be gated by collector feature gates.

@jesusvazquez jesusvazquez self-assigned this Apr 1, 2024
@dashpole
Copy link
Contributor

Notes from today's WG meeting:

  • It doesn't actually help at all to move it to a third, neutral repo because we still have cyclical dependencies.
  • collector feature gates don't make sense in a "neutral" package
  • Prometheus may not want to translate to the proto format anyways. It is convenient, since it can then re-use PRW ingestion support, but it may want more direct translation or storage. So maybe it makes sense to diverge.
  • If we don't use the PRW protos directly in the translation package, there isn't a cyclical dependency for Prometheus anymore.
  • Prometheus is happy to allow OTel folks to be maintainers, even if the package is prometheus/prometheus

@jesusvazquez
Copy link
Member

👋

  • It doesn't actually help at all to move it to a third, neutral repo because we still have cyclical dependencies.
  • collector feature gates don't make sense in a "neutral" package

I brought this topic up with the team and the above two things are the main ones that make me think it's best to have different code bases.

  • Having a separate neutral package is a bit clunky and it would also not solve the cyclical dependency per-se it would still require refactoring the use of PRW protos.
  • As mentioned the collector feature gates don't make sense in Prometheus which makes sharing the code base less ideal.
  • The prometheus remote write exporter code exists to do the otlp to prw translation but the code that will now live in prometheus could know more about Prometheus internals to try to make the native ingestion faster. So this divergence with less guarantees might be helpful down the road.

Although this means divergence we're still very involved in the Prometheus - Opentelemetry calls and if we make any meaningful improvements over time we'll bring them up 💪 cc @dashpole

Sometime soon @aknuds1 and I plan to work on a PR to make this a reality and finally close this issue.

@dashpole
Copy link
Contributor

Here is the first improvement to PRW translation :)

https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/32605/files#diff-57a6ba7af4a99bae6ac52e8adb5c974559e0bcd46ce597488cd0031facb32b89

jesusvazquez added a commit that referenced this issue Apr 25, 2024
Relates to #13842

After a lot of productive discussion between the Prometheus and
OpenTelemetry community we decided that it made sense for Prometheus to
own its own copy of the code in charge for handling OTLP ingestion
traffic.

This commit is removing the README and update-copy.sh files that had the
previous steps to update the code.

Also it is updating the licensing of all the files to make sure the
OpenTelemetry provenance is explicit and to state the new ownership.

Signed-off-by: Jesus Vazquez <jesusvzpg@gmail.com>
Co-authored-by: Arve Knudsen <arve.knudsen@gmail.com>
@jesusvazquez
Copy link
Member

@dashpole @Aneurysm9 I'm trying to diverge code while respecting the licensing in this PR #13991

Please have a look mostly at the file header where the relicensing is happening.

jesusvazquez added a commit that referenced this issue Apr 26, 2024
Relates to #13842

After a lot of productive discussion between the Prometheus and
OpenTelemetry community we decided that it made sense for Prometheus to
own its own copy of the code in charge for handling OTLP ingestion
traffic.

This commit is removing the README and update-copy.sh files that had the
previous steps to update the code.

Also it is updating the licensing of all the files to make sure the
OpenTelemetry provenance is explicit and to state the new ownership.

Signed-off-by: Jesus Vazquez <jesusvzpg@gmail.com>
Co-authored-by: Arve Knudsen <arve.knudsen@gmail.com>
@jesusvazquez
Copy link
Member

This is now completed and this issue can be closed. Thanks everyone for the help 💪

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants