draft proposal for additional metrics write endpoints #458

Open: wants to merge 2 commits into main

marcolan018

This PR adds a draft proposal for updating the Observatorium API to support additional metrics write endpoints besides Thanos Receiver.

Signed-off-by: Marco <llan@redhat.com>
@matej-g requested a review from a team March 23, 2022 08:21

Contributor

@matej-g left a comment

Hey @marcolan018, thanks for preparing the draft! This is looking like a nice starting point.

I have added a few more suggestions. I think we could also add the alternatives that have been considered (I believe in our community discussions we mentioned, e.g., using some existing tool) to justify the proposal further.


## TLDR

We propose to support additional metrics write endpoints. The metrics write requests can be forwarded to additional backend endpoints, which support prometheus remote write protocol, besides thanos receivers.
Contributor

Suggested change
We propose to support additional metrics write endpoints. The metrics write requests can be forwarded to additional backend endpoints, which support prometheus remote write protocol, besides thanos receivers.
We propose to support additional metrics write endpoints. The metrics write requests can be forwarded to additional backend endpoints, which support Prometheus remote write protocol, besides Thanos receiver.

Author

fixed

## Goals

* Observatorium API can push data to configured additional metrics write endpoints, secured or non-secured
* Observatorium API continues to push data to Thanos Receivers even when the additional metrics write endpoints are blocked.
Contributor

I feel like this is simply retaining existing functionality; I don't think we necessarily need to have this as a goal.

Author

fixed
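
The first goal mentions configured additional endpoints that may be secured or non-secured. As a rough illustration only, such a configuration could be validated and turned into per-endpoint HTTP clients as below. The `WriteEndpoint` and `EndpointTLS` types, their fields, the example URLs and the 30s timeout are hypothetical and not part of the proposal; a nil `TLS` field stands for the non-secured case, and the optional client certificate covers mutual TLS.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"net/http"
	"net/url"
	"time"
)

// WriteEndpoint is a hypothetical config entry for one additional
// Prometheus remote-write endpoint. A nil TLS field means the endpoint
// is reached without TLS (the "non-secured" case).
type WriteEndpoint struct {
	Name string
	URL  string
	TLS  *EndpointTLS
}

// EndpointTLS holds PEM-encoded material for a secured endpoint.
type EndpointTLS struct {
	CAPEM         []byte // CA bundle for verifying the server; empty means system roots
	ClientCertPEM []byte // optional client certificate for mutual TLS
	ClientKeyPEM  []byte // private key matching ClientCertPEM
}

// newClient validates one endpoint and builds the HTTP client used to reach it.
func newClient(ep WriteEndpoint) (*http.Client, error) {
	u, err := url.Parse(ep.URL)
	if err != nil {
		return nil, fmt.Errorf("endpoint %q: %w", ep.Name, err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return nil, fmt.Errorf("endpoint %q: unsupported scheme %q", ep.Name, u.Scheme)
	}
	transport := &http.Transport{Proxy: http.ProxyFromEnvironment}
	if ep.TLS != nil {
		if u.Scheme != "https" {
			return nil, fmt.Errorf("endpoint %q: TLS settings require an https URL", ep.Name)
		}
		cfg := &tls.Config{MinVersion: tls.VersionTLS12}
		if len(ep.TLS.CAPEM) > 0 {
			pool := x509.NewCertPool()
			if !pool.AppendCertsFromPEM(ep.TLS.CAPEM) {
				return nil, fmt.Errorf("endpoint %q: no valid certificates in CA bundle", ep.Name)
			}
			cfg.RootCAs = pool
		}
		if len(ep.TLS.ClientCertPEM) > 0 {
			cert, err := tls.X509KeyPair(ep.TLS.ClientCertPEM, ep.TLS.ClientKeyPEM)
			if err != nil {
				return nil, fmt.Errorf("endpoint %q: client certificate: %w", ep.Name, err)
			}
			cfg.Certificates = []tls.Certificate{cert}
		}
		transport.TLSClientConfig = cfg
	}
	// The timeout bounds each forwarded write; 30s is an arbitrary example value.
	return &http.Client{Transport: transport, Timeout: 30 * time.Second}, nil
}

func main() {
	endpoints := []WriteEndpoint{
		{Name: "plain", URL: "http://metrics-sink:9090/api/v1/write"},
		{Name: "secured", URL: "https://analytics.example.com/api/v1/write", TLS: &EndpointTLS{}},
		{Name: "broken", URL: "http://analytics.example.com/api/v1/write", TLS: &EndpointTLS{}},
	}
	for _, ep := range endpoints {
		if _, err := newClient(ep); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("ok:", ep.Name)
	}
}
```

Rejecting misconfigured endpoints at startup, rather than on the first forwarded write, keeps configuration mistakes from surfacing only as runtime write failures.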


## Non-Goals

* Other types of observability data(e.g. logs) to multiple endpoints not considered
Contributor

It should not be considered in this proposal, but I wonder if we'll want to explore an analogous option for other signals as well. We might want to add a note on this, i.e. that we might explore this in the future.

Author

fixed


## How

Currently Observatorium API only supports one metrics write endpoint. We propose an update in Observatorium API, to add a new component RemoteWrite Proxy, to clone the request body of incoming metrics write requests, then forward to additional write endpoints, which support prometheus remote write protocol.
Contributor

I think we don't need to go too much into implementation details, but could it be clarified here what the RemoteWrite Proxy is? Will it be a true new stand-alone component, or will it merely be an internal part of the API service?

Author

Changed the term "component" to "handler" to avoid confusion.
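
A minimal Go sketch of the cloning step described in the How section, assuming the proxy lives inside the API service as a handler: the incoming body is read once, then one outgoing request is built per backend (Thanos Receiver included), each with its own reader over the same bytes. The helper names `readBody` and `cloneRequests`, the size-limit parameter, the header allow-list and the example URLs are assumptions for illustration, not the proposal's implementation.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"net/http"
)

// forwardedHeaders are the remote-write headers copied onto each clone.
// Credentials such as Authorization are deliberately NOT copied, so a
// tenant's token is never passed on to third-party endpoints.
var forwardedHeaders = []string{
	"Content-Encoding",                  // "snappy" for remote write
	"Content-Type",                      // "application/x-protobuf"
	"X-Prometheus-Remote-Write-Version", // e.g. "0.1.0"
}

// readBody reads the incoming body exactly once, rejecting payloads larger than limit bytes.
func readBody(r *http.Request, limit int64) ([]byte, error) {
	body, err := io.ReadAll(io.LimitReader(r.Body, limit+1))
	if err != nil {
		return nil, err
	}
	if int64(len(body)) > limit {
		return nil, fmt.Errorf("request body exceeds %d bytes", limit)
	}
	return body, nil
}

// cloneRequests builds one outgoing POST per target. Every clone gets its
// own reader over the shared bytes, so sending to one endpoint never
// drains the body another endpoint needs.
func cloneRequests(body []byte, in http.Header, targets []string) ([]*http.Request, error) {
	reqs := make([]*http.Request, 0, len(targets))
	for _, target := range targets {
		req, err := http.NewRequest(http.MethodPost, target, bytes.NewReader(body))
		if err != nil {
			return nil, fmt.Errorf("target %q: %w", target, err)
		}
		for _, h := range forwardedHeaders {
			if v := in.Get(h); v != "" {
				req.Header.Set(h, v)
			}
		}
		reqs = append(reqs, req)
	}
	return reqs, nil
}

func main() {
	in := http.Header{}
	in.Set("Content-Encoding", "snappy")
	in.Set("Content-Type", "application/x-protobuf")
	in.Set("Authorization", "Bearer tenant-token") // not forwarded
	reqs, err := cloneRequests([]byte("compressed WriteRequest bytes"), in, []string{
		"http://thanos-receive:19291/api/v1/receive",
		"https://extra-backend.example.com/api/v1/write",
	})
	if err != nil {
		log.Fatal(err)
	}
	for _, r := range reqs {
		fmt.Println(r.URL, r.ContentLength, r.Header.Get("Authorization") == "")
	}
}
```

Copying only an allow-list of headers is a design choice worth discussing: forwarding the tenant's `Authorization` header to additional endpoints could leak credentials, so each additional endpoint would presumably carry its own credentials instead.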


Contributor

Suggested change
Currently Observatorium API only supports one metrics write endpoint. We propose an update in Observatorium API, to add a new component RemoteWrite Proxy, to clone the request body of incoming metrics write requests, then forward to additional write endpoints, which support prometheus remote write protocol.
Currently Observatorium API only supports one metrics write endpoint. We propose an update in Observatorium API, to add a new component RemoteWrite Proxy, to clone the request body of incoming metrics write requests, then forward to additional write endpoints, which support Prometheus remote write protocol.

Author

fixed

The proposed sequence diagram for an incoming metrics write request is shown below:
![sequence-diagram](../../assets/additional-write-endpoints.png)

The client sends a metrics write request to the Observatorium API, and the request reaches the RemoteWrite Proxy after being handled by some middlewares. The RemoteWrite Proxy returns as soon as it receives the request, to notify the client that the request has arrived, has passed the necessary checks such as authentication, and is ready to be forwarded to the backend write endpoints. In the meantime, it sends the request to the backend write endpoints, including Thanos Receiver. If any request ultimately fails, the RemoteWrite Proxy records the error in the logs. There will also be a new metric named remote_write_requests_count that exposes the status of the requests.
Contributor

I think it would be beneficial here to go a bit more into details on how to handle partially failed requests, i.e. if one remote write succeeds but other(s) will fail. Will this be considered a success or how would we handle this in API response?

Author

The client will not be aware of partial success.

The RemoteWrite Proxy will return once it receives the request, to notify the client that the request has arrived, has passed the necessary checks such as authentication, and is ready to be forwarded to the backend write endpoints.

Admins can monitor the status of metrics writes through the logs or the remote_write_requests_count metric.
We can also consider adding retry logic to ride out temporary problems.
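
The behaviour discussed in this thread (respond to the client first, forward in the background, log failures, count outcomes, optionally retry) could be sketched as below. This is a hypothetical illustration: `sendFunc`, `retryPolicy`, `forward` and `dispatch` are invented names, the in-memory `requestCounter` only stands in for the proposed remote_write_requests_count metric (a real implementation would more likely use a Prometheus counter with endpoint and result labels), and the retry rule follows the Prometheus remote-write convention of retrying transport errors, 5xx and 429 but not other 4xx responses.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"sync"
	"time"
)

// sendFunc performs one remote-write attempt and reports the HTTP status.
type sendFunc func(endpoint string, body []byte) (status int, err error)

// retryPolicy values are illustrative assumptions, not from the proposal.
type retryPolicy struct {
	maxAttempts int
	baseDelay   time.Duration // doubled after each failed attempt
}

// requestCounter is an in-memory stand-in for the proposed
// remote_write_requests_count metric, keyed by endpoint and result.
type requestCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

func (c *requestCounter) inc(endpoint, result string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.counts == nil {
		c.counts = make(map[string]int)
	}
	c.counts[endpoint+" "+result]++
}

func (c *requestCounter) get(endpoint, result string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[endpoint+" "+result]
}

// retryable follows the remote-write convention: retry transport errors,
// 5xx and 429 responses, but never other 4xx responses.
func retryable(status int, err error) bool {
	return err != nil || status >= 500 || status == http.StatusTooManyRequests
}

// forward delivers body to one endpoint with exponential backoff, logs a
// final failure, and records the outcome in the counter.
func forward(send sendFunc, endpoint string, body []byte, p retryPolicy, c *requestCounter) {
	delay := p.baseDelay
	for attempt := 1; ; attempt++ {
		status, err := send(endpoint, body)
		if err == nil && status >= 200 && status < 300 {
			c.inc(endpoint, "success")
			return
		}
		if attempt >= p.maxAttempts || !retryable(status, err) {
			log.Printf("remote write to %s failed after %d attempt(s): status=%d err=%v", endpoint, attempt, status, err)
			c.inc(endpoint, "failure")
			return
		}
		time.Sleep(delay)
		delay *= 2
	}
}

// dispatch starts one goroutine per endpoint and returns immediately, so the
// handler can answer the client before any backend has been reached.
func dispatch(send sendFunc, endpoints []string, body []byte, p retryPolicy, c *requestCounter) *sync.WaitGroup {
	wg := &sync.WaitGroup{}
	for _, ep := range endpoints {
		wg.Add(1)
		go func(ep string) {
			defer wg.Done()
			forward(send, ep, body, p, c)
		}(ep)
	}
	return wg
}

func main() {
	var c requestCounter
	var mu sync.Mutex
	attempts := map[string]int{}
	// Hypothetical backends: "thanos" accepts at once, "extra" answers 503 twice.
	send := func(endpoint string, _ []byte) (int, error) {
		mu.Lock()
		defer mu.Unlock()
		attempts[endpoint]++
		if endpoint == "extra" && attempts[endpoint] < 3 {
			return http.StatusServiceUnavailable, nil
		}
		return http.StatusOK, nil
	}
	p := retryPolicy{maxAttempts: 3, baseDelay: 10 * time.Millisecond}
	dispatch(send, []string{"thanos", "extra"}, []byte("payload"), p, &c).Wait()
	fmt.Println(c.get("thanos", "success"), c.get("extra", "success"), attempts["extra"]) // prints: 1 1 3
}
```

One caveat for a real handler: because the response is sent before forwarding finishes, the background sends cannot use the incoming request's context, which net/http cancels once ServeHTTP returns; each send needs its own context or client timeout. Retries also keep the body in memory until the last attempt, so bounding `maxAttempts` and the backoff matters when many requests are in flight.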

@bwplotka
Member

bwplotka commented Apr 5, 2022

@marcolan018 This looks promising! Any update on this? (:

@marcolan018
Author

@matej-g @bwplotka
Thanks for your valuable comments. I will update the PR based on them.
I will need some more time for the update, as I have been busy with other errands recently.

## Why
Currently the observatorium API always forwards the metrics to Thanos. Given that users have multiple consumers of metric data, they need the collected metrics to be made available to these additional consumers. Although users can pull the metrics from Thanos directly, that is not a real-time solution. It would also put a heavy workload on Thanos if users continuously query massive amounts of metrics from it. We need a way to let the observatorium API forward the metrics to more than one target.
Contributor

Perhaps capitalize Observatorium throughout? (Except when referring to the package observatorium.)

Author

fixed

Signed-off-by: marcolan018 <llan@redhat.com>
@marcolan018
Author

@matej-g @bwplotka @esnible
Thanks a lot for your comments. I have revised the proposal and added more content based on them.
Can you take a look at the new version? Thanks!

Contributor

@matej-g left a comment

I think this is now in good shape! Sorry for the delay, @marcolan018.

I'd like to get others' final views on the proposal as well, though. I'll try to push it for another round of reviews.

Currently the Observatorium API always forwards the metrics to Thanos. Given that users have multiple consumers of metric data, they need the collected metrics to be made available to these additional consumers. We need a way to let the Observatorium API forward the metrics to more than one target.
There are other options to fulfill this requirement, but all of them have problems or restrictions:
1. The clients send metrics to the additional consumers directly. This option is not workable in some scenarios, e.g. when there is no network connection between the clients and the additional consumers. Also, some clients have very limited resources, and this option would increase their CPU and bandwidth consumption.
2. The users of the additional consumers pull the metrics from Thanos directly, but that is not a real-time solution. It would also put a heavy workload on Thanos if users continuously query massive amounts of metrics from it.
Contributor

This reminds me of one more alternative, described in https://rhobs-handbook.netlify.app/services/rhobs/analytics.md/#existing-production-pipelines: leveraging the Thanos replicate tool (although that is also not a real-time solution).


## Non-Goals

* Other types of observability data(e.g. logs) to multiple endpoints not considered. We might explore this in the future
Contributor

Suggested change
* Other types of observability data(e.g. logs) to multiple endpoints not considered. We might explore this in the future
* Other types of observability data (e.g. logs) to multiple endpoints not considered. We might explore this in the future

@matej-g requested a review from a team May 25, 2022 09:43
4 participants