draft proposal for additional metrics write endpoints #458
base: main
Signed-off-by: Marco llan@redhat.com
Hey @marcolan018, thanks for preparing the draft! This is looking like a nice starting point.
I have added a few more suggestions. I think we could also add the alternatives that have been considered (I believe in our community discussions we mentioned e.g. using some existing tool) to justify the proposal even further.
## TLDR

We propose to support additional metrics write endpoints. The metrics write requests can be forwarded to additional backend endpoints, which support prometheus remote write protocol, besides thanos receivers.
We propose to support additional metrics write endpoints. The metrics write requests can be forwarded to additional backend endpoints, which support prometheus remote write protocol, besides thanos receivers. | |
We propose to support additional metrics write endpoints. The metrics write requests can be forwarded to additional backend endpoints, which support Prometheus remote write protocol, besides Thanos receiver. |
fixed
## Goals

* Observatorium API can push data to configured additional metrics write endpoints, secured or non-secured
* Observatorium API continues to push data to Thanos receivers even when the additional metrics write endpoints are blocked.
I feel like this is simply retaining existing functionality, so I don't think we necessarily need to have this as a goal.
fixed
## Non-Goals

* Other types of observability data(e.g. logs) to multiple endpoints not considered
It should not be considered in this proposal, but I wonder if we'll want to explore an analogous option for other signals as well. We might want to add a note on this, i.e. that we might explore this in the future.
fixed
## How

Currently Observatorium API only supports one metrics write endpoint. We propose an update in Observatorium API, to add a new component RemoteWrite Proxy, to clone the request body of incoming metrics write requests, then forward to additional write endpoints, which support prometheus remote write protocol.
I think we don't need to go too much into implementation details, but could it be clarified here what RemoteWrite Proxy is? Will it be a truly new stand-alone component, or merely an internal part of the API service?
Changed the term "component" to "handler" to avoid confusion.
Currently Observatorium API only supports one metrics write endpoint. We propose an update in Observatorium API, to add a new component RemoteWrite Proxy, to clone the request body of incoming metrics write requests, then forward to additional write endpoints, which support prometheus remote write protocol. | |
Currently Observatorium API only supports one metrics write endpoint. We propose an update in Observatorium API, to add a new component RemoteWrite Proxy, to clone the request body of incoming metrics write requests, then forward to additional write endpoints, which support Prometheus remote write protocol. |
fixed
The proposed sequence diagram for an incoming metrics write request is shown below:
![sequence-diagram](../../assets/additional-write-endpoints.png)

The client sends a metrics write request to the Observatorium API, and the request reaches the RemoteWrite Proxy after passing through the middleware chain. The RemoteWrite Proxy returns as soon as it receives the request, which tells the client that the request has arrived, has passed the necessary checks such as authentication, and is ready to be forwarded to the backend write endpoints. In the meantime, it sends the request to the backend write endpoints, including Thanos receiver. If a request ultimately fails, the RemoteWrite Proxy records the error in the logs. A new metric named remote_write_requests_count will also expose the status of the requests.
I think it would be beneficial here to go a bit more into detail on how to handle partially failed requests, i.e. if one remote write succeeds but others fail. Will this be considered a success, or how would we handle this in the API response?
The client will not be aware of partial success.
RemoteWrite Proxy will return once it receives the request, to notify the client that the request has arrived, has already passed the necessary checks such as authentication, and is ready to be forwarded to the backend write endpoints.
Admins can monitor the status of the metrics writes through the logs or the remote_write_requests_count metric.
We can also consider adding retry logic to work around temporary problems.
@marcolan018 This looks promising! Any update on this? (:
We propose to support additional metrics write endpoints. The metrics write requests can be forwarded to additional backend endpoints, which support prometheus remote write protocol, besides thanos receivers.

## Why
Currently the observatorium API always forwards the metrics to Thanos. Given that users have multiple consumers of metric data, they require the collected metrics to be made available to these additional consumers. Although users can pull the metrics from Thanos directly, that is not a real-time solution. It would also put a heavy load on Thanos if users continuously query large volumes of metrics from it. We need a way to let the observatorium API forward the metrics to more than one target.
Perhaps capitalize Observatorium throughout? (Except when referring to the package `observatorium`.)
fixed
Signed-off-by: marcolan018 <llan@redhat.com>
I think this is now in good shape! Sorry for the delay @marcolan018.
I'd like to get others' final view on the proposal as well though, so I'll try to push it for another round of reviews.
Currently the Observatorium API always forwards the metrics to Thanos. Given that users have multiple consumers of metric data, they require the collected metrics to be made available to these additional consumers. We need a way to let the Observatorium API forward the metrics to more than one target.
There are other options to fulfill this requirement, but all of them have problems or restrictions:
1. The clients send metrics to the additional consumers directly. This option is not workable in some scenarios, e.g. when there is no network connection between the clients and the additional consumers. Also, some clients have very limited resources, and this option would increase their CPU and bandwidth consumption.
2. The users of the additional consumers pull the metrics from Thanos directly, but that is not a real-time solution. It would also put a heavy load on Thanos if users continuously query large volumes of metrics from it.
This reminds me of one more alternative, which is described in https://rhobs-handbook.netlify.app/services/rhobs/analytics.md/#existing-production-pipelines - leveraging the `thanos replicate` tool (although that is also not a real-time solution).
## Non-Goals

* Other types of observability data(e.g. logs) to multiple endpoints not considered. We might explore this in the future
* Other types of observability data(e.g. logs) to multiple endpoints not considered. We might explore this in the future | |
* Other types of observability data (e.g. logs) to multiple endpoints not considered. We might explore this in the future |
This PR adds a draft proposal for updating the Observatorium API to support additional write endpoints besides Thanos receiver.
Signed-off-by: Marco llan@redhat.com