Prometheus SNS receiver proposal #2559
Comments
Thanks. I would sponsor this. We would more likely move sigv4 out of prometheus/prometheus to prometheus/common. I am willing to accept this in a dedicated Go module in prometheus/common, so that users depending on common do not depend on the AWS SDK.
Thank you.
Are you willing to work on this? It seems the first step is to extract the sigv4 code from prometheus into a new Go module in prometheus/common. Happy to help/answer questions.
This would be amazing. At SUSE we have multiple AWS customers and AWS SAs requesting this connection between AlertManager and SNS for SAP HA environments. We package a number of Prometheus exporters that they leverage, but all of them have requested TXT/SMS alerts for these prod environments via a native service (SNS). They try to do this with Lambda and CloudWatch to SNS, but most that I've heard from want to use Prometheus.
@treid314 is going to have a stab at this in the coming weeks!
While implementing the Deduplication Key logic we found that the hashed group key, as we use it for other notifiers, is not unique enough to prevent SNS messages that contain different labels and data sent from our notifier from being de-duplicated. That means a user would not be able to publish to a topic until the SNS de-dupe time limit is up. I'm wary of suggesting a hash of the message itself, since that's the content-based deduplication SNS already does. Are there any suggestions for creating a better deduplication key for this notifier?
@maxbrodin I'm a bit confused by the truncation strategy here. From the AWS docs it seems that message attributes don't count toward the size of the message itself, so removing message attributes won't affect the total message length. Is there something I'm missing?
The content of the message can vary from one minute to the next for the same alert, because annotations can contain changing data such as values or query results, even from multiple Prometheus servers. So it cannot be used for hashing.
In the SNS SDK the API version is hardcoded in the client metadata for the SNS client, which binds the SDK to a specific API version. I propose we remove the
I want to bring this back up before we complete this issue. Right now you would not be able to publish another message for 5 minutes (the SNS de-dupe time limit) to a FIFO queue with a hashed group key. I think we need to consider adding to what we use to compute the hash, to get a deduplication key that gives us more control over what is deduplicated on the SNS side. Are there any suggestions for what to add to our hash to handle this better?
We could hash all the labels from all the alerts, but not the annotations.
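A minimal Go sketch of the labels-only idea above, assuming a simplified, hypothetical Alert type rather than Alertmanager's real alert structs: each alert's label set is serialized in sorted order, the per-alert strings are sorted so alert order inside the group doesn't matter, and the result is hashed with SHA-256. Annotations are never read, so changing values or query results don't change the key.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// Alert is a simplified, hypothetical stand-in for an Alertmanager alert.
type Alert struct {
	Labels      map[string]string
	Annotations map[string]string // deliberately ignored when hashing
}

// labelsDedupKey hashes the label sets of all alerts in a notification,
// ignoring annotations, so the key stays stable while annotation values change.
func labelsDedupKey(alerts []Alert) string {
	sets := make([]string, 0, len(alerts))
	for _, a := range alerts {
		names := make([]string, 0, len(a.Labels))
		for n := range a.Labels {
			names = append(names, n)
		}
		sort.Strings(names) // map iteration order is random; sort for determinism
		var b strings.Builder
		for _, n := range names {
			// %q escapes quotes and newlines, so the encoding is unambiguous.
			fmt.Fprintf(&b, "%q=%q,", n, a.Labels[n])
		}
		sets = append(sets, b.String())
	}
	sort.Strings(sets) // the order of alerts within a group should not matter
	h := sha256.New()
	for _, s := range sets {
		h.Write([]byte(s))
		h.Write([]byte{'\n'}) // safe separator: %q never emits a raw newline
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	a := Alert{
		Labels:      map[string]string{"alertname": "HighCPU", "instance": "db-1"},
		Annotations: map[string]string{"value": "93%"},
	}
	b := Alert{Labels: a.Labels, Annotations: map[string]string{"value": "97%"}}
	// Same labels, different annotations: the keys are equal.
	fmt.Println(labelsDedupKey([]Alert{a}) == labelsDedupKey([]Alert{b}))
}
```

One trade-off worth noting: unlike the group key hash, this key changes whenever an alert joins or leaves the group, so those notifications would not be de-duplicated against each other.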
I think it's fine, initially, to just use the group key hash as both the SNS dedupe key and the group key. 5 minute as minimum value for Later on, if there is a strong need, we can introduce a new SNS receiver config like
Implemented #2615
Prometheus SNS receiver
Problem
AlertManager allows you to define receivers: notification integrations with email, webhooks and third-party services such as PagerDuty, OpsGenie, Slack and others.
Currently there is no integration with Amazon Simple Notification Service (SNS), which provides fully managed pub/sub messaging, SMS, email, and mobile push notifications. A webhook receiver can be used as a proxy as a workaround, but it lacks support for AlertManager templates and requires setting up and maintaining an additional component.
Proposed solution
This proposal is to add a Prometheus SNS receiver: native notification integration with Amazon SNS.
Message destinations
Prometheus SNS receiver can publish messages to the following destinations:
SNS Publish API
To publish a message to an SNS topic, the following HTTP request parameters are required:
Common:
Specific to each request:
Optional:
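The parameter lists themselves did not survive in this copy of the issue. As a rough illustration only, here is a Go sketch that builds the form-encoded body of an SNS Query API Publish request with net/url, assuming Action and Version are the common parameters, the destination (TopicArn, TargetArn or PhoneNumber) plus Message are request-specific, and Subject and MessageAttributes are optional. publishParams is a hypothetical helper; signing the request with SigV4 is left out.

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
	"strconv"
)

// publishParams is a hypothetical helper that builds the form-encoded body
// of an SNS Query API Publish request. Request signing (SigV4) is not shown.
// destParam is expected to be "TopicArn", "TargetArn" or "PhoneNumber".
func publishParams(destParam, dest, subject, message string, attrs map[string]string) url.Values {
	v := url.Values{}
	// Common to every request.
	v.Set("Action", "Publish")
	v.Set("Version", "2010-03-31")
	// Specific to each request: the destination and the message body.
	v.Set(destParam, dest)
	v.Set("Message", message)
	// Optional: Subject (used as the subject line for email endpoints).
	if subject != "" {
		v.Set("Subject", subject)
	}
	// Optional: string message attributes, numbered from 1 in sorted key order.
	keys := make([]string, 0, len(attrs))
	for k := range attrs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for i, k := range keys {
		p := "MessageAttributes.entry." + strconv.Itoa(i+1)
		v.Set(p+".Name", k)
		v.Set(p+".Value.DataType", "String")
		v.Set(p+".Value.StringValue", attrs[k])
	}
	return v
}

func main() {
	v := publishParams("TopicArn", "arn:aws:sns:us-east-1:123456789012:alerts",
		"[FIRING:1] HighCPU", "CPU usage above 90% on db-1",
		map[string]string{"severity": "critical"})
	fmt.Println(v.Encode())
}
```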
Prometheus SNS receiver configuration
<sns_config>
<sigv4_config>
<attribute_config>
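The definitions of these config blocks are missing here. A hypothetical Go shape for them might look like the sketch below; every field and YAML key name is an assumption (the destination fields simply mirror the Publish parameters TopicArn, TargetArn and PhoneNumber, and attributes are modelled as a plain string map). The struct tags are only annotations, so no YAML library is needed to compile it.

```go
package main

import (
	"errors"
	"fmt"
)

// SigV4Config is a hypothetical shape for the <sigv4_config> block.
type SigV4Config struct {
	Region    string `yaml:"region,omitempty"`
	AccessKey string `yaml:"access_key,omitempty"`
	SecretKey string `yaml:"secret_key,omitempty"`
	Profile   string `yaml:"profile,omitempty"`
	RoleARN   string `yaml:"role_arn,omitempty"`
}

// SNSConfig is a hypothetical shape for the <sns_config> block.
type SNSConfig struct {
	APIURL      string            `yaml:"api_url,omitempty"`
	SigV4       SigV4Config       `yaml:"sigv4"`
	TopicARN    string            `yaml:"topic_arn,omitempty"`
	TargetARN   string            `yaml:"target_arn,omitempty"`
	PhoneNumber string            `yaml:"phone_number,omitempty"`
	Subject     string            `yaml:"subject,omitempty"`
	Message     string            `yaml:"message,omitempty"`
	Attributes  map[string]string `yaml:"attributes,omitempty"` // <attribute_config>
}

// Validate requires exactly one destination, since each Publish request
// in this sketch targets a single topic, endpoint or phone number.
func (c SNSConfig) Validate() error {
	set := 0
	for _, d := range []string{c.TopicARN, c.TargetARN, c.PhoneNumber} {
		if d != "" {
			set++
		}
	}
	if set != 1 {
		return errors.New("exactly one of topic_arn, target_arn or phone_number must be set")
	}
	return nil
}

func main() {
	c := SNSConfig{
		TopicARN: "arn:aws:sns:us-east-1:123456789012:alerts",
		SigV4:    SigV4Config{Region: "us-east-1"},
	}
	fmt.Println(c.Validate())
}
```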
Examples
SNS Topic
SMS
Mobile notification
Email
Message size
Due to SNS message constraints:
With the exception of SMS, messages must be UTF-8 encoded strings and at most 256 KB in size (262,144 bytes, not 262,144 characters).
For SMS, each message can contain up to 140 bytes. The character limit depends on the encoding scheme. For example, an SMS message can contain 160 GSM characters, 140 ASCII characters, or 70 UCS-2 characters.
The total size limit for a single SMS Publish action is 1,600 characters.
Prometheus SNS receiver must truncate messages according to these constraints:
Truncation strategy
If a message doesn’t fit in the 256 KB limit, the SNS receiver will truncate the message content (note that the message body is required and can’t be empty).
If the message still doesn’t fit, we will truncate message attributes one by one until the message fits within the size limit.
If the SNS receiver truncates the message, a new SNS message attribute with key "truncated" and value "true" will be added to indicate that the notification message was truncated.
Deduplication Key
To correlate alarm triggers and alarm resolutions, we publish a special “deduplicationKey” attribute whose value is a hash of the GroupKey, similar to the PagerDuty, OpsGenie and VictorOps receivers.
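A minimal Go sketch of the deduplicationKey attribute, assuming a SHA-256 hex digest of the group key (the exact hash used by the existing receivers is not stated here). dedupKey and messageAttributes are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// dedupKey hashes a group key into a fixed-length, 64-character hex string.
func dedupKey(groupKey string) string {
	sum := sha256.Sum256([]byte(groupKey))
	return hex.EncodeToString(sum[:])
}

// messageAttributes adds the deduplicationKey attribute to the configured
// attributes, returning a new map so the configuration is not mutated.
func messageAttributes(groupKey string, configured map[string]string) map[string]string {
	out := make(map[string]string, len(configured)+1)
	for k, v := range configured {
		out[k] = v
	}
	out["deduplicationKey"] = dedupKey(groupKey)
	return out
}

func main() {
	// An illustrative group key string.
	attrs := messageAttributes(`{}:{alertname="HighCPU"}`, map[string]string{"severity": "critical"})
	fmt.Println(attrs["deduplicationKey"])
}
```

Because the digest is always 64 alphanumeric characters, the same value would also fit the 128-character limit of MessageDeduplicationId and MessageGroupId on FIFO topics, which is what the comments suggest using initially.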
Default SNS message format
Currently some receivers have a default message template, like
Default SNS message format will contain the following information (to be confirmed):
Similar issue: #2525