Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add parseDuration function to templates #3816

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

verdel
Copy link

@verdel verdel commented Apr 22, 2024

I add a duration function. This would, for example, allow generating links to a grafana dashboards where a time range for display can be passed.

We will pass a time offset in the annotation and use it to generate the link in the following way:

Alert data:

"annotations": {
        "title": "Alert Summary text",
        "description": "Detailed alert description",
        "log_url": "https://grafana.example.com ...from=%from%, to=%to%",
        "grafana_dashboard_range": "-2m"
      },

Alert template:

{{ define "test" }}
{{ .Annotations.message }}
{{- "\\n" -}}
{{ (index .Alerts 0).Annotations.log_url | reReplaceAll "%from%" (printf "%d000" ((index .Alerts 0).StartsAt.Add (duration (index .Alerts 0).Annotations.grafana_dashboard_range)).Unix) | reReplaceAll "%to%" (printf "%d000" (index .Alerts 0).StartsAt.Unix)}}
{{ end }}

The function itself is very simple:

// duration returns the time.Duration representation of a passed string
"duration": func(text string) (time.Duration, error) {
    d, err := time.ParseDuration(text)
    if err != nil {
        return 0, err
    }
    return d, nil
}

@verdel verdel force-pushed the add-duration-function-to-templates branch 2 times, most recently from ec4ab14 to 2d89845 Compare April 22, 2024 14:49
@grobinson-grafana
Copy link
Contributor

I have one question about the intended use case. Wouldn't it be better to have the log_url annotation have the correct URL before the alert is sent to Alertmanager. Then the URL works in visualization tools like Grafana and karma.

For example, instead of this:

"annotations": {
        "title": "Alert Summary text",
        "description": "Detailed alert description",
        "log_url": "https://grafana.example.com ...from=%from%, to=%to%",
        "grafana_dashboard_range": "-2m"
      },

Have this:

"annotations": {
        "title": "Alert Summary text",
        "description": "Detailed alert description",
        "log_url": "https://grafana.example.com ...from=now, to=-2h"
      },

@verdel
Copy link
Author

verdel commented Apr 22, 2024

We use Alertmanager not only with Prometheus but also with Loki, and in LogQL, there are no functions to get the current timestamp. There is an open issue describing such a case.

Therefore, we cannot form values for from and to on the Loki Ruler side. Currently, we generate a link to the grafana dashboard in a way similar to what I described in the example. However, since there is no function to work with duration in Alertmanager, we have to set the argument for .StartsAt.Add statically.

If we add a duration function, we will be able to pass the range through alert annotation.

@verdel verdel force-pushed the add-duration-function-to-templates branch from 2d89845 to 740a580 Compare April 22, 2024 15:10
@grobinson-grafana
Copy link
Contributor

OK! 👍

The reason I asked is I was looking at this example:

{{ (index .Alerts 0).Annotations.log_url | reReplaceAll "%from%" (printf "%d000" ((index .Alerts 0).StartsAt.Add (duration (index .Alerts 0).Annotations.grafana_dashboard_range)).Unix) | reReplaceAll "%to%" (printf "%d000" (index .Alerts 0).StartsAt.Unix)}}

and made the following observations:

  1. This is a complicated template, and I suspect there are other users who want to solve the same problem. I think this is a neat "hack" but is there a better way to solve this?
  2. If the grafana_dashboard_range annotation is missing for at least one alert then the entire template will fail and a notification will not be sent. You need to remember to add if statements or add a safe function like duration_or_default.
  3. Most requests I've seen for a duration function so far have been to take a time and print the time since (in other words – the opposite behavior). For example:
{{ .StartsAt|duration }}

Would print:

3 hours, 2 minutes and 1 seconds ago

@verdel
Copy link
Author

verdel commented Apr 22, 2024

This is a complicated template, and I suspect there are other users who want to solve the same problem. I think this is a neat "hack" but is there a better way to solve this?

I couldn't find another solution in the case with Loki that would allow getting the timestamp of the moment when the alert-triggering logic is activated, as well as a way to perform operations with this timestamp (adding or subtracting time intervals to get a range).

The only "hack" way is the fact that .StartsAt is of type time.Time and allows the use of corresponding methods within the template. However, for operations with time, we often need to use the time.Duration type, and all the data that comes into the template for processing are "string".

If the grafana_dashboard_range annotation is missing for at least one alert then the entire template will fail and a notification will not be sent. You need to remember to add if statements or add a safe function like duration_or_default.

This is a simplified version of the template, and the responsibility for preparing data in custom templates rests with the end user.

Most often, we use this approach in the template for creating buttons in messages sent to Slack. If there is an error in forming the URL, the button simply will not be displayed in the message.

Most requests I've seen for a duration function so far have been to take a time and print the time since (in other words – the opposite behavior)

I can change the name of the function to eliminate ambiguous interpretation of its functionality. For example, name it parseDuration.

@grobinson-grafana
Copy link
Contributor

I just want to make sure there is no-misunderstanding here too. I'm not against adding a function to parse time durations to do operations on StartsAt and EndsAt. I think we can call it something else like parseDuration as you suggested at the end of your comment.

However, I also think Prometheus, Mimir and Loki should make the StartsAt time and related functions available in annotation templates. This would allow you to template the annotation outside of the Alertmanager which is the correct way to do it for the following reasons:

  1. The correct link is shown when the alert is viewed in Alertmanager UI, Grafana, and other UIs like karma.
  2. I think your solution to this problem is clever but I want to avoid other users from having to come up with the same "hack". The reason I consider it a "hack" is you are basically using the annotation as a template to be templated again in the Alertmanager using regular expressions. This is what I want to avoid by making sure users have the necessary features in the right places.

I appreciate that you cannot do this right now because those features are missing, and why you're having to do it in the notification template instead.

I think we should create an issue in prometheus/prometheus for StartsAt and related time functions to be made available in annotation templates.

@gotjosh what do you think?

Signed-off-by: Vadim Aleksandrov <valeksandrov@me.com>
@verdel verdel force-pushed the add-duration-function-to-templates branch from 740a580 to 99674c4 Compare April 22, 2024 16:32
@verdel verdel changed the title Add duration function to templates Add parseDuration function to templates Apr 22, 2024
@verdel
Copy link
Author

verdel commented Apr 22, 2024

I agree that splitting the logic of forming alert messages and moving part of it to Alertmanager is not the best idea.

I would prefer to have this functionality available on the Prometheus or Loki side. Therefore, I will definitely support the idea of creating an issue in the repositories of these products.

While this functionality is not available on the Prometheus or Loki side, I would still request that my custom function be accepted into the code that handles template processing in Alertmanager. I have renamed the function to parseDuration.

@grobinson-grafana
Copy link
Contributor

Yeah I asked @gotjosh for a second opinion as he is an official maintainer and I'm not 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants