diff --git a/source/manual/alerts/email-alert-api-delivery-attempt-status.html.md b/source/manual/alerts/email-alert-api-delivery-attempt-status.html.md
deleted file mode 100644
index 64328785a9..0000000000
--- a/source/manual/alerts/email-alert-api-delivery-attempt-status.html.md
+++ /dev/null
@@ -1,54 +0,0 @@
----
-owner_slack: "#govuk-2ndline"
-title: 'Email Alert API: High number of delivery attempts'
-section: Icinga alerts
-subsection: Email alerts
-layout: manual_layout
-parent: "/manual.html"
-last_reviewed_on: 2020-07-16
-review_in: 6 months
----
-
-The first thing to do is determine what kind of failure is affecting the
-delivery attempts.
-
-## Internal failures (`internal_failure`)
-
-This means that we’ve failed to make a request to Notify within the last hour
-due to a problem in our code. The reason for this failure should be visible in
-Sentry.
-
-## Technical failures (`technical_failure`)
-
-This means that we’ve received a technical failure status code back from Notify
-or a request to send an email via Notify failed within the last hour. This
-means that there may be a problem with our system or that Notify is unable to
-send emails.
-
-Emails given a `technical_failure` status will **not** be retried automatically.
-To retry these you will need to use the [resend tasks].
-
-In non-production environments, this failure may also mean that we’re
-attempting to send emails to people who are not members of the Notify team for
-the relevant environment.
-
-In this case, ensure the contents of the
-`govuk::apps::email_alert_api::email_address_override_whitelist` key in
-[hieradata][] and [hieradata_aws][] matches the members of the
-staging/integration Notify teams.
-
-You can login to the Notify account by going to the
-[GOV.UK Notify Admin Interface][notify]. The login credentials are in the
-[2nd line password store][password-store] under
-`govuk-notify/2nd-line-support`.
-
-## Still stuck?
-
-Read [email troubleshooting].
-
-[email troubleshooting]: /manual/email-troubleshooting.html
-[notify]: https://www.notifications.service.gov.uk
-[hieradata]: https://github.com/alphagov/govuk-puppet/blob/master/hieradata/common.yaml
-[hieradata_aws]: https://github.com/alphagov/govuk-puppet/blob/master/hieradata_aws/common.yaml
-[password-store]: https://github.com/alphagov/govuk-secrets/tree/master/pass/2ndline/govuk-notify
-[resend tasks]: /apis/email-alert-api/support-tasks.html#resend-failed-emails
diff --git a/source/manual/alerts/email-alert-api-high-retry-queue-size.html.md b/source/manual/alerts/email-alert-api-high-retry-queue-size.html.md
index 97c269af0f..a35e651d1d 100644
--- a/source/manual/alerts/email-alert-api-high-retry-queue-size.html.md
+++ b/source/manual/alerts/email-alert-api-high-retry-queue-size.html.md
@@ -11,12 +11,14 @@ review_in: 6 months
 ---
 
 ### High retry queue size (retry_size)
-This means there are a high number of items in the retry queue. The Email Alert
-API relies on the retry queue for rate limiting, so it’s not unusual to see
-items in the queue, but if it is very high it suggests that there may be a
-problem down the line which is preventing jobs from being processed. It may
-also imply the threshold is too low if a large number of emails have been sent
-out due to a content change.
+This means there are a high number of items in the retry queue. This happens
+when an error occurs running a Sidekiq worker and the job is queued to retry.
+A high amount of items indicates that there is a potential problem in
+communicating with Notify. To investigate the cause you should consult the
+Email Alert API Sidekiq logs in [Kibana][kibana]
+(`application:email-alert-api AND tags:sidekiq`) and
+[Email Alert API Sentry][sentry]. Note, that due to transitory network issues
+communicating with Notify there are often a small amount of items in the queue.
 
 See the [sidekiq][sidekiq] section for more information about the Sidekiq
 queues, or read [email troubleshooting].