Emergency Runbook

Tabatha D Zeitke edited this page Dec 21, 2022 · 11 revisions

This site is hosted on Gatsby Cloud and is maintained and supported by New Relic's Docs Team.

Troubleshooting

| Scenario | Severity | Resolution |
| --- | --- | --- |
| Site is not loading | ❗ High | Rollback a release |
| All localized pages are throwing 500s | ❗ High | Rollback a release |
| Functionality is broken | ⚠️ Medium | Rollback a release |
| Alert has been triggered | ⚠️ Medium | Respond to an incident |
| Copy needs to be adjusted | 👀 Unknown | Ping @hero in #help-documentation, or leave a comment in the Feedback form on the relevant doc page to generate a Jira ticket |
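
For anyone wiring this table into a chat bot or triage script, the rows above can be sketched as a simple lookup. This is only an illustration: the short scenario keys, the `Triage` type, and the `triage` function are hypothetical names, and the severities and resolutions are copied from the table.

```python
from typing import NamedTuple


class Triage(NamedTuple):
    severity: str
    resolution: str


# Rows copied from the troubleshooting table; the keys are hypothetical short names.
RUNBOOK = {
    "site_not_loading": Triage("High", "Rollback a release"),
    "localized_pages_500": Triage("High", "Rollback a release"),
    "functionality_broken": Triage("Medium", "Rollback a release"),
    "alert_triggered": Triage("Medium", "Respond to an incident"),
    "copy_needs_adjusting": Triage(
        "Unknown",
        "Ping @hero in #help-documentation or leave a comment in the Feedback form",
    ),
}


def triage(scenario: str) -> Triage:
    """Return the severity and resolution for a known scenario key.

    Raises KeyError for scenarios the table does not cover.
    """
    return RUNBOOK[scenario]


if __name__ == "__main__":
    result = triage("site_not_loading")
    print(f"{result.severity}: {result.resolution}")
```

Raising `KeyError` for unlisted scenarios is a deliberate choice in this sketch, so that an unrecognized problem is surfaced to a person rather than silently mapped to a default.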

Rollback a release

If the site is not loading, or a piece of functionality is broken, you will likely need to roll back to a stable release. There are two ways to roll back a release:

Via Gatsby Cloud

  1. Log in to Gatsby Cloud using GitHub two-factor authentication.
  2. Select the docs-website - main site.
  3. Scroll down to Build history to see all the previous builds that have been published.
  4. Find the appropriate build to roll back to. Click Publish to deploy that build of the site.
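
Choosing "the appropriate build" in step 4 usually means the most recent build published before the one that broke the site. The sketch below illustrates that selection over a list you have copied by hand from Build history. It does not call any Gatsby Cloud API; the `Build` type, its fields, and `previous_build` are hypothetical names used only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Build:
    build_id: str           # hypothetical identifier copied from Build history
    published_at: datetime  # when the build was published


def previous_build(history: list[Build], broken_id: str) -> Optional[Build]:
    """Return the most recent build published before the broken one.

    Returns None if the broken build is not in the history or nothing precedes it.
    """
    broken = next((b for b in history if b.build_id == broken_id), None)
    if broken is None:
        return None
    earlier = [b for b in history if b.published_at < broken.published_at]
    return max(earlier, key=lambda b: b.published_at, default=None)


if __name__ == "__main__":
    history = [
        Build("build-1", datetime(2022, 12, 19, 10, 0)),
        Build("build-2", datetime(2022, 12, 20, 10, 0)),
        Build("build-3", datetime(2022, 12, 21, 10, 0)),
    ]
    print(previous_build(history, "build-3"))
```

The immediately preceding build is only a starting point. Confirm that it was actually stable before clicking Publish.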

Via GitHub

If you do not have access to Gatsby Cloud, you can perform a rollback using GitHub:

  1. Find the pull request (into main) that you would like to roll back.
  2. Click Revert to create a new pull request that undoes this work.
  3. Have someone review the rollback and approve the pull request.
  4. Once the necessary checks have passed, merge into main.
  5. A build will be triggered in Gatsby Cloud. Once complete, the rollback will be released.
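
If the Revert button is unavailable, roughly the same result can be reached from a local clone. The sketch below only builds the git commands as argument lists and does not run them. It assumes the remote is named `origin`, and the function name, the `revert-<sha>` branch naming, and the parameters are hypothetical. The `-m 1` flag is needed only when the pull request was merged with a merge commit. You would still open a pull request from the pushed branch and follow steps 3 to 5 above.

```python
from __future__ import annotations


def revert_commands(
    commit_sha: str, is_merge_commit: bool, base: str = "main"
) -> list[list[str]]:
    """Build the git commands that revert one commit on a fresh branch.

    commit_sha is the commit that the pull request produced on the base branch.
    """
    branch = f"revert-{commit_sha[:7]}"  # hypothetical branch naming scheme
    revert = ["git", "revert", "--no-edit"]
    if is_merge_commit:
        # Keep the first parent (the base branch side) of the merge commit.
        revert += ["-m", "1"]
    revert.append(commit_sha)
    return [
        ["git", "fetch", "origin", base],
        ["git", "switch", "-c", branch, f"origin/{base}"],
        revert,
        ["git", "push", "-u", "origin", branch],
    ]


if __name__ == "__main__":
    # The SHA below is a placeholder, not a real commit.
    for cmd in revert_commands("abcdef1234567", is_merge_commit=True):
        print(" ".join(cmd))
```

Returning argument lists rather than executing them keeps the sketch safe to run during an incident. Each list can be passed to `subprocess.run(cmd, check=True)` once the commands have been reviewed.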

Respond to an incident

The following steps are for on-call engineers working at New Relic:

  1. Don't panic, you've got this!
  2. Check to see if there is already an ongoing incident in #emergency-room (or in 2, 3, and 4).
  3. If there is not an ongoing incident, start one by following the steps in the Incident Commander Runbook.
  4. Refer to the troubleshooting dashboard to get an idea of what could be going on.
  5. Look at the recent deployments to production to identify a PR that can be reverted.
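
Step 5 amounts to narrowing the recent deployments down to those that landed shortly before the incident began. A minimal sketch of that filter follows, assuming you have a list of deployments with PR numbers and timestamps. The `Deployment` type, the `revert_candidates` function, and the 24-hour default window are assumptions for illustration, not part of any New Relic tooling.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Deployment:
    pr_number: int         # pull request that was deployed
    deployed_at: datetime  # when it reached production


def revert_candidates(
    deployments: list[Deployment],
    incident_start: datetime,
    lookback: timedelta = timedelta(hours=24),  # assumed window, tune as needed
) -> list[Deployment]:
    """Return deployments made within the lookback window before the incident,
    newest first, since the latest change is the most likely culprit."""
    window_start = incident_start - lookback
    hits = [d for d in deployments if window_start <= d.deployed_at <= incident_start]
    return sorted(hits, key=lambda d: d.deployed_at, reverse=True)


if __name__ == "__main__":
    # Placeholder data for illustration only.
    deployments = [
        Deployment(101, datetime(2022, 12, 19, 9, 0)),
        Deployment(102, datetime(2022, 12, 20, 15, 0)),
        Deployment(103, datetime(2022, 12, 21, 8, 30)),
    ]
    for d in revert_candidates(deployments, datetime(2022, 12, 21, 9, 0)):
        print(d.pr_number, d.deployed_at.isoformat())
```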