Runbook: Security Coordination Emergency Bug Response with Chain Validators

Jessy Irwin edited this page May 3, 2023 · 5 revisions

Security Coordination: Incident Response for the Agoric Network

In the course of operating a node on any network, security incidents or adverse conditions will arise. During these times, whether the issue is a platform bug or a vulnerable dependency that requires emergency patching through an off-chain coordination process, it is important to have a rapid response plan in place to facilitate coordination between node operators and core protocol developers.

In the event of a security emergency, Agoric OpCo is committed to providing all node operators with equitable access to information. A security notification email distribution list will be the primary tool for informing validators of recommended coordination activities to mitigate or remediate security risk to the network.

Security Disclosure Group

There are several tools and resources to support community-led off-chain network coordination. To be notified of emergency security coordination, we recommend that all validators:

All participants in the security coordination group should refrain from sharing confidential information about security vulnerabilities with others outside of a network response disclosure group. Additionally, participants should avoid using confidential information about security vulnerabilities for individual gain.

Security Coordination Process

To support the operation of a robust, secure, decentralized network, the Agoric OpCo security team runs an on-call rotation, monitors network health, coordinates with external parties (e.g. Cosmos developers), and maintains emergency patch distribution processes.

If the Agoric core developers have knowledge of a software vulnerability or incident of active exploitation on the network:

  • Core developers will triage and reproduce the issue to validate impact and severity, and evaluate how to best mitigate or remediate the root cause of the issue.
  • Core developers may consider developing detection tools to investigate active exploitation of security vulnerabilities.
  • Core developers will release a security advisory for a High or Critical severity issue to notify impacted parties to prepare for an emergency patch.
  • If a software upgrade is necessary, core developers will work quickly to create, validate, and distribute an emergency patch with an install guide to validators.
  • If a temporary chain halt or node configuration changes are required, core developers may make recommendations to validators, but validators will decide and coordinate any activity required to protect the chain.
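
The core-developer steps above can be read as a small decision rule. The following is a minimal sketch of that reading, not an Agoric tool: the names `Severity`, `Triage`, and `core_developer_actions` are hypothetical, and the Low and Medium severity levels are assumed for illustration (the runbook only names High and Critical).

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    # LOW and MEDIUM are assumed levels; the runbook names only HIGH and CRITICAL.
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass
class Triage:
    """Hypothetical result of the triage-and-reproduce step."""
    severity: Severity
    needs_software_upgrade: bool
    needs_halt_or_config_change: bool


def core_developer_actions(triage: Triage) -> list[str]:
    """Return the core-developer actions the runbook describes for a triage result."""
    actions: list[str] = []
    # A security advisory is released only for High or Critical issues.
    if triage.severity in (Severity.HIGH, Severity.CRITICAL):
        actions.append("release security advisory")
    # An emergency patch is created and distributed when an upgrade is necessary.
    if triage.needs_software_upgrade:
        actions.append("distribute emergency patch with install guide")
    # Halts and config changes are recommendations only; validators decide.
    if triage.needs_halt_or_config_change:
        actions.append("recommend halt or config change (validators decide)")
    return actions
```

For example, `core_developer_actions(Triage(Severity.CRITICAL, True, False))` yields an advisory followed by an emergency patch. The last branch only ever produces a recommendation, reflecting that validators, not core developers, decide on chain halts and configuration changes.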

Once a patch is released or consensus on emergency response actions is reached:

  • Validators are responsible for deciding whether to test and apply security patches to their nodes.
  • Validators will lead any required coordination of on-chain governance, network upgrades, chain halts, hard forks, or timing for upgrades as necessary to resolve the security emergency.
  • For incidents involving the Agoric stack, core developers will collaborate with chain validators to gather facts, confirm a timeline of events, and publish a public retrospective about the security emergency within one month of the security patch release. Once the discussion period has closed, this retrospective may be shared as a signaling proposal to document off-chain coordination on the Agoric chain.

While core developers are troubleshooting an issue or working directly with validators during upgrades, they will not be able to respond to direct messages and will aim to share information in the channels dedicated to validator collaboration. Validators with questions about incident response or emergency coordination should reach out to the Agoric Security team at security@agoric.com.

Discussion of this topic, including recommendations and edits, is located in Issue #4012.
