Add a DR overview focused on resiliency with comparison for HA & DR #18490

Open
wants to merge 3 commits into
base: main

@kathancox kathancox commented Apr 18, 2024

Fixes DOC-9928, DOC-9929

This PR (in draft) adds a DR overview page to direct users toward establishing resiliency in their deployments. I have currently included this as an overview page for the DR page, but there are other options.

Rendered preview

github-actions bot commented Apr 18, 2024

Files changed:

netlify bot commented Apr 18, 2024

Deploy Preview for cockroachdb-interactivetutorials-docs canceled.

Name Link
🔨 Latest commit 46598c8
🔍 Latest deploy log https://app.netlify.com/sites/cockroachdb-interactivetutorials-docs/deploys/664f7858df40d90008b536b6

netlify bot commented Apr 18, 2024

Deploy Preview for cockroachdb-api-docs canceled.

Name Link
🔨 Latest commit 46598c8
🔍 Latest deploy log https://app.netlify.com/sites/cockroachdb-api-docs/deploys/664f7858836cae0008664dfb

netlify bot commented Apr 18, 2024

Netlify Preview

Name Link
🔨 Latest commit 46598c8
🔍 Latest deploy log https://app.netlify.com/sites/cockroachdb-docs/deploys/664f78580c56d30008ebfd22
😎 Deploy Preview https://deploy-preview-18490--cockroachdb-docs.netlify.app

Add overview with comparative strategies for DR & HA
@kathancox kathancox force-pushed the dr-resiliency-comp-overview branch from 3c7eeca to 52ebe0f Compare May 21, 2024 15:24
@kathancox kathancox marked this pull request as ready for review May 21, 2024 15:25
@kathancox kathancox requested a review from alicia-l2 May 21, 2024 15:26

@alicia-l2 alicia-l2 left a comment

Added some comments/proposed edits, thanks!

Resilient deployments aim for continuity in database operation to protect from data loss and down time. To maintain resiliency, it is necessary to build deployments with _high availability_ and _disaster recovery_ coverage.

- [High availability](#choose-a-high-availability-strategy): Continuous and uninterrupted access to data even in the presence of failures or disruptions to maximize uptime.
- [Disaster recovery](#choose-a-disaster-recovery-strategy): Recovery from a major incident or disaster to minimize downtime and data loss.

Recover instead of Recovery?

Contributor Author

Yes, I like it. To match the other bullet point, I have changed both so that they start with verbs.


As you evaluate CockroachDB's disaster recovery features, consider your organization's requirements for the amount of tolerable data loss and the acceptable length of time to recover.

- Recovery Point Objective (RPO): The maximum amount of time that an organization can tolerate losing data.

"maximum amount of data loss – as measured by time – that an organization can tolerate." Maybe this?

Contributor Author

Added, with parentheses.

<table class="comparison-chart">
<tr>
<th></th>
<th>Single-region replication</th>

Is it worth adding "synchronous" replication here?

Contributor Author

Added to both columns here.

<b>Fault tolerance</b>
</td>
<td>Zero RPO node, availability zone failures</td>
<td>Zero RPO node, availability zone failures</td>

The multi-region one should also be able to survive a region failure

Contributor Author

Done!

<b>Fault tolerance</b>
</td>
<td>Not applicable</td>
<td>Zero RPO node, availability zone region failure with loss up to RPO</td>

nit: i think a comma is needed after 'availability zone'?

Contributor Author

Oh yeah, looks like it! Added the comma!

@kathancox kathancox requested a review from alicia-l2 May 23, 2024 14:14

@alicia-l2 alicia-l2 left a comment

Added a few more comments!


As you evaluate CockroachDB's disaster recovery features, consider your organization's requirements for the amount of tolerable data loss and the acceptable length of time to recover.

- Recovery Point Objective (RPO): The maximum amount of data loss (measured by time) that an organization can tolerate losing data.

I think "losing data" should be removed?

toc: true
---

Resilient deployments aim for continuity in database operation to protect from data loss and down time. To maintain resiliency, it is necessary to build deployments with _high availability_ and _disaster recovery_ coverage.

database operation continuity?

Contributor Author

I haven't changed this. Having "database operation" modify "continuity" feels a little harder to read. I have left as-is for now — hopefully my docs review partner may have an idea here.

No worries, sounds good!
