
Add job management docs for cutover in physical cluster replication jobs #18525

Merged · 4 commits into main from pcr-schedule-pause-destination · May 16, 2024

Conversation

kathancox (Contributor):

Fixes DOC-8998

This PR adds detail on job management to the cutover page under physical cluster replication. This affects scheduled jobs and changefeeds.

netlify bot commented May 7, 2024:

Netlify Preview

Name Link
🔨 Latest commit 098ea87
🔍 Latest deploy log https://app.netlify.com/sites/cockroachdb-docs/deploys/66462613a7c5e00008d6034a
😎 Deploy Preview https://deploy-preview-18525--cockroachdb-docs.netlify.app
📱 Preview on mobile

@kathancox kathancox force-pushed the pcr-schedule-pause-destination branch from 117611e to 57eda19 Compare May 7, 2024 17:21

[Changefeeds]({% link {{ page.version.version }}/change-data-capture-overview.md %}) will fail on the promoted cluster immediately after cutover. We recommend that you recreate changefeeds on the promoted cluster.

[Scheduled changefeeds]({% link {{ page.version.version }}/create-schedule-for-changefeed.md %}) will continue on the promoted cluster. You will need to manage [pausing]({% link {{ page.version.version }}/pause-schedules.md %}) or [canceling]({% link {{ page.version.version }}/drop-schedules.md %}) the schedule on the original primary and promoted standby clusters.
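The pause/cancel management described above maps onto CockroachDB's schedule statements. A minimal sketch (the schedule ID below is an illustrative placeholder, not taken from this PR):

```sql
-- List schedules to find the changefeed schedule that is now
-- running on the promoted cluster.
SHOW SCHEDULES;

-- Pause the schedule (placeholder ID) on whichever cluster should
-- stop emitting, so two clusters do not write to the same sink...
PAUSE SCHEDULE 831680482536357889;

-- ...or cancel it outright if it should never resume there.
DROP SCHEDULE 831680482536357889;
```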
kathancox (Author) commented:

@msbutler I don't know if this is quite correct. I made an assumption of what would happen here because scheduled changefeeds are a one-time table scan rather than a continuous job like a regular changefeed. Please correct me!
Also, I have not added this as a limitation yet, do we want to do so?

msbutler replied:

I think what you have here is fine! Perhaps you could explain why we recommend some manual intervention: we don't recommend two clusters writing changefeeds to the same sink.

We should definitely add a known limitation for this.

@kathancox kathancox requested a review from msbutler May 7, 2024 17:42
msbutler left a comment:

Conducted a close read of the v23.2 version, assuming the v24.1 version is basically the same.


### Changefeeds

[Changefeeds]({% link {{ page.version.version }}/change-data-capture-overview.md %}) will fail on the promoted cluster immediately after cutover. We recommend that you recreate changefeeds on the promoted cluster.
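The recommended recreation step can be sketched as follows (the table name and Kafka sink URI are illustrative placeholders, not values from this PR):

```sql
-- On the promoted cluster, after cutover, start a new changefeed
-- to replace the one that failed.
CREATE CHANGEFEED FOR TABLE movr.rides
  INTO 'kafka://localhost:9092'
  WITH updated, resolved;
```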
msbutler commented:

I'm not a CDC expert, but it's probably worth mentioning why they fail (they fail after cluster restore as well, for example). I think we fail them because we don't want two separate clusters running a changefeed to the same sink, right?



@kathancox kathancox force-pushed the pcr-schedule-pause-destination branch from ab321b2 to 2d1acb9 Compare May 9, 2024 16:42
{{site.data.alerts.end}}

### Changefeeds

kathancox (Author) commented:

@msbutler I added the known limitations around scheduled changefeeds for this. Also updated the changefeed text.
@rharding6373 Could you take a look at the changefeed text here to confirm that "two clusters running the same changefeed to one sink" is the reason we fail changefeeds on full cluster restore (and, in this case, cutover)?

rharding6373 replied:

This is correct. Thanks for checking.

kathancox (Author) replied:

Thanks Rachael!

@kathancox kathancox requested a review from msbutler May 9, 2024 16:44
@@ -0,0 +1 @@
After the [cutover process]({% link {{ page.version.version }}/cutover-replication.md %}) for [physical cluster replication]({% link {{ page.version.version }}/physical-cluster-replication-overview.md %}), [scheduled changefeeds]({% link {{ page.version.version }}/create-schedule-for-changefeed.md %}) will continue on the promoted cluster. You will need to manage [pausing]({% link {{ page.version.version }}/pause-schedules.md %}) or [canceling]({% link {{ page.version.version }}/drop-schedules.md %}) the schedule on the original primary and promoted standby clusters to avoid two clusters running the same changefeed to one sink. [Tracking GitHub issue](https://github.com/cockroachdb/cockroach/issues/123776)
kathancox (Author) commented:

Note to docs team reviewers: the known limitation tracking GH issue link has different formats between v23.2 + v24.1 following the update to known limitations for GA.

@@ -0,0 +1 @@
After the [cutover process]({% link {{ page.version.version }}/cutover-replication.md %}) for [physical cluster replication]({% link {{ page.version.version }}/physical-cluster-replication-overview.md %}), [scheduled changefeeds]({% link {{ page.version.version }}/create-schedule-for-changefeed.md %}) will continue on the promoted cluster. You will need to manage [pausing]({% link {{ page.version.version }}/pause-schedules.md %}) or [canceling]({% link {{ page.version.version }}/drop-schedules.md %}) the schedule on the original primary and promoted standby clusters to avoid two clusters running the same changefeed to one sink. [#123776](https://github.com/cockroachdb/cockroach/issues/123776)

msbutler commented:

Nit: I think we should only instruct the user to pause or cancel on the newly promoted cluster.

kathancox (Author) commented (May 13, 2024):

@msbutler What is the expectation for users when the scheduled backup is paused on the promoted cluster: that they pause or cancel the backup schedule on the original cluster? I assume cancel, given the possible storage/collection collision?

kathancox (Author) replied:

@msbutler Ah, I realize now (I think...) that I got the emphasis wrong on your comment; that is, let's only talk about the newly promoted cluster. I have updated to this effect! 🙃

@kathancox kathancox requested a review from msbutler May 15, 2024 14:29
Amruta-Ranade (Contributor) left a comment:

LGTM!

@kathancox kathancox force-pushed the pcr-schedule-pause-destination branch from e37f4d2 to 453cba9 Compare May 16, 2024 15:08
@kathancox kathancox force-pushed the pcr-schedule-pause-destination branch from 453cba9 to 098ea87 Compare May 16, 2024 15:28
@kathancox kathancox merged commit dd912ec into main May 16, 2024
6 checks passed
@kathancox kathancox deleted the pcr-schedule-pause-destination branch May 16, 2024 15:40
kathancox (Author):

TFTRs!


4 participants