Add job management docs for cutover in physical cluster replication jobs #18525
Conversation
[Changefeeds]({% link {{ page.version.version }}/change-data-capture-overview.md %}) will fail on the promoted cluster immediately after cutover. We recommend that you recreate changefeeds on the promoted cluster.

[Scheduled changefeeds]({% link {{ page.version.version }}/create-schedule-for-changefeed.md %}) will continue on the promoted cluster. You will need to manage [pausing]({% link {{ page.version.version }}/pause-schedules.md %}) or [canceling]({% link {{ page.version.version }}/drop-schedules.md %}) the schedule on the original primary and promoted standby clusters.
@msbutler I don't know if this is quite correct. I made an assumption about what would happen here because scheduled changefeeds are a one-time table scan rather than a continuous job like a regular changefeed. Please correct me!
Also, I have not added this as a known limitation yet; do we want to do so?
I think what you have here is fine! Perhaps you could explain why we recommend some manual intervention: we don't recommend two clusters writing changefeeds to the same sink.
We should definitely add a known limitation for this.
Conducted a close read of the 23.2 version, assuming the 24.1 version is basically the same.
### Changefeeds

[Changefeeds]({% link {{ page.version.version }}/change-data-capture-overview.md %}) will fail on the promoted cluster immediately after cutover. We recommend that you recreate changefeeds on the promoted cluster.
I'm not a CDC expert, but it's probably worth mentioning why they fail (they fail after cluster restore as well, for example). I think we fail them because we don't want two separate clusters running a changefeed to the same sink, right?
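For illustration, recreating a changefeed on the promoted cluster might look like the following sketch (the table name and Kafka sink URI are hypothetical placeholders, not from the docs under review):

```sql
-- Run on the promoted cluster after cutover: the original changefeed
-- has failed, so start a fresh one targeting the same sink.
-- Table name and sink URI below are hypothetical examples.
CREATE CHANGEFEED FOR TABLE movr.rides
  INTO 'kafka://kafka-broker:9092'
  WITH updated, resolved;
```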
{{site.data.alerts.end}}

### Changefeeds
@msbutler I added the known limitations around scheduled changefeeds for this. Also updated the changefeed text.
@rharding6373 Could you take a look at the changefeed text here to confirm that "two clusters running the same changefeed to one sink" is the reason that we fail changefeeds on full cluster restore (and, in this case, cutover)?
This is correct. Thanks for checking.
Thanks Rachael!
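As a quick check of the behavior being documented, listing the failed changefeed jobs on the promoted cluster could look like this sketch (assuming the `status` and `error` columns of `SHOW CHANGEFEED JOBS`; verify against the version being documented):

```sql
-- Inspect changefeed jobs on the promoted cluster after cutover;
-- changefeeds that were running on the primary should show as failed.
SELECT job_id, status, error
FROM [SHOW CHANGEFEED JOBS]
WHERE status = 'failed';
```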
@@ -0,0 +1 @@
After the [cutover process]({% link {{ page.version.version }}/cutover-replication.md %}) for [physical cluster replication]({% link {{ page.version.version }}/physical-cluster-replication-overview.md %}), [scheduled changefeeds]({% link {{ page.version.version }}/create-schedule-for-changefeed.md %}) will continue on the promoted cluster. You will need to manage [pausing]({% link {{ page.version.version }}/pause-schedules.md %}) or [canceling]({% link {{ page.version.version }}/drop-schedules.md %}) the schedule on the original primary and promoted standby clusters to avoid two clusters running the same changefeed to one sink. [Tracking GitHub issue](https://github.com/cockroachdb/cockroach/issues/123776) |
Note to docs team reviewers: the known limitation tracking GH issue link has different formats between v23.2 and v24.1, following the update to known limitations for GA.
@@ -0,0 +1 @@
After the [cutover process]({% link {{ page.version.version }}/cutover-replication.md %}) for [physical cluster replication]({% link {{ page.version.version }}/physical-cluster-replication-overview.md %}), [scheduled changefeeds]({% link {{ page.version.version }}/create-schedule-for-changefeed.md %}) will continue on the promoted cluster. You will need to manage [pausing]({% link {{ page.version.version }}/pause-schedules.md %}) or [canceling]({% link {{ page.version.version }}/drop-schedules.md %}) the schedule on the original primary and promoted standby clusters to avoid two clusters running the same changefeed to one sink. [#123776](https://github.com/cockroachdb/cockroach/issues/123776) |
nit: I think we should only instruct the user to pause or cancel on the newly promoted cluster.
@msbutler What is the expectation for users when the scheduled backup is paused on the promoted cluster: that they pause or cancel the backup schedule on the original cluster? I assume cancel, given the possible storage/collection collision?
@msbutler Ah, I realize now (I think...) that I got the emphasis wrong on your comment; that is, let's only talk about the newly promoted cluster. I have updated to this effect! 🙃
LGTM!
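To make the resolved recommendation concrete, pausing or canceling the changefeed schedule on the newly promoted cluster might look like this sketch (the schedule label is a hypothetical placeholder):

```sql
-- On the newly promoted cluster: pause the changefeed schedule by label...
PAUSE SCHEDULES SELECT id FROM [SHOW SCHEDULES] WHERE label = 'hourly_changefeed';

-- ...or cancel it entirely so only one cluster emits to the sink.
DROP SCHEDULES SELECT id FROM [SHOW SCHEDULES] WHERE label = 'hourly_changefeed';
```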
TFTRs!
Fixes DOC-8998
This PR adds detail on job management to the cutover page for physical cluster replication. This affects scheduled jobs and changefeeds.