[CDCSDK] Fix race condition between tablet creation and addition of table to the stream in the background thread #22408

Closed
yugabyte-ci opened this issue May 15, 2024 · 0 comments
Labels: area/cdcsdk, jira-originated, kind/bug, priority/high

yugabyte-ci commented May 15, 2024

Jira Link: DB-11311

yugabyte-ci added the labels area/cdcsdk, jira-originated, kind/bug, priority/low, and status/awaiting-triage on May 15, 2024
yugabyte-ci changed the title from "[CDCSDK] Fix flaky test TestRetentionBarrierRaceWithUpdatePeersAndMetrics" to "[CDCSDK] Fix race condition between tablet creation and addition of table to the stream in the background thread" on May 15, 2024
yugabyte-ci added the label priority/high and removed the labels priority/low and status/awaiting-triage on May 15, 2024
Sumukh-Phalgaonkar added a commit that referenced this issue May 22, 2024
Summary:
A race condition is possible during dynamic table creation: tablet creation has not yet completed (i.e., the retention barriers have not yet been set), but the background thread proceeds and adds the table to the stream anyway. Until tablet creation succeeds, the tablets of this table then have no retention barriers set, even though the table has already been added to the stream.

This showed up as flakiness in the test `TestRetentionBarrierRaceWithUpdatePeersAndMetrics`.

This diff fixes the race by calling `GetTableLocations` on the table before adding it to the stream. If the tablets are not yet initialized, this call fails, and the table is added to the stream in the next round of the background task instead.
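A minimal sketch of the check-then-defer pattern described above, assuming that a table's tablets (and their retention barriers) become visible only once tablet creation has finished. `TableInfo`, `CdcStream`, `RunBackgroundRound`, and this single-argument `GetTableLocations` are hypothetical stand-ins for illustration, not YugabyteDB's actual catalog-manager code or the real `GetTableLocations` signature.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical model of a newly created table. In this sketch, "tablets
// created" also implies that retention barriers have been set on them.
struct TableInfo {
  std::string table_id;
  bool tablets_created = false;         // has tablet creation finished?
  std::vector<std::string> tablet_ids;  // populated once tablets exist
};

// Hypothetical stand-in for the table-location lookup: it "fails"
// (returns std::nullopt) while the tablets are still being created.
std::optional<std::vector<std::string>> GetTableLocations(const TableInfo& table) {
  if (!table.tablets_created) return std::nullopt;
  return table.tablet_ids;
}

// Hypothetical CDC stream: just the set of tables it covers.
struct CdcStream {
  std::unordered_set<std::string> tables;
};

// One round of the (hypothetical) background task. A table is added to the
// stream only if its locations can be fetched; otherwise it stays pending
// and is retried in the next round instead of being added too early.
void RunBackgroundRound(const std::unordered_map<std::string, TableInfo>& catalog,
                        std::vector<std::string>& pending, CdcStream& stream) {
  std::vector<std::string> still_pending;
  for (const auto& table_id : pending) {
    const TableInfo& table = catalog.at(table_id);
    if (!GetTableLocations(table)) {
      still_pending.push_back(table_id);  // tablets not ready: retry later
      continue;
    }
    stream.tables.insert(table_id);  // tablets exist, safe to add
  }
  pending = std::move(still_pending);
}
```

Deferring instead of blocking keeps each round short: a table whose tablets are still being created simply stays pending, and the retry interval is whatever cadence the background task already runs at.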
Jira: DB-11311

Test Plan: Jenkins: test regex: .*CDCSDKConsumptionConsistentChangesTest.*

Reviewers: asrinivasan, stiwary, skumar

Reviewed By: asrinivasan

Subscribers: ybase, ycdcxcluster

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D35160
svarnau pushed a commit that referenced this issue May 25, 2024
svarnau pushed a commit that referenced this issue May 29, 2024
…ble creation.

Summary:
Original commit: 6e6cb2a / D35160

######Backport Description
No merge conflicts were encountered.

Jira: DB-11311

Test Plan: Jenkins: test regex: .*CDCSDKConsumptionConsistentChangesTest.*

Reviewers: asrinivasan, stiwary, skumar

Reviewed By: asrinivasan

Subscribers: ycdcxcluster, ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D35263