Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GODRIVER-2101 Direct read/write retries to another mongos if possible #1358

Merged
merged 11 commits into from
Sep 12, 2023

Conversation

prestonvasquez
Copy link
Collaborator

@prestonvasquez prestonvasquez commented Aug 18, 2023

GODRIVER-2101

Summary

When possible, deprioritize failed mongos during retry attempts.

Background & Motivation

Our current retry logic for sharded clusters can lead to an operation that failed with a retryable error being retried on the same mongos.

@prestonvasquez prestonvasquez marked this pull request as ready for review August 25, 2023 17:23
@prestonvasquez prestonvasquez requested a review from a team as a code owner August 25, 2023 17:23
@prestonvasquez prestonvasquez requested review from qingyang-hu and matthewdale and removed request for a team August 25, 2023 17:23
return nil, err
}

filteredServers := filterDeprioritizedServers(selectedServers, oss.deprioritizedServers)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method was designed to add more filters if the need arrises in the future.

matthewdale
matthewdale previously approved these changes Sep 11, 2023
Copy link
Collaborator

@matthewdale matthewdale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, with one question about the behavior when CSOT is enabled 👍

Comment on lines +293 to +296
// Note that setting this value greater than 2 will result in false
// negatives. The current specification does not account for CSOT, which
// might allow for an "inifinite" number of retries over a period of time.
// Because of this, we only track the "previous server".
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a task for updating the retryable reads/writes "deprioritized mongos" behavior to account for multiple retries (i.e. CSOT)? The vast majority of sharded clusters have >2 mongos nodes, so that seems like a questionably useful feature for drivers that support CSOT.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, there is no task to do this. Here are a couple of reasons from discussions with @comandeo:

  • We do not want this new mechanism to replace SDAM/interfere with SDAM too much.
  • We believe that mongos may recover from the error fast enough, and there is no reason to exclude ones that failed earlier
  • It is rather a rare occasion that multiple mongoses fail with retryable errors. This looks like a network issue, and this is handled by SDAM

qingyang-hu
qingyang-hu previously approved these changes Sep 11, 2023
@github-actions
Copy link

API Change Report

No changes found!

@prestonvasquez prestonvasquez added this pull request to the merge queue Sep 12, 2023
Merged via the queue into mongodb:master with commit d92f20d Sep 12, 2023
21 checks passed
@prestonvasquez prestonvasquez deleted the GODRIVER-2101 branch September 12, 2023 15:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants