Direct retries to another mongos if one is available #1367

Open · wants to merge 20 commits into master
Conversation

@stIncMale stIncMale commented Apr 17, 2024

Given that the current specification wording is internally inconsistent, I confirmed the actual intent with the spec author and created DRIVERS-2901 ("Clarify the intent behind the list of deprioritized mongos'es and fix the pseudocode").

Performance considerations

Our ServerSelector and ClusterDescription APIs do not allow us to implement an efficient pipeline (CompositeServerSelector) of ServerSelectors: we can neither mutate the List<ServerDescription> in place, nor mutate ClusterDescription, nor even reuse the same ClusterDescription when a selector did not filter anything out. This PR added one more selector required by the specification logic, and introduced two more selectors by refactoring server selection code that wasn't previously expressed in terms of ServerSelectors. As a result, it is conceivable that the PR has a negative performance impact.

Additionally, due to the refactoring in d25010d, each server selection iteration now involves copying the map of Servers maintained by a Cluster. While that copying does not entail locking (all hail the CHM!), it may still have an additional negative performance impact.

If we indeed notice performance degradation, we may try mitigating the issue by introducing InternalServerSelector extends ServerSelector (or a subclass of ClusterDescription that allows mutating getServerDescriptions, or both), which would allow for more efficient chaining, and using it for everything but the application-specific selector. This assumes, of course, that the would-be degradation is caused to a large extent by the inefficient selector chaining, and not by the CHM copying.
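As a rough illustration of the chaining cost described above (a sketch only: the real ServerSelector operates on a ClusterDescription, not a List<String>, and the names SelectorChainSketch and applyChain are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical, simplified stand-in for chaining ServerSelectors.
// It illustrates why the pipeline is costly when nothing can be
// mutated or reused in place: every stage produces a fresh list,
// mirroring how a composite selector must rebuild a ClusterDescription
// per stage, even when a stage filters nothing out.
public final class SelectorChainSketch {

    static List<String> applyChain(List<UnaryOperator<List<String>>> selectors, List<String> servers) {
        List<String> current = servers;
        for (UnaryOperator<List<String>> selector : selectors) {
            current = selector.apply(current); // a new list per stage
        }
        return current;
    }

    public static void main(String[] args) {
        UnaryOperator<List<String>> dropDeprioritized =
                list -> list.stream().filter(s -> !s.startsWith("deprioritized")).toList();
        UnaryOperator<List<String>> passThrough =
                list -> new ArrayList<>(list); // even a no-op stage copies
        List<String> result = applyChain(
                List.of(dropDeprioritized, passThrough),
                List.of("a:27017", "deprioritized:27017", "b:27017"));
        System.out.println(result); // [a:27017, b:27017]
    }
}
```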

JAVA-4254, JAVA-5320

Implemented the change and unit tests

@stIncMale stIncMale self-assigned this Apr 17, 2024
@stIncMale stIncMale requested review from katcharov, a team and vbabanin and removed request for a team April 17, 2024 01:19
@stIncMale stIncMale requested a review from vbabanin April 30, 2024 05:59
(@Nullable Throwable previouslyChosenException, Throwable mostRecentAttemptException) ->
CommandOperationHelper.onRetryableReadAttemptFailure(
operationContext, previouslyChosenException, mostRecentAttemptException);
return new RetryingAsyncCallbackSupplier<>(retryState, onAttemptFailure,
Contributor:

I find it difficult to understand what is happening in these methods, though I am unsure why. I think part of it is the amount of boilerplate, such as onAttemptFailure = (priorException, currentException) ->; inlining might help.

The other, more objective issue is that onAttemptFailure is being used as a failedResultTransformer, but I would expect such a thing to have no side effects (such as deprioritizing a server in OperationContext).

Member Author:

I updated the docs to reflect that side effects are allowed. The key to them being allowed is that the operator is guaranteed to be called once per failed attempt.
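To make the "once per failed attempt" guarantee concrete, here is a minimal, hypothetical retry loop (not the driver's RetryingAsyncCallbackSupplier; RetrySketch and its names are assumptions for illustration): because the operator runs exactly once per failed attempt, side effects such as recording a server for deprioritization are safe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Supplier;

// Hypothetical sketch of a retrying supplier. The essential property is that
// onAttemptFailure is invoked exactly once per failed attempt, which makes
// side effects (e.g., deprioritizing the server the attempt ran on) safe.
public final class RetrySketch {

    static <R> R retry(int maxAttempts, Supplier<R> attempt,
                       BiConsumer<Throwable, Throwable> onAttemptFailure) {
        Throwable previous = null;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return attempt.get();
            } catch (RuntimeException e) {
                onAttemptFailure.accept(previous, e); // exactly once per failed attempt
                previous = e;
            }
        }
        throw new RuntimeException("attempts exhausted", previous);
    }

    public static void main(String[] args) {
        List<String> deprioritized = new ArrayList<>();
        int[] calls = {0};
        String result = retry(3,
                () -> {
                    if (calls[0]++ < 2) {
                        throw new RuntimeException("attempt " + calls[0] + " failed");
                    }
                    return "ok";
                },
                // side effect: remember a server to deprioritize on the next attempt
                (prev, current) -> deprioritized.add("mongos-" + deprioritized.size()));
        System.out.println(result + " after deprioritizing " + deprioritized);
        // prints: ok after deprioritizing [mongos-0, mongos-1]
    }
}
```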

I find it difficult to understand what is happening in these methods...
...inlining, might help...

Could you please clarify what you propose to inline and where?

Contributor:

I am imagining at least something like:

    static <R> AsyncCallbackSupplier<R> decorateReadWithRetriesAsync(final RetryState retryState, 
            final OperationContext operationContext, final AsyncCallbackSupplier<R> asyncReadFunction) {
        return new RetryingAsyncCallbackSupplier<>(retryState,
                onRetryableReadAttemptFailure(operationContext),
                CommandOperationHelper::shouldAttemptToRetryRead,
                logRetryAndGet(retryState, operationContext, asyncReadFunction));
    }

This is optional for this PR, and perhaps should be undertaken separately.

Member Author:

It looks like a code extraction into a method, rather than inlining. Done in 69b78ff.

…Context.java


Let's put various checks (validation, preconditions...) at the top of methods, with the operation at the bottom.

Co-authored-by: Maxim Katcharov <maxim.katcharov@mongodb.com>
@vbabanin vbabanin left a comment

LGTM!

.filter(serverDescription -> serversSnapshot.containsServer(serverDescription.getAddress()))
.collect(toList());
List<ServerSelector> selectors = Stream.of(
raceConditionPreFiltering,
Contributor:

The comment and selector can be moved to their own method:

Suggested change
raceConditionPreFiltering,
inSnapshotSelector(serversSnapshot),

Member Author:

If getCompleteServerSelector gets too big in the future, that may be helpful. For now, however, it does not seem to be needed.

// are of those `Server`s that are known to both `clusterDescription` and `serversSnapshot`.
// This way we are guaranteed to successfully get `Server`s from `serversSnapshot` based on the selected `ServerDescription`s.
//
// The pre-filtering we do to deal with the race condition described above is achieved by this `ServerSelector`.
Contributor:

It seems that this comment should be in a PR comment, rather than in the code, and that there should be a test that ensures items missing from the snapshot get filtered out by this chain. The problem may have been complicated to identify and solve, but now that there is a solution, all the facts mentioned in this comment seem like they should be evident to unfamiliar readers.

Member Author:

I think this comment should be in the code, not somewhere else, because it explains code that is not otherwise obvious.

but now that there is a solution

Note that it has been solved previously, i.e., it is already solved in master, just differently.

all the facts mentioned in this comment seem like they should be evident to unfamiliar readers

This is a huge overstatement in my opinion.

Member Author:

there should be a test that ensures items missing from the snapshot get filtered out by this chain

Done in 246353f.
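The pre-filtering idea discussed above can be sketched as follows (a simplified stand-in: real code filters ServerDescriptions against a snapshot of the Cluster's Server map; here addresses are plain strings, and SnapshotPreFilterSketch is a hypothetical name, while inSnapshotSelector echoes the name suggested in the review):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

// Hypothetical sketch of the race-condition pre-filtering: keep only the
// addresses that a concurrently taken snapshot of the Server map still knows
// about, so later lookups in the snapshot are guaranteed to succeed.
public final class SnapshotPreFilterSketch {

    static UnaryOperator<List<String>> inSnapshotSelector(Map<String, ?> serversSnapshot) {
        return descriptions -> descriptions.stream()
                .filter(serversSnapshot::containsKey) // drop addresses the snapshot no longer has
                .toList();
    }

    public static void main(String[] args) {
        Map<String, Object> snapshot = new ConcurrentHashMap<>();
        snapshot.put("a:27017", new Object());
        // "b:27017" was removed from the Server map after the
        // ClusterDescription was read, i.e., the two views disagree.
        List<String> selected = inSnapshotSelector(snapshot)
                .apply(List.of("a:27017", "b:27017"));
        System.out.println(selected); // [a:27017]
    }
}
```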

Comment on lines +355 to +356
ServerSelector raceConditionPreFiltering = clusterDescriptionPotentiallyInconsistentWithServerSnapshot ->
clusterDescriptionPotentiallyInconsistentWithServerSnapshot.getServerDescriptions()
Contributor:

Suggested change
ServerSelector raceConditionPreFiltering = clusterDescriptionPotentiallyInconsistentWithServerSnapshot ->
clusterDescriptionPotentiallyInconsistentWithServerSnapshot.getServerDescriptions()
ServerSelector raceConditionPreFiltering = clusterDescription ->
clusterDescription.getServerDescriptions()

We would not bother to put a comment saying "this is potentially inconsistent with the server snapshot".

Member Author:

We have a local variable here that has a long name explaining what it is, and the variable is used only once. Why do you think that the generic clusterDescription name is better here?

serverSelector,
serverDeprioritization.getServerSelector(),
settings.getServerSelector(), // may be null
new LatencyMinimizingServerSelector(settings.getLocalThreshold(MILLISECONDS), MILLISECONDS),
Contributor:

We should just pass the settings in to a constructor, and the constructor can figure out which settings are important to latency minimization.

Member Author:

This PR hasn't introduced this constructor, i.e., the instance is created exactly as it was created before the PR. I don't see a reason to make the change in this PR.

* <p>This class is not part of the public API and may be removed or changed at any time</p>
*/
@ThreadSafe
public final class OperationCountMinimizingServerSelector implements ServerSelector {
Contributor:

Something like MinimumOperationCountServerSelector would convey that this returns at most 1 result.

Member Author:

Done in 246353f.
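The "at most one result" behavior the new name conveys can be sketched like this (a simplified stand-in: the driver selects among ServerDescriptions using each Server's in-progress operation count, whereas here counts live in a plain map and all names are hypothetical):

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a minimum-operation-count selector: it returns at
// most one server, the one with the fewest in-progress operations.
// Assumes every candidate has an entry in operationCounts.
public final class MinimumOperationCountSketch {

    static List<String> select(Map<String, Integer> operationCounts, List<String> candidates) {
        return candidates.stream()
                .min(Comparator.comparingInt(operationCounts::get))
                .map(List::of)        // at most one element, as the name suggests
                .orElseGet(List::of); // an empty input yields an empty selection
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = Map.of("a:27017", 5, "b:27017", 2, "c:27017", 7);
        System.out.println(select(counts, List.of("a:27017", "b:27017", "c:27017")));
        // prints: [b:27017]
    }
}
```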

}

void updateCandidate(final ServerAddress serverAddress, final ClusterType clusterType) {
candidate = isEnabled(clusterType) ? serverAddress : null;
Contributor:

Suggested change
candidate = isEnabled(clusterType) ? serverAddress : null;
candidate = serverAddress;

I do not think the check is needed here. This potential "server on which the operation failed" will already be correctly excluded, because the ServerDeprioritization server selector is already a no-op when not sharded.

Member Author:

After thinking more about this, I agree that it's fine to omit the check in the updateCandidate method.

Done in 246353f.

Comment on lines 104 to 113
List<ServerDescription> serverDescriptions = clusterDescription.getServerDescriptions();
if (!isEnabled(clusterDescription.getType())) {
return serverDescriptions;
} else {
List<ServerDescription> nonDeprioritizedServerDescriptions = serverDescriptions
.stream()
.filter(serverDescription -> !deprioritized.contains(serverDescription.getAddress()))
.collect(toList());
return nonDeprioritizedServerDescriptions.isEmpty() ? serverDescriptions : nonDeprioritizedServerDescriptions;
}
Contributor:

Suggested change
List<ServerDescription> serverDescriptions = clusterDescription.getServerDescriptions();
if (!isEnabled(clusterDescription.getType())) {
return serverDescriptions;
} else {
List<ServerDescription> nonDeprioritizedServerDescriptions = serverDescriptions
.stream()
.filter(serverDescription -> !deprioritized.contains(serverDescription.getAddress()))
.collect(toList());
return nonDeprioritizedServerDescriptions.isEmpty() ? serverDescriptions : nonDeprioritizedServerDescriptions;
}
List<ServerDescription> serverDescriptions = clusterDescription.getServerDescriptions();
if (!isEnabled(clusterDescription.getType())) {
return serverDescriptions;
}
List<ServerDescription> nonDeprioritizedServerDescriptions = serverDescriptions
.stream()
.filter(serverDescription -> !deprioritized.contains(serverDescription.getAddress()))
.collect(toList());
return nonDeprioritizedServerDescriptions.isEmpty()
? serverDescriptions : nonDeprioritizedServerDescriptions;

Member Author:

Done in 246353f.
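For reference, a runnable restatement of the deprioritization logic above, with simplified stand-in types (addresses as strings, the enabled check collapsed to a boolean, and DeprioritizationSketch a hypothetical name): deprioritized servers are filtered out unless doing so would leave nothing, in which case the full list is returned as a fallback.

```java
import java.util.List;
import java.util.Set;

// Hypothetical, simplified restatement of the deprioritizing server selector:
// when enabled, drop deprioritized servers, but fall back to the full list
// if every server would otherwise be excluded.
public final class DeprioritizationSketch {

    static List<String> select(boolean enabled, Set<String> deprioritized, List<String> descriptions) {
        if (!enabled) {
            return descriptions;
        }
        List<String> nonDeprioritized = descriptions.stream()
                .filter(address -> !deprioritized.contains(address))
                .toList();
        return nonDeprioritized.isEmpty() ? descriptions : nonDeprioritized;
    }

    public static void main(String[] args) {
        Set<String> deprioritized = Set.of("a:27017");
        System.out.println(select(true, deprioritized, List.of("a:27017", "b:27017")));
        // prints: [b:27017]
        // fallback: everything is deprioritized, so the full list is returned
        System.out.println(select(true, deprioritized, List.of("a:27017")));
        // prints: [a:27017]
    }
}
```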

@stIncMale stIncMale requested a review from katcharov May 8, 2024 23:43