refactor(NODE-6090): Implement CSOT logic for server selection and connection checkout #4095

Open
wants to merge 58 commits into
base: main

Conversation

W-A-James
Contributor

@W-A-James W-A-James commented Apr 23, 2024

Description

What is changing?

New Error
  • Added MongoOperationTimeoutError class that is thrown when a CSOT timeout is encountered
Changes to Timeout
  • Add Timeout.throwIfExpired() method
  • Add Timeout.remainingTime getter method
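For readers unfamiliar with these two additions, the following is a minimal sketch of what a remaining-time getter and an expiry check might look like. `SketchTimeout`, `SketchTimeoutError`, and the injectable `now` clock are hypothetical names used for illustration only, not the driver's implementation, and treating a duration of 0 as "never expires" is an assumption.

```typescript
// Hypothetical error type standing in for a CSOT timeout error.
class SketchTimeoutError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'SketchTimeoutError';
  }
}

// Hypothetical, simplified timeout: tracks a start time and a duration.
class SketchTimeout {
  readonly duration: number;
  private readonly start: number;
  private readonly now: () => number;

  // `now` is injectable so the behaviour can be exercised without real timers.
  constructor(duration: number, now: () => number = Date.now) {
    if (duration < 0) throw new RangeError('duration must be >= 0');
    this.duration = duration;
    this.now = now;
    this.start = now();
  }

  /** Milliseconds left before expiry; Infinity when duration is 0 (assumed "no limit"). */
  get remainingTime(): number {
    if (this.duration === 0) return Infinity;
    const elapsed = this.now() - this.start;
    return Math.max(0, this.duration - elapsed);
  }

  /** Throws once the deadline has passed; otherwise does nothing. */
  throwIfExpired(): void {
    if (this.remainingTime === 0) {
      throw new SketchTimeoutError(`Timed out after ${this.duration}ms`);
    }
  }
}
```

A caller could invoke `throwIfExpired()` before starting each potentially slow step and pass `remainingTime` down as the budget for that step.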
Updates to AbstractOperation
  • Add timeout field
    • timeout is set at construction if the timeoutMS option is provided
Implementing CSOT behaviour for server selection
  • Update Topology.selectServer to accept a timeout option, which, when defined, it uses to determine whether it has timed out. Otherwise, it constructs a Timeout using the serverSelectionTimeoutMS option as before
  • Update Topology.selectServer to throw a MongoOperationTimeoutError on timeout when options.timeout is provided and retain previous error behaviour otherwise.
  • Update Topology._connect to pass down timeout to Server.command call used to execute ping on first connection
Implementing CSOT behaviour for connection checkout
  • Update Server.command to accept timeout option.
  • Update ConnectionPool.checkOut to accept timeout option
    • Only uses the passed-in timeout if the configured serverSelectionTimeoutMS is greater than the timeout's duration. Otherwise, computes the time elapsed since server selection completed and creates a new timeout for the serverSelectionTimeoutMS deadline
Test changes
  • Implement Server Selection prose tests from CSOT spec. Left out last two since they require implementation of the connection creation cancellation behaviour that was decided against in DRIVERS-2347
  • Implement unit tests from CSOT spec related to server selection and connection checkout. Skipped tests with appropriate messages
Misc changes
  • Update resolveOptions to handle timeoutMS option propagation
    • Add csotMin helper method that implements the CSOT min algorithm described here
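The CSOT min algorithm mentioned above can be sketched as follows, assuming (as the CSOT specification does for timeoutMS) that a value of 0 means "no limit" and therefore must not win a plain numeric minimum. The name `csotMinSketch` is hypothetical and is used here to avoid implying that this mirrors the helper added in the PR.

```typescript
/**
 * Sketch of a "minimum of two timeouts" where 0 means "no limit".
 * Returns the stricter of the two limits, or 0 only if both are unlimited.
 */
function csotMinSketch(duration1: number, duration2: number): number {
  if (duration1 === 0) return duration2; // first is unlimited, second decides
  if (duration2 === 0) return duration1; // second is unlimited, first decides
  return Math.min(duration1, duration2);
}
```

With this definition, comparing a 0 (unlimited) timeoutMS against a 30000 ms serverSelectionTimeoutMS yields 30000 rather than 0.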
Is there new documentation needed for these changes?

What is the motivation for this change?

Release Highlight

Fill in title or leave empty for no highlight

Double check the following

  • Ran npm run check:lint script
  • Self-review completed using the steps outlined here
  • PR title follows the correct format: type(NODE-xxxx)[!]: description
    • Example: feat(NODE-1234)!: rewriting everything in coffeescript
  • Changes are covered by tests
  • New TODOs have a related JIRA ticket

@W-A-James W-A-James changed the title Node 6090 refactor(NODE-6090): Implement CSOT logic for server selection and connection checkout May 3, 2024
if (options?.timeout) {
// CSOT enabled
// Determine if we're using the timeout passed in or a new timeout
if (options.timeout.duration === 0 || serverSelectionTimeoutMS < options.timeout.duration) {
Contributor Author

@W-A-James W-A-James May 7, 2024


if configured serverSelectionTimeoutMS < configured timeoutMS then: // i.e. we used serverSelectionTimeoutMS as the timeout for selectServer
	create a timeout that expires after (serverSelectionTimeoutMS - timeElapsed)
else:
	reuse timeout passed in via options
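
The pseudocode above, combined with the `duration === 0` guard from the diff, could be written as a small pure function. This is only a sketch: the function and type names are hypothetical, `elapsedMS` is assumed to be the time already spent since the serverSelectionTimeoutMS clock started, and serverSelectionTimeoutMS is assumed to be positive.

```typescript
// Result of the decision: reuse the caller's timeout, or build a new one.
type CheckoutTimeoutChoice =
  | { kind: 'reuse' }
  | { kind: 'new'; durationMS: number };

function chooseCheckoutTimeout(
  timeoutMS: number, // duration of the timeout passed in via options (0 = no limit)
  serverSelectionTimeoutMS: number, // configured server selection limit (assumed > 0)
  elapsedMS: number // time already consumed against serverSelectionTimeoutMS
): CheckoutTimeoutChoice {
  // serverSelectionTimeoutMS was the effective limit for selectServer,
  // so connection checkout gets whatever is left of that budget.
  if (timeoutMS === 0 || serverSelectionTimeoutMS < timeoutMS) {
    return { kind: 'new', durationMS: Math.max(0, serverSelectionTimeoutMS - elapsedMS) };
  }
  // Otherwise the operation-level timeout is the stricter one: keep using it.
  return { kind: 'reuse' };
}
```

Because the comparison is a strict less-than, equal durations fall through to the reuse branch.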

@aditi-khare-mongoDB aditi-khare-mongoDB added Team Review Needs review from team and removed Primary Review In Review with primary reviewer, not yet ready for team's eyes labels May 7, 2024
Contributor

@nbbeeken nbbeeken left a comment


looks pretty good

Comment on lines 313 to 317
if (options.operationTimeout) {
  conn = await this.pool.checkOut({ timeout: options.operationTimeout });
} else {
  conn = await this.pool.checkOut();
}
Contributor


Suggested change
- if (options.operationTimeout) {
-   conn = await this.pool.checkOut({ timeout: options.operationTimeout });
- } else {
-   conn = await this.pool.checkOut();
- }
+ conn = await this.pool.checkOut({ timeout: options.operationTimeout });

TS supports just calling this because the timeout is optional

Contributor


I feel Warren's current code is easier to read, and makes it harder for someone editing the code later to accidentally break CSOT spec compliance. If we do end up going with this suggestion, can we leave a clarifying comment?

Contributor


I am surprised, because breaking this up into two calls to checkOut based on a condition that doesn't matter gives the reader more to read without adding meaningful context. Whether or not timeout exists, there is practically no change to how checkOut is invoked, because TypeScript reports that field as optional.

I would actually take this further:

conn = await this.pool.checkOut(options);

Why do we need to make a new object here? Passing through options should be fine, right? The fewer branching paths there are, the less there is to debug.

Contributor


Can you elaborate on what may accidentally break the spec without a test warning us?

@W-A-James W-A-James requested a review from nbbeeken May 10, 2024 16:06
// Determine if we're using the timeout passed in or a new timeout
if (options.timeout.duration > 0 || serverSelectionTimeoutMS > 0) {
if (
csotMin(options.timeout.duration, serverSelectionTimeoutMS) === serverSelectionTimeoutMS
Contributor


This is still an equals check. Sorry, I think we discussed it but didn't leave a comment: if the durations are the same, we'll create a new timeout when we could reuse the existing one.

@W-A-James W-A-James requested a review from nbbeeken May 10, 2024 19:36
@@ -889,8 +922,6 @@ function drainWaitQueue(queue: List<ServerSelectionRequest>, drainError: MongoDr
continue;
}
Contributor


drainWaitQueue is called when the topology is closed. Is it correct not to clear timeouts when the client is closed? I don't think so.

Contributor


Or do we rely on drainWaitQueue rejecting each request in the wait queue, which would clear the timeout when the catch handler is run in selectServer?

@@ -61,6 +62,12 @@ export abstract class AbstractOperation<TResult = any> {

options: OperationOptions;

/** @internal */
timeout?: Timeout;
Contributor


The CSOTContext class sounds somewhat similar to the OperationContext idea that you had explored earlier on in the CSOT design. I'm not against it on principle, but I think we scrapped that idea at some point. Will look for any documents we had on why and report back.

The context would be similar to the timeout factory, not the operation context. We decided not to implement an operation context because we can pass CSOT-related data on the options objects in the driver. I'm proposing a CSOT context to encapsulate CSOT logic, which we would then pass through the driver on the options objects.


I do not consider this thread resolved. I think the current implementation is more complicated than necessary because we only sometimes re-use the timeout.

  • This means server selection and connection checkout must be responsible for determining when to reuse the timeout.
  • We can't unconditionally clear the timeout because it may be re-used later.

Instead, I propose we either:

  1. Always create a new timeout for server selection and connection checkout.
  2. Always reuse the same Timeout object, but just reset the Timeout's interval for server selection and connection checkout.

Regardless of which is chosen, I think the resulting code is simpler because server selection and connection checkout 1) do not need to worry about whether to use a cached timeout or create a new one, and 2) can always clear the timeout.

This works especially nicely with the TimeoutFactory or a TimeoutContext, because we can encapsulate all timeout related logic into a single place that's easily unit testable. I'm partial to the factory approach:

class TimeoutFactory {
  private timeoutMS: number | null;
  private started = now();

  getTimeoutForServerSelection(): Timeout {
    // returns a timeout, handling CSOT vs Legacy timeout logic
  }
}

class Topology {
  selectServer(options: { ..., timeoutFactory: TimeoutFactory }) {
    ...
    const timeout = options.timeoutFactory.getTimeoutForServerSelection();
    try {
      ...
    } finally {
      timeout.clear();
    }
  }
}

Note that with an approach like this, whether or not we reuse a timeout can easily be encapsulated into the TimeoutFactory by instantiating a timeout when the factory is constructed and returning the cached timeout where needed.

But a context class could suffice too:

class TimeoutContext {
  private timeoutMS: number | null;
  private started = now();

  getTimeoutForServerSelection(): number {}
}


class Topology {
  selectServer(options: { ..., timeoutContext: TimeoutContext }) {
    ...
    const timeout = Timeout.expires(options.timeoutContext.getTimeoutForServerSelection());
    try {
      ...
    } finally {
      timeout.clear();
    }
  }
}

An approach like this consolidates CSOT logic and can be reused outside of the main code path (i.e., topology connect).

I don't think this work needs to block this PR. But I do want to make sure we discuss this, and I'd like to consider one of these approaches in a future ticket.
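
To make the context-class proposal above more concrete, here is a minimal sketch of how such a class might compute the server selection budget in one place. Everything in it is an assumption for illustration: the class name `SketchTimeoutContext`, the constructor arguments, the injectable `now` clock, the choice to throw a plain `Error` when the budget is exhausted, and the convention that a returned 0 means "no limit".

```typescript
// Hypothetical context that owns all CSOT vs. legacy timeout arithmetic.
class SketchTimeoutContext {
  private readonly timeoutMS: number | null;
  private readonly serverSelectionTimeoutMS: number;
  private readonly now: () => number;
  private readonly started: number;

  constructor(
    timeoutMS: number | null, // null = CSOT not enabled, 0 = no overall limit
    serverSelectionTimeoutMS: number, // 0 = no server selection limit
    now: () => number = Date.now // injectable clock for deterministic tests
  ) {
    this.timeoutMS = timeoutMS;
    this.serverSelectionTimeoutMS = serverSelectionTimeoutMS;
    this.now = now;
    this.started = now();
  }

  /** Milliseconds server selection may take; 0 means "no limit". */
  getTimeoutForServerSelection(): number {
    // Legacy behaviour, or CSOT with an unlimited operation budget.
    if (this.timeoutMS == null || this.timeoutMS === 0) {
      return this.serverSelectionTimeoutMS;
    }
    const remaining = this.timeoutMS - (this.now() - this.started);
    if (remaining <= 0) {
      throw new Error('operation timed out before server selection');
    }
    // An unlimited serverSelectionTimeoutMS must not win the minimum.
    if (this.serverSelectionTimeoutMS === 0) return remaining;
    return Math.min(remaining, this.serverSelectionTimeoutMS);
  }
}
```

Callers such as selectServer would then only ask the context for a number and build and clear their own timer, without knowing which mode is active.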

@@ -457,8 +458,14 @@ export class Topology extends TypedEventEmitter<TopologyEvents> {
}
}

const timeoutMS = this.client.options.timeoutMS;
const timeout = timeoutMS != null ? Timeout.expires(timeoutMS) : undefined;
Contributor


Is this correct if we're auto-connecting the client?

Labels
Team Review Needs review from team
Projects
None yet
4 participants