Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

makeBackoffMachine: Improve backoff sleep logic by making it local. #3841

Merged
merged 6 commits into from
Apr 10, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
2 changes: 2 additions & 0 deletions .eslintrc.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,8 @@ rules:
# Formatting and naming
camelcase: off

no-await-in-loop: off # Promise.all is not always desirable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how turning this off will prevent – or even make less likely – the failure state you describe in the commit message. Our existing improper uses of array.forEach seem more likely to have been induced by AirBnB's anti-loop position, which we've since thoughtfully revoked. (See the bottom of this file.)

Which is not to say that I'm against removing this lint! I suspect we almost invariably do want sequential await whenever we have multiple Awaitables, simply because we rarely – if ever – want to fire off a hundred network requests at once.

Copy link
Contributor Author

@chrisbobbe chrisbobbe Mar 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how turning this off will prevent – or even make less likely – the failure state you describe in the commit message.

Well, with this rule, if you want to await sequentially in a loop, you have to choose between an eslint-disable and the failure of using a forEach. I think it's reasonable to assume that someone will choose wrong because they trust our choice of rules and they're not looking closely at the implementation of forEach. I'll see if I can make this clearer in the commit message.

Our existing improper uses of array.forEach seem more likely to have been induced by AirBnB's anti-loop position, which we've since thoughtfully revoked. (See the bottom of this file.)

That may be true, but it seems anecdotal; you're talking about the outbox issue, right, which I didn't know about when I committed this; in fact, my reasoning doesn't depend on any improper uses I saw. It sounds like you're not opposed to repealing no-await-in-loop, anyway, but I hope the reason I'd like to do so is clear.


# Permit dangling underscores in class property names, to denote "private"
# fields. (These should be replaced with true private fields per the TC39
# class fields proposal [1], once that's available to us.)
Expand Down
6 changes: 4 additions & 2 deletions src/events/eventActions.js
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ import eventToAction from './eventToAction';
import eventMiddleware from './eventMiddleware';
import { tryGetAuth } from '../selectors';
import actionCreator from '../actionCreator';
import progressiveTimeout from '../utils/progressiveTimeout';
import { BackoffMachine } from '../utils/async';
import { ApiError } from '../api/apiErrors';

/** Convert an `/events` response into a sequence of our Redux actions. */
Expand Down Expand Up @@ -55,6 +55,8 @@ export const startEventPolling = (queueId: number, eventId: number) => async (
) => {
let lastEventId = eventId;

const backoffMachine = new BackoffMachine();

/* eslint-disable no-await-in-loop */
// eslint-disable-next-line no-constant-condition
while (true) {
Expand Down Expand Up @@ -86,7 +88,7 @@ export const startEventPolling = (queueId: number, eventId: number) => async (
}

// protection from inadvertent DDOS
await progressiveTimeout();
await backoffMachine.wait();

if (e instanceof ApiError && e.code === 'BAD_EVENT_QUEUE_ID') {
// The event queue is too old or has been garbage collected.
Expand Down
5 changes: 3 additions & 2 deletions src/outbox/outboxActions.js
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ import * as api from '../api';
import { getSelfUserDetail, getUsersByEmail } from '../users/userSelectors';
import { getUsersAndWildcards } from '../users/userHelpers';
import { isStreamNarrow, isPrivateOrGroupNarrow } from '../utils/narrow';
import progressiveTimeout from '../utils/progressiveTimeout';
import { BackoffMachine } from '../utils/async';
import { NULL_USER } from '../nullObjects';

export const messageSendStart = (outbox: Outbox): Action => ({
Expand Down Expand Up @@ -99,8 +99,9 @@ export const sendOutbox = () => async (dispatch: Dispatch, getState: GetState) =
return;
}
dispatch(toggleOutboxSending(true));
const backoffMachine = new BackoffMachine();
while (!trySendMessages(dispatch, getState)) {
await progressiveTimeout(); // eslint-disable-line no-await-in-loop
await backoffMachine.wait(); // eslint-disable-line no-await-in-loop
}
dispatch(toggleOutboxSending(false));
};
Expand Down
60 changes: 60 additions & 0 deletions src/utils/__tests__/backoffMachine-test.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,60 @@
/* @flow strict-local */
import { BackoffMachine } from '../async';
import { Lolex } from '../../__tests__/aux/lolex';

// Since BackoffMachine is in async.js, these tests *should* be in
// async-test.js. But doing that introduces some interference between these
// tests and the other Lolex-based tests, since Jest is running both of them in
// the same environment in parallel. This may be resolved out of the box in Jest
// 26, and it might even be safe in Jest 25.1.0 with a custom environment
// (https://github.com/facebook/jest/pull/8897). But as of 2020-03, putting them
// in a separate file is our workaround.

describe('BackoffMachine', () => {
const lolex: Lolex = new Lolex();

afterEach(() => {
lolex.clearAllTimers();
});

afterAll(() => {
lolex.dispose();
});

const measureWait = async (promise: Promise<void>) => {
const start = Date.now();
lolex.runOnlyPendingTimers();
await promise;
return Date.now() - start;
};

test('timeouts are random from zero to 100ms, 200ms, 400ms, 800ms...', async () => {
// This is a randomized test. NUM_TRIALS is chosen so that the failure
// probability < 1e-9. There are 2 * 11 assertions, and each one has a
// failure probability < 1e-12; see below.
const NUM_TRIALS = 100;
const expectedMaxDurations = [100, 200, 400, 800, 1600, 3200, 6400, 10000, 10000, 10000, 10000];

const trialResults: Array<number[]> = [];

for (let i = 0; i < NUM_TRIALS; i++) {
const resultsForThisTrial = [];
const backoffMachine = new BackoffMachine();
for (let j = 0; j < expectedMaxDurations.length; j++) {
const duration = await measureWait(backoffMachine.wait());
resultsForThisTrial.push(duration);
}
trialResults.push(resultsForThisTrial);
}

expectedMaxDurations.forEach((expectedMax, i) => {
const maxFromAllTrials = Math.max(...trialResults.map(r => r[i]));
const minFromAllTrials = Math.min(...trialResults.map(r => r[i]));

// Each of these assertions has a failure probability of:
// 0.75 ** NUM_TRIALS = 0.75 ** 100 < 1e-12
expect(minFromAllTrials).toBeLessThan(expectedMax * 0.25);
expect(maxFromAllTrials).toBeGreaterThan(expectedMax * 0.75);
});
});
});
16 changes: 0 additions & 16 deletions src/utils/__tests__/progressiveTimeout-test.js

This file was deleted.

93 changes: 84 additions & 9 deletions src/utils/async.js
Original file line number Diff line number Diff line change
@@ -1,5 +1,4 @@
/* @flow strict-local */
import progressiveTimeout from './progressiveTimeout';
import { isClientError } from '../api/apiErrors';

/** Like setTimeout(..., 0), but returns a Promise of the result. */
Expand All @@ -10,6 +9,74 @@ export function delay<T>(callback: () => T): Promise<T> {
export const sleep = (ms: number = 0): Promise<void> =>
new Promise(resolve => setTimeout(resolve, ms));

/**
rk-for-zulip marked this conversation as resolved.
Show resolved Hide resolved
* Makes a machine that can sleep for increasing durations, for network backoff.
*
* Call the constructor before a loop starts, and call .wait() in each iteration
* of the loop. Do not re-use the instance after exiting the loop.
*/
export class BackoffMachine {
_firstDuration: number;
_durationCeiling: number;
_base: number;

_startTime: number | void;
_waitsCompleted: number;

constructor() {
this._firstDuration = 100;
this._durationCeiling = 10 * 1000;
this._base = 2;

this._startTime = undefined;
this._waitsCompleted = 0;
}

/**
* How many waits have completed so far.
*
* Use this to implement "give up" logic by breaking out of the loop after a
* threshold number of waits.
*/
waitsCompleted = (): number => this._waitsCompleted;

/**
* Promise to resolve after the appropriate duration.
*
* The popular exponential backoff strategy is to increase the duration
* exponentially with the number of sleeps completed, with a base of 2, until a
* ceiling is reached. E.g., if firstDuration is 100 and durationCeiling is 10 *
* 1000 = 10000, the sequence is
*
* 100, 200, 400, 800, 1600, 3200, 6400, 10000, 10000, 10000, ...
*
* Instead of using this strategy directly, we also apply "jitter". We use
* capped exponential backoff for the *upper bound* on a random duration, where
* the lower bound is always zero. Mitigating "bursts" is the goal of any
* "jitter" strategy, and the larger the range of randomness, the smoother the
* bursts. Keeping the lower bound at zero maximizes the range while preserving
* a capped exponential shape on the expected value. Greg discusses this in more
* detail in #3841.
*/
wait = async (): Promise<void> => {
if (this._startTime === undefined) {
this._startTime = Date.now();
}

const duration =
Math.random() // "Jitter"
* Math.min(
// Upper bound of random duration should not exceed durationCeiling
this._durationCeiling,
this._firstDuration * this._base ** this._waitsCompleted,
);

await sleep(duration);

this._waitsCompleted++;
};
}

/**
* Calls an async function and if unsuccessful retries the call.
*
Expand All @@ -18,14 +85,22 @@ export const sleep = (ms: number = 0): Promise<void> =>
* handled further up in the call stack.
*/
export async function tryUntilSuccessful<T>(func: () => Promise<T>): Promise<T> {
try {
return await func();
} catch (e) {
if (isClientError(e)) {
// do not retry if error is 4xx (Client Error)
throw e;
const backoffMachine = new BackoffMachine();
// eslint-disable-next-line no-constant-condition
while (true) {
try {
return await func();
} catch (e) {
if (isClientError(e)) {
throw e;
}
await backoffMachine.wait();
}
await progressiveTimeout();
}
return tryUntilSuccessful(func);

// Without this, Flow 0.92.1 does not know this code is unreachable,
// and it incorrectly thinks Promise<undefined> could be returned,
// which is inconsistent with the stated Promise<T> return type.
// eslint-disable-next-line no-unreachable
throw new Error();
}
36 changes: 0 additions & 36 deletions src/utils/progressiveTimeout.js

This file was deleted.