Retrier and Backoff

John Goodall edited this page Jan 31, 2017 · 2 revisions

Retrier is an interface that users can implement to intercept failed requests.

type Retrier interface {
	Retry(ctx context.Context, retry int, req *http.Request, resp *http.Response, err error) (time.Duration, bool, error)
}

All requests in elastic eventually go through PerformRequest on the Client instance, which is also where the Retrier is used. The Retrier is called whenever an error occurs while talking to Elasticsearch, e.g. when Elastic cannot find an active connection to Elasticsearch or when a request fails.

Elastic then calls Retry on the Retrier, passing along information it has about the current state. The parameters passed are:

  • ctx is the context passed into elastic
  • retry is the current number of retries (1, 2, 3...)
  • req is the *http.Request which may be nil
  • resp is the *http.Response which may be nil
  • err is the error returned from e.g. the underlying HTTP request

Retry then needs to return three values:

  • A time.Duration which specifies the time to wait until the next request
  • A bool which indicates whether to continue with retries, i.e. use false to stop retries
  • An error which, if non-nil, will stop retrying immediately and will be returned to the service that called into PerformRequest in the first place

To calculate a time-to-wait, elastic comes with a list of Backoff implementations, all of which are designed following github.com/cenkalti/backoff.

By default, Elastic does no retries. Here is an example of implementing and specifying your own custom Retrier implementation:

type MyRetrier struct {
  backoff elastic.Backoff
}

func NewMyRetrier() *MyRetrier {
  return &MyRetrier{
    backoff: elastic.NewExponentialBackoff(10 * time.Millisecond, 8 * time.Second),
  }
}

func (r *MyRetrier) Retry(ctx context.Context, retry int, req *http.Request, resp *http.Response, err error) (time.Duration, bool, error) {
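  // Fail hard on a specific error. Errors from the HTTP client are usually
  // wrapped, so match with errors.Is (Go 1.13+) rather than ==.
  if errors.Is(err, syscall.ECONNREFUSED) {
    return 0, false, errors.New("Elasticsearch or network down")
  }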

  // Give up once the retry counter reaches 5 (retry is 1-based)
  if retry >= 5 {
    return 0, false, nil
  }

  // Let the backoff strategy decide how long to wait and whether to continue.
  // The second value is true to keep retrying and false to stop.
  wait, ok := r.backoff.Next(retry)
  return wait, ok, nil
}

...

client, err := elastic.NewClient(
  elastic.SetURL("http://127.0.0.1:9200"),
  elastic.SetRetrier(NewMyRetrier()),
)
if err != nil { ... }