retry middleware fails in parallel #2

BuonOmo · 2021-09-03T09:44:53Z

Basic Info

Faraday Version: 1.7.1 (and also main branch at fdf797b)
Ruby Version: 2.7.4

Issue description

The retry count middleware does not wait for completed requests when in parallel, hence retries are not issued.

Steps to reproduce

#!/usr/bin/env ruby
# frozen_string_literal: true

require "bundler/inline"
gemfile do
  source "https://rubygems.org"

  gem "faraday"
  gem "typhoeus"
end

count = 0
expected = 5

faraday = Faraday.new do |conn|
  retry_options = {
    max: expected,
    interval: 0.1,
    retry_statuses: [503],
    retry_block: proc { count += 1 }
  }
  conn.request :retry, **retry_options
  conn.adapter :typhoeus
end

faraday.in_parallel do
  faraday.get("https://httpbin.org/status/503")
end

exit 0 if count == expected

warn "Retried #{count} times, expected #{expected}"
exit 1

Fix

Note that I already found a fix working locally for typhoeus, I would just like to know if that's a bug for you and I spend my time wisely making it work in every cases! (I think the tricky part will concern chaining with the raise middleware)

Here's a working patch for the precise reproduction above:

diff --git a/lib/faraday/request/retry.rb b/lib/faraday/request/retry.rb
index 3ba50b7..460b683 100644
--- a/lib/faraday/request/retry.rb
+++ b/lib/faraday/request/retry.rb
@@ -145,10 +145,29 @@ module Faraday
         begin
           # after failure env[:body] is set to the response body
           env[:body] = request_body
-          @app.call(env).tap do |resp|
+          # TODO: distinguish the parallel and usual pipeline for this.
+          @app.call(env).on_complete do |resp|
+            if !resp.status
+              raise "bug!"
+            end
             if @options.retry_statuses.include?(resp.status)
               raise Faraday::RetriableResponse.new(nil, resp)
             end
+          rescue @errmatch => e
+            # TODO: dry...
+            if retries.positive? && retry_request?(env, e)
+              retries -= 1
+              rewind_files(request_body)
+              @options.retry_block.call(env, @options, retries, e)
+              if (sleep_amount = calculate_sleep_amount(retries + 1, env))
+                sleep sleep_amount
+                retry
+              end
+
+              raise unless e.is_a?(Faraday::RetriableResponse)
+
+              e.response
+            end
           end
         rescue @errmatch => e
           if retries.positive? && retry_request?(env, e)

If you're in I should be able to get that done within 2 weeks

Bonus

Note that the instrumentation middleware has basically the same issue, I can fix it with

diff --git a/lib/faraday/request/instrumentation.rb b/lib/faraday/request/instrumentation.rb
index c03ba1e..84822d6 100644
--- a/lib/faraday/request/instrumentation.rb
+++ b/lib/faraday/request/instrumentation.rb
@@ -44,10 +44,13 @@ module Faraday
       end
 
       # @param env [Faraday::Env]
-      def call(env)
-        @instrumenter.instrument(@name, env) do
-          @app.call(env)
-        end
+      def on_request(env)
+        @instrumenter.start(@name, env)
+      end
+
+      # @param env [Faraday::Env]
+      def on_complete(env)
+        @instrumenter.finish(@name, env)
       end
     end
   end

One issue though, the time (ends - starts) would be wrong, and I see that faraday does not add the request time to env data..

iMacTia · 2021-09-03T15:00:51Z

Thanks for raising this @BuonOmo, it does indeed look like something we should fix!

My guess is that those middleware have been written before the parallel feature, and were never updated after that.
Your fix for instrumentation looks straightforward, though the one for retry is probably going to be a bit more complicated.
The good news is that we recently introduced an additional request_body property on the env, so storing it explicitly in this middleware should not be necessary anymore, that might help simplifying the logic a bit.

It would be awesome if you could work on this, but please feel free to shout out if you need any help!

BuonOmo · 2021-09-03T19:29:51Z

@iMacTia

Your fix for instrumentation looks straightforward

Unfortunately, it would still be deceiving because the durations would pile up.

require "bundler/inline"
gemfile do
  source "https://rubygems.org"

  gem "faraday"
  gem "typhoeus"
  gem "activesupport", require: "active_support"
end

faraday = Faraday.new do |conn|
  conn.request :instrumentation
  conn.adapter :typhoeus
end

ActiveSupport::Notifications.subscribe('request.faraday') do |name, starts, ends, _, env|
  url = env[:url]
  http_method = env[:method].to_s.upcase
  duration = ends - starts
  $stdout.puts '[%s] %s %s (%.3f s)' % [url.host, http_method, url.request_uri, duration]
end

faraday.in_parallel(Typhoeus::Hydra.new(max_concurrency: 1)) do
  100.times {faraday.get("https://httpbin.org/status/200")}
end

[httpbin.org] GET /status/200 (0.462 s)
[httpbin.org] GET /status/200 (0.570 s)
[httpbin.org] GET /status/200 (0.677 s)
[httpbin.org] GET /status/200 (0.782 s)
[httpbin.org] GET /status/200 (0.890 s)
[httpbin.org] GET /status/200 (0.997 s)
[httpbin.org] GET /status/200 (1.102 s)
[httpbin.org] GET /status/200 (1.210 s)
[httpbin.org] GET /status/200 (1.317 s)
[httpbin.org] GET /status/200 (1.426 s)
[httpbin.org] GET /status/200 (1.537 s)
[httpbin.org] GET /status/200 (1.642 s)
[httpbin.org] GET /status/200 (1.749 s)
[httpbin.org] GET /status/200 (1.855 s)
[httpbin.org] GET /status/200 (1.959 s)
[httpbin.org] GET /status/200 (2.065 s)
[httpbin.org] GET /status/200 (2.170 s)
[httpbin.org] GET /status/200 (2.276 s)
[httpbin.org] GET /status/200 (2.381 s)
[httpbin.org] GET /status/200 (2.486 s)
[httpbin.org] GET /status/200 (2.873 s)
[httpbin.org] GET /status/200 (3.148 s)
...

Hence we would need timing information, which is already given by ethon and typhoeus, however faraday does not give access to the wrapper request or response, nor it has request timing information. Hence I think it would be wise to let you choose how do you want this information to be accessible, since I really don't have the whole adapter design in mind..

The good news is that we recently introduced an additional request_body property on the env, so storing it explicitly in this middleware should not be necessary anymore, that might help simplifying the logic a bit.

Ok thanks for the hint I'll look into that, I'm more confident on that one since it is clearly scoped 🙂

iMacTia · 2021-09-05T08:36:31Z

Mmmmh that's interesting, my guess is that's because start is called only once, while finish is called multiple times?
Definitely less simple than expected. I'd suggest you tackle these issues in two separate PRs rather then fixing them together if you're not doing that already!

iMacTia transferred this issue from lostisland/faraday Jan 2, 2022

avmnu-sng mentioned this issue Jan 13, 2022

Instrumentation middleware cannot work with async requests lostisland/faraday#1381

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

retry middleware fails in parallel #2

retry middleware fails in parallel #2

BuonOmo commented Sep 3, 2021

iMacTia commented Sep 3, 2021

BuonOmo commented Sep 3, 2021

iMacTia commented Sep 5, 2021

retry middleware fails in parallel #2

retry middleware fails in parallel #2

Comments

BuonOmo commented Sep 3, 2021

Basic Info

Issue description

Steps to reproduce

Fix

Bonus

iMacTia commented Sep 3, 2021

BuonOmo commented Sep 3, 2021

iMacTia commented Sep 5, 2021