Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

retry middleware fails in parallel #2

Open
BuonOmo opened this issue Sep 3, 2021 · 3 comments
Open

retry middleware fails in parallel #2

BuonOmo opened this issue Sep 3, 2021 · 3 comments

Comments

@BuonOmo
Copy link

BuonOmo commented Sep 3, 2021

Basic Info

  • Faraday Version: 1.7.1 (and also main branch at fdf797b)
  • Ruby Version: 2.7.4

Issue description

The retry count middleware does not wait for completed requests when in parallel, hence retries are not issued.

Steps to reproduce

#!/usr/bin/env ruby
# frozen_string_literal: true

require "bundler/inline"
gemfile do
  source "https://rubygems.org"

  gem "faraday"
  gem "typhoeus"
end

count = 0
expected = 5

faraday = Faraday.new do |conn|
  retry_options = {
    max: expected,
    interval: 0.1,
    retry_statuses: [503],
    retry_block: proc { count += 1 }
  }
  conn.request :retry, **retry_options
  conn.adapter :typhoeus
end

faraday.in_parallel do
  faraday.get("https://httpbin.org/status/503")
end

exit 0 if count == expected

warn "Retried #{count} times, expected #{expected}"
exit 1

Fix

Note that I already found a fix working locally for typhoeus, I would just like to know if that's a bug for you and I spend my time wisely making it work in every cases! (I think the tricky part will concern chaining with the raise middleware)

Here's a working patch for the precise reproduction above:

diff --git a/lib/faraday/request/retry.rb b/lib/faraday/request/retry.rb
index 3ba50b7..460b683 100644
--- a/lib/faraday/request/retry.rb
+++ b/lib/faraday/request/retry.rb
@@ -145,10 +145,29 @@ module Faraday
         begin
           # after failure env[:body] is set to the response body
           env[:body] = request_body
-          @app.call(env).tap do |resp|
+          # TODO: distinguish the parallel and usual pipeline for this.
+          @app.call(env).on_complete do |resp|
+            if !resp.status
+              raise "bug!"
+            end
             if @options.retry_statuses.include?(resp.status)
               raise Faraday::RetriableResponse.new(nil, resp)
             end
+          rescue @errmatch => e
+            # TODO: dry...
+            if retries.positive? && retry_request?(env, e)
+              retries -= 1
+              rewind_files(request_body)
+              @options.retry_block.call(env, @options, retries, e)
+              if (sleep_amount = calculate_sleep_amount(retries + 1, env))
+                sleep sleep_amount
+                retry
+              end
+
+              raise unless e.is_a?(Faraday::RetriableResponse)
+
+              e.response
+            end
           end
         rescue @errmatch => e
           if retries.positive? && retry_request?(env, e)

If you're in I should be able to get that done within 2 weeks


Bonus

Note that the instrumentation middleware has basically the same issue, I can fix it with

diff --git a/lib/faraday/request/instrumentation.rb b/lib/faraday/request/instrumentation.rb
index c03ba1e..84822d6 100644
--- a/lib/faraday/request/instrumentation.rb
+++ b/lib/faraday/request/instrumentation.rb
@@ -44,10 +44,13 @@ module Faraday
       end
 
       # @param env [Faraday::Env]
-      def call(env)
-        @instrumenter.instrument(@name, env) do
-          @app.call(env)
-        end
+      def on_request(env)
+        @instrumenter.start(@name, env)
+      end
+
+      # @param env [Faraday::Env]
+      def on_complete(env)
+        @instrumenter.finish(@name, env)
       end
     end
   end

One issue though, the time (ends - starts) would be wrong, and I see that faraday does not add the request time to env data..

@iMacTia
Copy link
Member

iMacTia commented Sep 3, 2021

Thanks for raising this @BuonOmo, it does indeed look like something we should fix!

My guess is that those middleware have been written before the parallel feature, and were never updated after that.
Your fix for instrumentation looks straightforward, though the one for retry is probably going to be a bit more complicated.
The good news is that we recently introduced an additional request_body property on the env, so storing it explicitly in this middleware should not be necessary anymore, that might help simplifying the logic a bit.

It would be awesome if you could work on this, but please feel free to shout out if you need any help!

@BuonOmo
Copy link
Author

BuonOmo commented Sep 3, 2021

@iMacTia

Your fix for instrumentation looks straightforward

Unfortunately, it would still be deceiving because the durations would pile up.

require "bundler/inline"
gemfile do
  source "https://rubygems.org"

  gem "faraday"
  gem "typhoeus"
  gem "activesupport", require: "active_support"
end

faraday = Faraday.new do |conn|
  conn.request :instrumentation
  conn.adapter :typhoeus
end

ActiveSupport::Notifications.subscribe('request.faraday') do |name, starts, ends, _, env|
  url = env[:url]
  http_method = env[:method].to_s.upcase
  duration = ends - starts
  $stdout.puts '[%s] %s %s (%.3f s)' % [url.host, http_method, url.request_uri, duration]
end

faraday.in_parallel(Typhoeus::Hydra.new(max_concurrency: 1)) do
  100.times {faraday.get("https://httpbin.org/status/200")}
end
[httpbin.org] GET /status/200 (0.462 s)
[httpbin.org] GET /status/200 (0.570 s)
[httpbin.org] GET /status/200 (0.677 s)
[httpbin.org] GET /status/200 (0.782 s)
[httpbin.org] GET /status/200 (0.890 s)
[httpbin.org] GET /status/200 (0.997 s)
[httpbin.org] GET /status/200 (1.102 s)
[httpbin.org] GET /status/200 (1.210 s)
[httpbin.org] GET /status/200 (1.317 s)
[httpbin.org] GET /status/200 (1.426 s)
[httpbin.org] GET /status/200 (1.537 s)
[httpbin.org] GET /status/200 (1.642 s)
[httpbin.org] GET /status/200 (1.749 s)
[httpbin.org] GET /status/200 (1.855 s)
[httpbin.org] GET /status/200 (1.959 s)
[httpbin.org] GET /status/200 (2.065 s)
[httpbin.org] GET /status/200 (2.170 s)
[httpbin.org] GET /status/200 (2.276 s)
[httpbin.org] GET /status/200 (2.381 s)
[httpbin.org] GET /status/200 (2.486 s)
[httpbin.org] GET /status/200 (2.873 s)
[httpbin.org] GET /status/200 (3.148 s)
...

Hence we would need timing information, which is already given by ethon and typhoeus, however faraday does not give access to the wrapper request or response, nor it has request timing information. Hence I think it would be wise to let you choose how do you want this information to be accessible, since I really don't have the whole adapter design in mind..

The good news is that we recently introduced an additional request_body property on the env, so storing it explicitly in this middleware should not be necessary anymore, that might help simplifying the logic a bit.

Ok thanks for the hint I'll look into that, I'm more confident on that one since it is clearly scoped 🙂

@iMacTia
Copy link
Member

iMacTia commented Sep 5, 2021

Mmmmh that's interesting, my guess is that's because start is called only once, while finish is called multiple times?
Definitely less simple than expected. I'd suggest you tackle these issues in two separate PRs rather then fixing them together if you're not doing that already!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants