Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal to deprecate term_on_timeout: false #169

Closed
mtsmfm opened this issue Oct 21, 2020 · 7 comments
Closed

Proposal to deprecate term_on_timeout: false #169

mtsmfm opened this issue Oct 21, 2020 · 7 comments

Comments

@mtsmfm
Copy link

mtsmfm commented Oct 21, 2020

Hi, thank you for creating and maintaining useful gem.
I've read risks.md and timeout related PRs on Rails like rails/rails#17607 and understand timeout can break application state completely.
I've read Rails' codes and found critical case so I'd like to share and propose deprecating reusing timed out Ruby process.

Following example shows timeout can break even User.count

require "bundler/inline"

gemfile do
  source "https://rubygems.org"

  gem "rails", "6.0.3"
  gem "sqlite3"
  gem "rack-timeout", require: "rack/timeout/base"
end

require "rack/test"
require "active_record/railtie"
require "action_controller/railtie"

ActiveRecord::Base.establish_connection(adapter: "sqlite3", database: ":memory:")
ActiveRecord::Base.logger = Logger.new(File::NULL)
ActiveRecord::Base.logger.level = Logger::ERROR

ActiveRecord::Schema.define do
  create_table :users, force: true do |t|
    t.string :name
  end
end

class User < ActiveRecord::Base
  scope :sleep_a_while, -> { all.tap { sleep(rand / 1000000.0) } }
end

User.create!(name: "foo")
User.create!(name: "bar")

class TestApp < Rails::Application
  config.root = __dir__
  config.hosts << "example.org"
  config.session_store :cookie_store, key: "cookie_store_key"
  secrets.secret_key_base = "secret_key_base"

  config.logger = ActiveRecord::Base.logger
  Rails.logger  = config.logger

  config.middleware.insert_before Rack::Runtime, Rack::Timeout, service_timeout: 0.1

  config.exceptions_app = ->(*) { [500, {}, ['']] }

  routes.draw do
    get "/" => "test#index"
    get "/timeout" => "test#timeout"
  end
end

class TestController < ActionController::Base
  include Rails.application.routes.url_helpers

  def index
    render plain: User.count
  end

  def timeout
    loop do
      User.where(name: "foo").sleep_a_while
    end
  end
end

require "minitest/autorun"

class BugTest < Minitest::Test
  include Rack::Test::Methods

  def test_index
    get "/"

    # Make sure User.count is 2
    assert_equal "2", last_response.body

    until last_response.body == "1"
      get "/timeout"

      get "/"
    end

    get "/"
    # User.count must be always 2 but it's now 1
    assert_equal "2", last_response.body
  end

  private
    def app
      Rails.application
    end
end

I think in most cases we use this gem with Rails.
The above example shows ActiveRecord can be completely broken after timeout.
In addition, it doesn't throw any error so it's very difficult to be aware this problem.
We already have well written document risks.md though, I'd like to propose deprecating this.

What do you think?

@wuputah
Copy link
Collaborator

wuputah commented Mar 24, 2021

Hello and thank you for your thoughts and your test app!

I believe the default behavior is what people expect from historical versions of Rack Timeout: just raise an exception but continue on. I believe the TERM behavior is something that people should opt-into as it's highly dependent on the nature of your production environment. Additionally, while the application may enter a bad state when using Rails + ActiveRecord, the middleware does not know what you are using -- your app might only be making network calls, for instance. If you find that you are aborting database calls with Rack Timeout that might be a good indication that you should try setting a database-level timeout, like statement_timeout in Postgres.

@mtsmfm
Copy link
Author

mtsmfm commented Apr 2, 2021

@wuputah Thank you for responding.
I understand it's difficult to change the default behavior.

What do you think about showing warn if term_on_timeout is false?
As you can see on above example, active record scope which used just before timeout is injected into all other scopes.
User.count will be User.where(name: "foo").count even it's written as User.count.
I think it's so terrible and really difficult to be aware.

@dentarg
Copy link

dentarg commented Apr 2, 2021

@mtsmfm rails/rails#41356 was recently merged, does that change anything for you?

I'm not sure about warning on term_on_timeout: false, what should it say? That you should opt-in to the TERM behavior?

Why and how rack-timeout is used in an application, should be a well thought out decision by the application developers. Maybe rack-timeout should have a warning by default, "Are you really sure you need to use rack-timeout and do you understand the consequences?" Something similar to parallel --citation. Hehe.

@mtsmfm
Copy link
Author

mtsmfm commented Apr 2, 2021

@dentarg

rails/rails#41356 was recently merged, does that change anything for you?

No, the test I showed still fails.

# Running:

F

Failure:
BugTest#test_index [bar.rb:84]:
--- expected
+++ actual
@@ -1 +1,3 @@
-"2"
+# encoding: ASCII-8BIT
+#    valid: true
+"1"
Entire code (just changes gem "rails" line)
require "bundler/inline"

gemfile do
  source "https://rubygems.org"

  gem "rails", github: "rails/rails", ref: "main"
  gem "sqlite3"
  gem "rack-timeout", require: "rack/timeout/base"
end

require "rack/test"
require "active_record/railtie"
require "action_controller/railtie"

ActiveRecord::Base.establish_connection(adapter: "sqlite3", database: ":memory:")
ActiveRecord::Base.logger = Logger.new(File::NULL)
ActiveRecord::Base.logger.level = Logger::ERROR

ActiveRecord::Schema.define do
  create_table :users, force: true do |t|
    t.string :name
  end
end

class User < ActiveRecord::Base
  scope :sleep_a_while, -> { all.tap { sleep(rand / 1000000.0) } }
end

User.create!(name: "foo")
User.create!(name: "bar")

class TestApp < Rails::Application
  config.root = __dir__
  config.hosts << "example.org"
  config.session_store :cookie_store, key: "cookie_store_key"
  secrets.secret_key_base = "secret_key_base"

  config.logger = ActiveRecord::Base.logger
  Rails.logger  = config.logger

  config.middleware.insert_before Rack::Runtime, Rack::Timeout, service_timeout: 0.1

  config.exceptions_app = ->(*) { [500, {}, ['']] }

  routes.draw do
    get "/" => "test#index"
    get "/timeout" => "test#timeout"
  end
end

class TestController < ActionController::Base
  include Rails.application.routes.url_helpers

  def index
    render plain: User.count
  end

  def timeout
    loop do
      User.where(name: "foo").sleep_a_while
    end
  end
end

require "minitest/autorun"

class BugTest < Minitest::Test
  include Rack::Test::Methods

  def test_index
    get "/"

    # Make sure User.count is 2
    assert_equal "2", last_response.body

    until last_response.body == "1"
      get "/timeout"

      get "/"
    end

    get "/"
    # User.count must be always 2 but it's now 1
    assert_equal "2", last_response.body
  end

  private
    def app
      Rails.application
    end
end

Why and how rack-timeout is used in an application, should be a well thought out decision by the application developers.

Agree. But I hope the library uses safer option by default if it has.
In addition, if my understanding is correct:

I know application developers must investigate why a request timed out.
But we can't make sure every endpoint can respond fast enough before deploying.
rack-timeout can work as a safety valve I think.

rack-timeout can't cause any problem if we use application server with multi process mode and term_on_timeout: true.
I just hope this gem is safe by default or has a caution if it's not safe by default despite it has safe option.

@dentarg
Copy link

dentarg commented Apr 2, 2021

rack-timeout can't cause any problem if we use application server with multi process mode and term_on_timeout: true

Agree, but this gem can't know if your app server is using multiple processes...

As an example, Puma is not doing that by default – so then it wouldn't be a safe default IMHO

@mtsmfm
Copy link
Author

mtsmfm commented Apr 3, 2021

Puma is not doing that by default

Yes, and it's also difficult to change I guess.
That's why I proposed to show warning on term_on_timeout: false.

@mtsmfm
Copy link
Author

mtsmfm commented Apr 3, 2021

I think injecting unexpected scope can be a possibility of vulnerability stuff.

# Might be BlockedUser.where(user_id: timed_out_request_user_id).where(user_id: current_user.id).count after time out
if BlockedUser.where(user_id: current_user.id).count == 0
  # Run something
end

Of course I respect your, rack-timeout and puma maintainer's opinion.
I just want you to know the possibility and I felt it's underestimated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants