Add option to avoid workers crash-looping #96

Open
grosser opened this issue Feb 12, 2019 · 0 comments

grosser commented Feb 12, 2019

We run fluentd (which uses serverengine) in a container. Sometimes the workers keep dying in a tight loop, which puts a lot of stress on the system. This is not visible from the outside, because the server process just keeps restarting the workers.

I'd like to make a PR that adds a "max_crash_frequency" flag (or similar) that crashes the server when the worker crash frequency goes above a certain value (for example, 10 per minute).

/cc @repeatedly @tagomoris

Our current monkeypatch is:

# frozen_string_literal: true
# changes serverengine/lib/serverengine/multi_process_server.rb to crash the server when workers fail too often
#
# https://github.com/treasure-data/serverengine/issues/96

module PreventWorkerCrashloop
  # Kill the server once this many worker crashes happen within the interval below.
  MAX_WORKER_CRASHES = 5
  MAX_WORKER_CRASH_INTERVAL = 5 * 60 # seconds

  def alive?
    alive = super

    if !alive && !@unrecoverable_exit
      now = Time.now.to_i
      cutoff = now - MAX_WORKER_CRASH_INTERVAL
      failures = (@@failures_timestamps ||= []) # rubocop:disable Style/ClassVars
      failures.reject! { |t| t < cutoff }
      failures << now
      if failures.size >= MAX_WORKER_CRASHES
        diff = now - failures.first
        @worker.logger.error(
          "PreventWorkerCrashloop killing server because of #{failures.size} worker crashes in #{diff}s"
        )
        @unrecoverable_exit = true
      end
    end

    alive
  end
end

ServerEngine::MultiProcessServer::WorkerMonitor.prepend(PreventWorkerCrashloop)
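
For discussion, the sliding-window check used in the monkeypatch can be isolated into a small object that is easy to test without running serverengine. This is only a sketch: `CrashWindow`, `max_crashes`, and `interval` are hypothetical names and not part of serverengine, and the monotonic clock default is an assumption swapped in for `Time.now.to_i` so that wall-clock adjustments do not affect the window.

```ruby
# frozen_string_literal: true

# Hypothetical helper, not part of serverengine: keeps crash timestamps in a
# sliding window and reports when the configured limit is reached.
class CrashWindow
  def initialize(max_crashes:, interval:)
    @max_crashes = max_crashes
    @interval = interval # seconds
    @timestamps = []
  end

  # Records one crash at time `now` (seconds) and returns true when the number
  # of crashes within the last `interval` seconds has reached `max_crashes`.
  def record(now = Process.clock_gettime(Process::CLOCK_MONOTONIC))
    cutoff = now - @interval
    @timestamps.reject! { |t| t < cutoff }
    @timestamps << now
    @timestamps.size >= @max_crashes
  end
end

# Five crashes, 10 seconds apart, against a limit of 5 crashes per 5 minutes.
window = CrashWindow.new(max_crashes: 5, interval: 5 * 60)
results = [0, 10, 20, 30, 40].map { |t| window.record(t) }
# results is false for the first four crashes and true for the fifth
```

Passing the timestamp in explicitly keeps the logic deterministic in tests. A real option would still need to decide where the counter lives (per server rather than per worker monitor) and how the server should exit.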