
./libev/ev.c:4043: ev_run: Assertion `("libev: ev_loop recursion during release detected", loop_done != EVBREAK_RECURSE' #2905

Open
bisrael opened this issue Jul 5, 2022 · 6 comments


bisrael commented Jul 5, 2022

Describe the bug

puma: cluster worker 3: 19 [app]: ../libev/ev.c:4043: ev_run: Assertion `("libev: ev_loop recursion during release detected", loop_done != EVBREAK_RECURSE)' failed.

This seems to leave the worker broken, and after some time the Puma server stops accepting connections.

Puma config:

workers Integer(ENV.fetch('PUMA_MAX_WORKERS', '3'))
worker_culling_strategy :oldest

wait_for_less_busy_worker(0.01)

force_shutdown_after 25

threads 1, 1 # NO Multithreading

puma_fork_worker_mode = ENV.fetch("PUMA_ENABLE_FORK_WORKER_MODE", "0") == "1"
preload_app!(!puma_fork_worker_mode)

nakayoshi_fork(true) unless ENV.fetch("PUMA_DISABLE_NAKAYOSHI_FORK", "0") == "1"

if puma_fork_worker_mode
  restart_randomness_base = ENV.fetch('PUMA_RESTART_WORKERS_AFTER_REQUESTS', 500.0).to_f
  restart_randomness_jitter = ENV.fetch('PUMA_RESTART_WORKERS_AFTER_REQUESTS_JITTER', 0.0).to_f

  restart_randomness = (restart_randomness_base + (rand * restart_randomness_jitter) - (restart_randomness_jitter / 2.0)).to_i
  puts "Will restart workers after #{restart_randomness} requests (base = #{restart_randomness_base}, jitter = #{restart_randomness_jitter})"

  # Due to the randomness of how requests are assigned, at any given time we seem to have some workers with ~1k requests
  # and other workers with ~10 requests, so we tell Puma to refork the process at a randomized interval
  # (see the standalone sketch after this config for the range the formula produces).
  # This should help reduce memory footprint and optimize the copy-on-write memory benefits.
  #
  fork_worker(restart_randomness)
end

rackup DefaultRackup
port ENV.fetch('PORT', '3000')

before_fork do
  # we should just need to disconnect redis and it will reconnect on use
  disconnect_redis = -> (redis) {
    if redis.kind_of?(::Redis)
      redis.close
    elsif defined?(::MockRedis) && redis.kind_of?(::MockRedis)
      redis.flushdb
    end

    redis
  }

  disconnect_redis.(::StandaloneRedis.connect) if defined?(::StandaloneRedis)
  disconnect_redis.(::Resque.redis&.redis) if defined?(::Resque)
  disconnect_redis.(::Stoplight::Light.default_data_store.instance_variable_get(:@redis)) if defined?(::Stoplight)
  disconnect_redis.(::ActionCable.server.pubsub.redis_connection_for_subscriptions) if defined?(::ActionCable) && ::ActionCable.server.pubsub.kind_of?(::ActionCable::SubscriptionAdapter::Redis)
  disconnect_redis.($redis) if defined?($redis)

  begin
    ::Rails.cache.clear
  rescue NotImplementedError
    # Ignored
  end
end
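
For reference, a standalone sketch (hypothetical, not part of the config file) of the range the restart_randomness formula produces with the values from the reproduction below (base = 300, jitter = 50):

# Hypothetical standalone illustration of the refork threshold formula above:
# threshold = base + rand * jitter - jitter / 2, i.e. uniform in
# [base - jitter / 2, base + jitter / 2).
base   = 300.0 # PUMA_RESTART_WORKERS_AFTER_REQUESTS
jitter = 50.0  # PUMA_RESTART_WORKERS_AFTER_REQUESTS_JITTER

samples = Array.new(5) { (base + (rand * jitter) - (jitter / 2.0)).to_i }
puts samples.inspect # e.g. [281, 319, 276, 304, 298]; always within 275..324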

To Reproduce

This just starts to happen after some time of running the application server.

In the above config, it happens with:
PUMA_ENABLE_FORK_WORKER_MODE=1
PUMA_DISABLE_NAKAYOSHI_FORK=0
PUMA_MAX_WORKERS=22
PUMA_RESTART_WORKERS_AFTER_REQUESTS=300
PUMA_RESTART_WORKERS_AFTER_REQUESTS_JITTER=50

Specifically, if you change to PUMA_ENABLE_FORK_WORKER_MODE=0, the error ceases.

The worker count fits on a Heroku Private-L dyno (14 GB RAM) for our somewhat bloated app.

Expected behavior

I expect this error to not be raised.

Desktop (please complete the following information):

  • OS: Ubuntu 18 (heroku-18)
  • Puma Version: 5.6.4
@nateberkopec
Member

Do you have more output? Usually a stack dump would have more lines of output.

@nateberkopec
Member

Also I would recommend just not using fork worker mode if that fixes the issue for you.

@ioquatix
Contributor

Is this coming from nio4r?
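
For context, Puma's reactor is built on nio4r, which vendors libev, so the ev.c in the assertion presumably comes from there. A minimal, hypothetical way to check which nio4r version the app has loaded (run inside the app's bundle):

# Hypothetical check; nio4r vendors libev, so its version is the one
# relevant to the ev.c assertion.
require "nio"
puts NIO::VERSION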

@dentarg
Member

dentarg commented Nov 12, 2022

@bisrael do you have any more info about this issue?

@dentarg
Member

dentarg commented Dec 8, 2022

Searched for part of the error message here and found digital-fabric/polyphony#6

Is something similar happening in Puma when KILL is sent to workers (when workers are culled or phased out)?
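
If it helps correlate the assertion with culling or phase-out, here is a rough, untested sketch of logging that could be added to the config above, using Puma's worker lifecycle hooks (a hard KILL skips these hooks entirely):

# Hypothetical diagnostic additions (not from the original config) to log
# worker boots and orderly shutdowns, so the timing can be compared with
# the libev assertion. Workers killed with SIGKILL will not reach these.
on_worker_boot do
  $stdout.puts "[puma] worker booted (pid #{Process.pid}) at #{Time.now.utc}"
end

on_worker_shutdown do
  $stdout.puts "[puma] worker shutting down (pid #{Process.pid}) at #{Time.now.utc}"
end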

@dentarg
Member

dentarg commented Jan 6, 2024

This failed CI run (MRI: macos-13 2.7) logged: Assertion failed: (("libev: kqueue found invalid fd", 0)), function kqueue_poll, file ev_kqueue.c, line 133.

Uploaded the logs from that run since they will eventually disappear: MRI macos-13 2.7.zip
