
./libev/ev.c:4043: ev_run: Assertion `("libev: ev_loop recursion during release detected", loop_done != EVBREAK_RECURSE' #2905

Open
bisrael opened this issue Jul 5, 2022 · 6 comments


bisrael commented Jul 5, 2022

Describe the bug

puma: cluster worker 3: 19 [app]: ../libev/ev.c:4043: ev_run: Assertion `("libev: ev_loop recursion during release detected", loop_done != EVBREAK_RECURSE)' failed.

This seems to leave the worker broken, and after some time the Puma server stops accepting connections.

Puma config:

workers Integer(ENV.fetch('PUMA_MAX_WORKERS', '3'))
worker_culling_strategy :oldest

wait_for_less_busy_worker(0.01)

force_shutdown_after 25

threads 1, 1 # NO Multithreading

puma_fork_worker_mode = ENV.fetch("PUMA_ENABLE_FORK_WORKER_MODE", "0") == "1"
preload_app!(!puma_fork_worker_mode)

nakayoshi_fork(true) unless ENV.fetch("PUMA_DISABLE_NAKAYOSHI_FORK", "0") == "1"

if puma_fork_worker_mode
  restart_randomness_base = ENV.fetch('PUMA_RESTART_WORKERS_AFTER_REQUESTS', 500.0).to_f
  restart_randomness_jitter = ENV.fetch('PUMA_RESTART_WORKERS_AFTER_REQUESTS_JITTER', 0.0).to_f

  restart_randomness = (restart_randomness_base + (rand * restart_randomness_jitter) - (restart_randomness_jitter / 2.0)).to_i
  puts "Will restart workers after #{restart_randomness} requests (base = #{restart_randomness_base}, jitter = #{restart_randomness_jitter})"

  # Due to the randomness of how requests are assigned, at any given time we seem to have some workers with ~1k requests
  # and other workers with ~10 requests, so we tell Puma to refork the process at a randomized interval
  # (see the standalone sketch after this config for the range the formula produces).
  # This should help reduce memory footprint and optimize the copy-on-write memory benefits.
  #
  fork_worker(restart_randomness)
end

rackup DefaultRackup
port ENV.fetch('PORT', '3000')

before_fork do
  # we should just need to disconnect redis and it will reconnect on use
  disconnect_redis = -> (redis) {
    if redis.kind_of?(::Redis)
      redis.close
    elsif defined?(::MockRedis) && redis.kind_of?(::MockRedis)
      redis.flushdb
    end

    redis
  }

  disconnect_redis.(::StandaloneRedis.connect) if defined?(::StandaloneRedis)
  disconnect_redis.(::Resque.redis&.redis) if defined?(::Resque)
  disconnect_redis.(::Stoplight::Light.default_data_store.instance_variable_get(:@redis)) if defined?(::Stoplight)
  disconnect_redis.(::ActionCable.server.pubsub.redis_connection_for_subscriptions) if defined?(::ActionCable) && ::ActionCable.server.pubsub.kind_of?(::ActionCable::SubscriptionAdapter::Redis)
  disconnect_redis.($redis) if defined?($redis)

  begin
    ::Rails.cache.clear
  rescue NotImplementedError
    # Ignored
  end
end
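
For reference, a standalone sketch (hypothetical, not part of the config file) of the range the restart_randomness formula produces with the values from the reproduction below (base = 300, jitter = 50):

# Hypothetical standalone illustration of the refork threshold formula above:
# threshold = base + rand * jitter - jitter / 2, i.e. uniform in
# [base - jitter / 2, base + jitter / 2).
base   = 300.0 # PUMA_RESTART_WORKERS_AFTER_REQUESTS
jitter = 50.0  # PUMA_RESTART_WORKERS_AFTER_REQUESTS_JITTER

samples = Array.new(5) { (base + (rand * jitter) - (jitter / 2.0)).to_i }
puts samples.inspect # e.g. [281, 319, 276, 304, 298]; always within 275..324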

To Reproduce

This just starts to happen after some time of running the application server.

In the above config, it happens with:
PUMA_ENABLE_FORK_WORKER_MODE=1
PUMA_DISABLE_NAKAYOSHI_FORK=0
PUMA_MAX_WORKERS=22
PUMA_RESTART_WORKERS_AFTER_REQUESTS=300
PUMA_RESTART_WORKERS_AFTER_REQUESTS_JITTER=50

Specifically, if you change to PUMA_ENABLE_FORK_WORKER_MODE=0, the error ceases.

The worker count fits on a Heroku Private-L dyno (14 GB RAM) for our somewhat bloated app.

Expected behavior

I expect this error to not be raised.

Desktop (please complete the following information):

  • OS: Ubuntu 18 (heroku-18)
  • Puma Version: 5.6.4
@nateberkopec
Member

Do you have more output? Usually a stack dump would have more lines of output.

@nateberkopec
Member

Also I would recommend just not using fork worker mode if that fixes the issue for you.

@ioquatix
Contributor

Is this coming from nio4r?
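
For context, Puma's reactor is built on nio4r, which vendors libev, so the ev.c in the assertion presumably comes from there. A minimal, hypothetical way to check which nio4r version the app has loaded (run inside the app's bundle):

# Hypothetical check; nio4r vendors libev, so its version is the one
# relevant to the ev.c assertion.
require "nio"
puts NIO::VERSION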

@dentarg
Member

dentarg commented Nov 12, 2022

@bisrael do you have any more info about this issue?

@dentarg
Member

dentarg commented Dec 8, 2022

Searched for part of the error message here and found digital-fabric/polyphony#6

Is something similar happening in Puma when KILL is sent to workers (when workers are culled or phased out)?
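
If it helps correlate the assertion with culling or phase-out, here is a rough, untested sketch of logging that could be added to the config above, using Puma's worker lifecycle hooks (a hard KILL skips these hooks entirely):

# Hypothetical diagnostic additions (not from the original config) to log
# worker boots and orderly shutdowns, so the timing can be compared with
# the libev assertion. Workers killed with SIGKILL will not reach these.
on_worker_boot do
  $stdout.puts "[puma] worker booted (pid #{Process.pid}) at #{Time.now.utc}"
end

on_worker_shutdown do
  $stdout.puts "[puma] worker shutting down (pid #{Process.pid}) at #{Time.now.utc}"
end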

@dentarg
Member

dentarg commented Jan 6, 2024

This failed CI run (MRI: macos-13 2.7) logged: Assertion failed: (("libev: kqueue found invalid fd", 0)), function kqueue_poll, file ev_kqueue.c, line 133.

Uploaded the logs from that run since they will eventually disappear: MRI macos-13 2.7.zip
