Improve robustness of rake work_jobs #411

Open
zlogic opened this issue Feb 9, 2016 · 3 comments

zlogic commented Feb 9, 2016

First of all, many thanks for creating this awesome RSS reader! Just what I was looking for and was about to create myself, but Stringer is so much better than anything I had in mind :)

I installed it on Heroku and noticed a reliability problem with the Heroku-specific rake lazy_fetch approach:

While bundle exec rake work_jobs is launched from config/unicorn.rb, it's never checked to be running, and its delayed_job_pid is never used. It seems that Heroku kills long-running database connections, either on purpose or just randomly, which causes an unhandled exception that kills the bundle exec rake work_jobs process (example log). While rake lazy_fetch continues to submit jobs, the worker process is dead and never restarted.

It seems the best way would be to run rake work_jobs from a Procfile. I've tried two other workarounds and can confirm they work:

  1. Restart the rake process in bash:
diff --git config/unicorn.rb config/unicorn.rb
--- config/unicorn.rb
+++ config/unicorn.rb
@@ -9,7 +9,7 @@ before_fork do |_server, _worker|
   # as there's no need for the master process to hold a connection
   ActiveRecord::Base.connection.disconnect! if defined?(ActiveRecord::Base)

-  @delayed_job_pid ||= spawn("bundle exec rake work_jobs")
+  @delayed_job_pid ||= spawn("while true; do bundle exec rake work_jobs; sleep 60; done")

   sleep 1
 end
  2. Catch exceptions in the work_jobs task and try to reconnect:
diff --git Rakefile Rakefile
--- Rakefile
+++ Rakefile
@@ -46,11 +46,25 @@ desc "Work the delayed_job queue."
 task :work_jobs do
   Delayed::Job.delete_all

-  3.times do
-    Delayed::Worker.new(
-      min_priority: ENV["MIN_PRIORITY"],
-      max_priority: ENV["MAX_PRIORITY"]
-    ).start
+  logger = Logger.new(STDOUT)
+  logger.level = Logger::DEBUG
+
+  loop do
+    begin
+      logger.info "(Re-)starting worker"
+      if !ActiveRecord::Base.connection.active?
+        logger.info "Restarting DB connection"
+        ActiveRecord::Base.connection.reconnect!
+      end
+      Delayed::Worker.new(
+        min_priority: ENV["MIN_PRIORITY"],
+        max_priority: ENV["MAX_PRIORITY"]
+      ).start
+    rescue => e
+      logger.error e.message
+      e.backtrace.each { |line| logger.error line }
+      sleep 60
+    end
   end
 end

swanson (Collaborator) commented Feb 9, 2016

Yeah lazy_fetch is a big hack. The correct way, as you mentioned, is to run a separate dyno to work jobs -- but this means we can't run on the free plan (only 1 dyno).

Were the exceptions stopping your instance from updating the feeds? I agree that the robustness is not ideal, but I've not experienced issues where my Heroku instance stopped updating.

zlogic (Author) commented Feb 9, 2016

Actually, it seems that Heroku counts a worker + web pair as one "dyno".

I run some other apps on the free tier with a worker + web combo and it works without major problems.
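
For reference, a two-process Procfile along those lines might look like the sketch below. The exact commands are assumptions (the standard unicorn invocation plus the existing work_jobs task), not copied from the repo:

  web: bundle exec unicorn -p $PORT -c ./config/unicorn.rb
  worker: bundle exec rake work_jobs

With a setup like this, Heroku manages the worker dyno itself (restarting it if it crashes), so nothing needs to be spawned from config/unicorn.rb.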

zlogic (Author) commented Feb 9, 2016

In my case, the worker process stopped after 10-40 minutes, which also stopped feed updates. The only way to restart it was to wait for the web dyno to sleep and wake up again, restarting both itself and the worker. It looked something like this:

  1. Open Stringer, starting the web and worker processes
  2. Worker process crashes
  3. Timer submits job, but worker is no longer active
  4. Open Stringer again, nothing happens (worker is still crashed)
  5. Wait until the dyno sleeps
  6. Open Stringer again, which clears the job queue and starts the worker
  7. With luck, the timer submits the jobs for the worker before it crashes again

Maybe it was just a bad day and Heroku was installing security patches or something...

I also had an hourly timer; maybe it submitted jobs while the dynos were asleep. Setting it to run every 10 minutes seems to have improved things a bit.

fearenales added a commit to run-project/stringer that referenced this issue May 6, 2016
It will replace Scheduler (for deployment on Heroku), so that no extra step is
needed: another spawn command called from unicorn.rb starts clockwork.

Still on Heroku, the Procfile is updated to describe the worker and clock
components (if the user wants a more robust architecture).
This fixes stringer-rss#411

Also, for development environments, it will allow fetching feeds without
having to manually run rake commands.

The default interval for fetching feeds is 60s and it can be customized
by env var FETCH_INTERVAL.
fearenales added a commit to run-project/stringer that referenced this issue May 6, 2016
It will replace Scheduler (for deployment on Heroku), so that no extra step is
needed: another spawn command called from unicorn.rb starts clockwork.

Still on Heroku, the Procfile is updated to describe the worker and clock
components (if the user wants a more robust architecture).
This fixes stringer-rss#411

Also, for development environments, it will allow fetching feeds without
having to manually run rake commands.

The default interval for fetching feeds is 10 min and it can be customized
by env var FETCH_INTERVAL.
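
For anyone curious what the clockwork approach looks like, here is a minimal clock.rb sketch of the idea described in those commit messages. The entry point and job class names are illustrative assumptions, not code from the actual commits, and the interval is assumed to be given in seconds:

  # clock.rb -- minimal Clockwork sketch (names below are hypothetical)
  require "clockwork"
  require_relative "app"  # assumed to load the app, ActiveRecord and delayed_job

  module Clockwork
    # Interval in seconds, overridable via the FETCH_INTERVAL env var,
    # as described in the commit messages above.
    fetch_interval = Integer(ENV.fetch("FETCH_INTERVAL", "600"))

    every(fetch_interval, "fetch.feeds") do
      # Enqueue a background job so the worker dyno does the actual fetching.
      Delayed::Job.enqueue(FetchAllFeedsJob.new)  # hypothetical job class
    end
  end

The matching Procfile would then add something like "clock: bundle exec clockwork clock.rb" next to the web and worker entries.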