Improve robustness of rake work_jobs #411

Open
zlogic opened this issue Feb 9, 2016 · 3 comments

zlogic commented Feb 9, 2016

First of all, many thanks for creating this awesome RSS reader! Just what I was looking for and was about to create myself, but Stringer is so much better than anything I had in mind :)

I installed it on Heroku and noticed a reliability problem with the Heroku-specific rake lazy_fetch approach:

While bundle exec rake work_jobs is launched from config/unicorn.rb, it's never checked to be running, and its delayed_job_pid is never used. It seems that Heroku kills long-running database connections, either on purpose or just randomly, which causes an unhandled exception that kills the bundle exec rake work_jobs process (example log). While rake lazy_fetch continues to submit jobs, the worker process is dead and never restarted.

It seems the best way would be to run rake work_jobs from a Procfile. I've tried two other workarounds and can confirm they work:

  1. Restart the rake process in bash:
diff --git config/unicorn.rb config/unicorn.rb
--- config/unicorn.rb
+++ config/unicorn.rb
@@ -9,7 +9,7 @@ before_fork do |_server, _worker|
   # as there's no need for the master process to hold a connection
   ActiveRecord::Base.connection.disconnect! if defined?(ActiveRecord::Base)

-  @delayed_job_pid ||= spawn("bundle exec rake work_jobs")
+  @delayed_job_pid ||= spawn("while true; do bundle exec rake work_jobs; sleep 60; done")

   sleep 1
 end
  2. Catch exceptions in the work_jobs task and try to reconnect:
diff --git Rakefile Rakefile
--- Rakefile
+++ Rakefile
@@ -46,11 +46,25 @@ desc "Work the delayed_job queue."
 task :work_jobs do
   Delayed::Job.delete_all

-  3.times do
-    Delayed::Worker.new(
-      min_priority: ENV["MIN_PRIORITY"],
-      max_priority: ENV["MAX_PRIORITY"]
-    ).start
+  logger = Logger.new(STDOUT)
+  logger.level = Logger::DEBUG
+
+  loop do
+    begin
+      logger.info "(Re-)starting worker"
+      if !ActiveRecord::Base.connection.active?
+        logger.info "Restarting DB connection"
+        ActiveRecord::Base.connection.reconnect!
+      end
+      Delayed::Worker.new(
+        min_priority: ENV["MIN_PRIORITY"],
+        max_priority: ENV["MAX_PRIORITY"]
+      ).start
+    rescue => e
+      logger.error e.message
+      e.backtrace.each { |line| logger.error line }
+      sleep 60
+    end
   end
 end

swanson (Collaborator) commented Feb 9, 2016

Yeah lazy_fetch is a big hack. The correct way, as you mentioned, is to run a separate dyno to work jobs -- but this means we can't run on the free plan (only 1 dyno).

Were the exceptions stopping your instance from updating the feeds? I agree that the robustness is not ideal, but I've not experienced issues where my Heroku instance stopped updating.

zlogic (Author) commented Feb 9, 2016

Actually, it seems that Heroku counts a worker + web pair as one "dyno".

I run some other apps on the free tier with a worker + web combo and it works without major problems.
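
For reference, a two-process Procfile along those lines might look like the sketch below. The exact commands are assumptions (the standard unicorn invocation plus the existing work_jobs task), not copied from the repo:

  web: bundle exec unicorn -p $PORT -c ./config/unicorn.rb
  worker: bundle exec rake work_jobs

With a setup like this, Heroku manages the worker dyno itself (restarting it if it crashes), so nothing needs to be spawned from config/unicorn.rb.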

zlogic (Author) commented Feb 9, 2016

In my case, the worker process stopped after 10-40 minutes, which also stopped feed updates. The only way to restart it was to wait for the web dyno to sleep and wake up again, restarting both itself and the worker. It looked something like this:

  1. Open Stringer, starting the web and worker processes
  2. Worker process crashes
  3. Timer submits job, but worker is no longer active
  4. Open Stringer again, nothing happens (worker is still crashed)
  5. Wait until the dyno sleeps
  6. Open Stringer again, which clears the job queue and starts the worker
  7. With luck, the timer submits the jobs for the worker before it crashes again

Maybe it was just a bad day and Heroku was installing security patches or something...

I also had an hourly timer; maybe it submitted jobs while the dynos were asleep. Setting it to run every 10 minutes seems to have improved things a bit.

fearenales added a commit to run-project/stringer that referenced this issue May 6, 2016
It will replace Scheduler (for deployment on Heroku), so that no extra step is
needed: another spawn command called from unicorn.rb starts clockwork.

Still on Heroku, the Procfile is updated to describe the worker and clock
components (if the user wants a more robust architecture).
This fixes stringer-rss#411

Also, for development environments, it will allow fetching feeds without
having to manually run rake commands.

The default interval for fetching feeds is 60s and it can be customized
by env var FETCH_INTERVAL.
fearenales added a commit to run-project/stringer that referenced this issue May 6, 2016
It will replace Scheduler (for deployment on Heroku), so that no extra step is
needed: another spawn command called from unicorn.rb starts clockwork.

Still on Heroku, the Procfile is updated to describe the worker and clock
components (if the user wants a more robust architecture).
This fixes stringer-rss#411

Also, for development environments, it will allow fetching feeds without
having to manually run rake commands.

The default interval for fetching feeds is 10 min and it can be customized
by env var FETCH_INTERVAL.
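
For anyone curious what the clockwork approach looks like, here is a minimal clock.rb sketch of the idea described in those commit messages. The entry point and job class names are illustrative assumptions, not code from the actual commits, and the interval is assumed to be given in seconds:

  # clock.rb -- minimal Clockwork sketch (names below are hypothetical)
  require "clockwork"
  require_relative "app"  # assumed to load the app, ActiveRecord and delayed_job

  module Clockwork
    # Interval in seconds, overridable via the FETCH_INTERVAL env var,
    # as described in the commit messages above.
    fetch_interval = Integer(ENV.fetch("FETCH_INTERVAL", "600"))

    every(fetch_interval, "fetch.feeds") do
      # Enqueue a background job so the worker dyno does the actual fetching.
      Delayed::Job.enqueue(FetchAllFeedsJob.new)  # hypothetical job class
    end
  end

The matching Procfile would then add something like "clock: bundle exec clockwork clock.rb" next to the web and worker entries.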