Sidekiq worker is stuck, doesn't respond to TTIN #2796
Comments
I don't support GitLab, please contact them for help.
@mperham Right, I don't expect you to, I work at GitLab so that's actually my job :) This issue was reported to us as Sidekiq getting stuck, and from the symptoms ( Does the GDB output or the
Sidekiq getting stuck is always due to application code. I can't remember the last time Sidekiq had an actual bug causing a lockup. If Sidekiq is not responding to TTIN, that's typically due to a native gem which is holding the GVL erroneously. According to your output, this thread is performing a rugged operation without releasing the GVL:
Other threads, including 9 and 17, are blocked, waiting for thread 20 to release the GVL. Make sure you are using the latest rugged and maybe open an issue with them. It's not safe to wait for an OS lock while also holding the GVL.

As a side note, this is the type of diagnosis I usually charge for. This one's free but I'd encourage GitLab to purchase a license to get pro support.
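To illustrate why a held GVL also silences TTIN, here is a minimal Ruby sketch of a TTIN-style thread dump. It is not Sidekiq's actual implementation; `thread_dump` is a hypothetical helper and the output format is an assumption. The point is that the handler body is ordinary Ruby code, so it can only run once the interpreter gets the GVL back. If a native extension blocks without releasing it, the dump is never produced.

```ruby
# Hypothetical helper (not Sidekiq's code): builds a text dump with one
# section per live thread, similar in spirit to what a TTIN handler logs.
def thread_dump
  Thread.list.map do |thread|
    header = "Thread #{thread.object_id} status=#{thread.status.inspect}"
    # Thread#backtrace returns nil for a thread that is no longer alive.
    frames = thread.backtrace || ["<no backtrace available>"]
    ([header] + frames.map { |frame| "  #{frame}" }).join("\n")
  end.join("\n\n")
end

# Assumption: a POSIX platform where TTIN exists. The trap block is Ruby
# code, so it needs the GVL to run; the work is handed to a new thread to
# keep the handler itself short.
if Signal.list.key?("TTIN")
  Signal.trap("TTIN") { Thread.new { $stderr.puts(thread_dump) } }
end
```

With this loaded in a process, `kill -TTIN <pid>` should print the dump to stderr as long as some Ruby thread can be scheduled. No output at all after TTIN is consistent with the diagnosis above.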
Thanks @mperham, that helps tremendously. I will continue debugging from here. It's a testament to Sidekiq's stability and robustness that we've never needed support until now. If we ever need help again we will gladly reach out to you via http://sidekiq.org/support and get a support contract.

If I came off a little "demanding" with this issue, I didn't mean to. I understand that your time is valuable. Sidekiq Pro looks great, but is currently not interesting to us since we would need an Appliance license and it would only be available to our GitLab Enterprise Edition customers. Sidekiq "basic" is currently serving our users more than adequately, whether they be on the community or the enterprise edition.
Glad you see my POV and thanks for the kind words. As a suggestion, you can buy a license just for the support; you don't have to distribute it. Travis CI and Discourse are Sidekiq Pro customers for the support; they don't actually ship or use the Pro bits in their products. $950/yr is a lot cheaper than an appliance license.
@mperham Fair enough, I'll keep that in mind.
In case anyone else stumbles upon this issue, the clue is in this thread dump:
As implausible as it sounds, After a lot of head scratching, someone on Twitter pointed out to me that they had the same issue, and fixed it by updating glibc. Apparently, CentOS / Red Hat / Scientific Linux 6.7 ships with a broken glibc-2.12-1.166 which can cause a deadlock in malloc/free: https://bugzilla.redhat.com/show_bug.cgi?id=1244002, https://rhn.redhat.com/errata/RHBA-2015-1465.html. Updating glibc resolved the issue.
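For anyone wanting to check a host for the glibc build named above, here is a rough Ruby sketch. It assumes an RPM-based system with `rpm` on the PATH and the default `name-version-release.arch` query format. `parse_rpm_nvr`, `installed_glibc`, and `reported_build?` are hypothetical helpers. A match only means the installed build shares the `2.12-1.166` prefix from this report; a later release string may share that prefix, so treat a match as a prompt to read the linked erratum, not as proof.

```ruby
# Hypothetical helper: splits one line of `rpm -q` output into its parts.
# With the default query format the architecture stays inside :release.
def parse_rpm_nvr(line)
  match = line.strip.match(/\A(?<name>.+)-(?<version>[^-]+)-(?<release>[^-]+)\z/)
  return nil unless match

  { name: match[:name], version: match[:version], release: match[:release] }
end

# Assumption: `rpm` is installed. If it is not, the shell prints nothing
# to stdout and this returns an empty array.
def installed_glibc
  `rpm -q glibc 2>/dev/null`.lines.map { |line| parse_rpm_nvr(line) }.compact
end

# True when a package shares the version/release prefix from the report.
def reported_build?(pkg)
  pkg[:name] == "glibc" && pkg[:version] == "2.12" && pkg[:release].start_with?("1.166")
end
```

Usage might look like `installed_glibc.each { |pkg| puts "#{pkg.inspect} reported_build=#{reported_build?(pkg)}" }`. Multilib hosts can list glibc more than once, which is why the helper returns an array.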
Running GitLab, I'm seeing Sidekiq v4.0.1 get stuck periodically and stop processing jobs or responding at all.
I've never seen this kind of issue with any other GitLab instance, but on this particular one this happens about every day, and I'm at a loss.
`ps aux | grep sidekiq` shows:

The Processes list in Sidekiq Web is empty, the Busy counter reads 0, and the Jobs list is empty.
Nothing shows up in the Sidekiq log for `kill -TTIN 36646`.
GDB output per the Troubleshooting doc: https://gist.github.com/DouweM/0f15e8f841a7d5643255
`(gdb) call (void)rb_backtrace()` output from the log: https://gist.github.com/DouweM/36a8e0bfbb230d876062

The last frame is incorrectly ascribed to `gitlab_git`; in reality it's `Rugged::Diff#each_patch`.

Any idea what could be going on?
Thanks a lot!