Bull process stops consuming new messages #2612
Comments
Can you try without an async function please?
@qlereboursBS there would be no difference between using done (without async), returning a promise, or using async. Just do not define an async function (or return a promise) and use done at the same time. In other words, the code you pasted above will not work.
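To make the distinction concrete, here is a minimal sketch of the two valid styles and the broken mix of both; the queue name and the doWork helper are placeholders, not taken from this thread:

```js
const Queue = require('bull');

const queue = new Queue('my-queue'); // placeholder queue name
const doWork = async (data) => { /* stand-in for the real job logic */ };

// Bull accepts one anonymous processor per queue, so pick ONE of these styles.

// Style 1: callback. A plain function that takes `done` and calls it.
queue.process((job, done) => {
  doWork(job.data)
    .then(() => done())
    .catch(done);
});

// Style 2: promise. An async function with no `done` parameter at all.
// queue.process(async (job) => {
//   await doWork(job.data);
// });

// Broken: an async function that ALSO takes `done`. Bull receives both a
// returned promise and a callback, the combination warned about above.
// queue.process(async (job, done) => {
//   await doWork(job.data);
//   done();
// });
```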
@qlereboursBS I have tried without async too and I am observing the same issue. @manast I came across a similar issue: #890. Can you please guide me on what to do here?
By any chance could you test with BullMQ instead? If BullMQ does not suffer from this problem, then we know it is something specific to Bull and it may be easier to spot the reason for it.
I can also see that both Bull and BullMQ use the same version of ioredis, so if the issue also exists in BullMQ then it could be some reconnection issue with ioredis; otherwise, the problem must come from the error-handling logic in Bull.
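For anyone attempting that test, a minimal BullMQ worker along the same lines might look like the sketch below; the queue name, connection details, and doWork helper are assumptions, not details from this thread:

```js
const { Worker } = require('bullmq');

const doWork = async (data) => { /* stand-in for the real job logic */ };

// Placeholder queue name and connection details.
const worker = new Worker(
  'my-queue',
  async (job) => {
    await doWork(job.data);
  },
  { connection: { host: '127.0.0.1', port: 6379 } }
);

// Surface low-level worker errors instead of letting them pass silently.
worker.on('error', (err) => {
  console.error('worker error:', err);
});
```

If this worker keeps consuming across the same connection resets, that would point at Bull's error handling rather than ioredis.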
I was able to narrow down the issue. It seems the problem is with Bull's error-handling logic; I am able to reproduce it every time the following scenario occurs:
Also, no error event is produced by Bull in this case, but after some hours Bull emits the error event below:
and after this, the process's event loop resumes and it starts processing the messages again. Hope this information helps. cc: @manast
I wonder if this is not actually an issue with ioredis. For instance, the setting maxRetriesPerRequest: null works so that a command will never fail: "Set maxRetriesPerRequest to null to disable this behavior, and every command will wait forever until the connection is alive again (which is the default behavior before ioredis v4)." It is possible that a bug in ioredis prevents the BRPOPLPUSH command from being re-executed when the connection is alive again.
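For reference, this is roughly how that setting is passed through to ioredis when creating a Bull queue; the host, port, and queue name below are placeholders:

```js
const Queue = require('bull');

// With maxRetriesPerRequest set to null, ioredis never fails a command
// after a retry limit; blocking commands such as BRPOPLPUSH are expected
// to wait until the connection is alive again, which is exactly the
// behavior under suspicion here.
const queue = new Queue('my-queue', {
  redis: {
    host: '127.0.0.1',
    port: 6379,
    maxRetriesPerRequest: null,
  },
});
```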
We are using |
I understand; what I meant is that if the same problem does not exist in BullMQ, then we can corner the bug more easily. Right now we do not have a lot to go on, as the issue is not easy to reproduce.
@manast Sure, I'll try to test it and will share the results. I noticed that when I run the same worker via npm for local testing, the process is not affected by the Redis connection-reset error (it is connected to the same Redis server, running via Docker), but the issue appears only when I run the worker via Docker (connected to the same Redis instance). It's very strange; can you tell me what the reason could be, since the Bull and ioredis versions are the same in both cases?
I don't know the reason, but clearly the type of connection error is different, and therefore it could be an issue with ioredis not handling it correctly.
But you are still not able to provide a reproducible case, even using Docker, right? If I were you, that's where I would put my efforts...
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
@Shahtaj-Khalid Do you have any update on this issue?
Description
I'm initialising Bull's process handler at the start of my worker (running via Docker and k8s), and it continues to listen for messages on the configured queue. The problem is that after some time (a few hours; it is not a fixed interval), Bull stops consuming new messages, even though jobs do exist in the wait queue (I have checked in Redis).
When I restart my worker pod, it starts consuming those jobs again.
There are no errors in the 'error' event right away when the consumer stops processing new jobs, but sometimes I observed the error below in my worker after a few more hours:
Minimal, Working Test code to reproduce the issue.
This is how I'm initialising Bull and activating the process method:
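A minimal sketch of an initialization along these lines (the queue name, Redis URL, and handleJob helper are placeholders, not the reporter's actual code):

```js
const Queue = require('bull');

const handleJob = async (data) => { /* real data processing goes here */ };

// Placeholder queue name; the Redis URL comes from the environment.
const workQueue = new Queue('work-queue', process.env.REDIS_URL);

// Consume jobs as they arrive on the configured queue.
workQueue.process(async (job) => {
  await handleJob(job.data);
});

workQueue.on('error', (err) => {
  console.error('queue error:', err);
});
```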
Also, please note: when I restart my Redis pod, Redis throws an unreachable error, which gets recovered, and Bull continues to consume new messages. It seems like this issue happens only when Redis throws a
Connection reset by peer
error, and it is not getting caught by the error event right away either; it takes some hours, and once the error event is received, Bull starts processing the queued jobs again.

Bull version
"bull": "^4.10.2",
"ioredis": "^4.28.5"
Node.js: 14.15
Additional information
Since I'm not receiving any error event, it's hard to debug this. Kindly let me know what could possibly trigger this issue; it is a severe issue in my case, since we are relying on Bull for all our data processing. Thank you.
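One way to get more visibility while debugging is to log every relevant event; a sketch, assuming Bull v4's documented local events and that the queue exposes its underlying ioredis connection as queue.client:

```js
const Queue = require('bull');

const queue = new Queue('my-queue'); // placeholder queue name

// Log Bull's local events to surface connection trouble earlier.
['error', 'failed', 'stalled', 'paused', 'resumed'].forEach((event) => {
  queue.on(event, (...args) => {
    console.log(`[bull:${event}]`, ...args);
  });
});

// The low-level ioredis connection events can be logged as well.
queue.client.on('reconnecting', () => console.log('[ioredis] reconnecting'));
queue.client.on('end', () => console.log('[ioredis] connection ended'));
```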