Reconnection problem #890

Closed
ks-s-a opened this issue Mar 16, 2018 · 31 comments

@ks-s-a

ks-s-a commented Mar 16, 2018

Description

I create a queue and add a task to it every 10 seconds. I use the bluebird library to add a delay in the task processor.

The point is that I'm trying to check the behavior when the redis connection is lost. When I switch off the redis server while a processor is working, everything is OK: the task finishes, the queue reconnects, and task processing continues.

If I switch off the redis server when there are no tasks in the queue, task processing stops and the queue doesn't reconnect to the server. The connection is in the "end" status.

I need some way to re-establish the connection after the redis server has been restarted. It's vital for our project.
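
A minimal sketch of how the connection state could be observed while reproducing this, assuming the underlying client is an ioredis instance, which emits `connect`, `ready`, `close`, `reconnecting` and `end` events. The helper name `watchConnection` is hypothetical, and reaching the client through `queue.client` is an assumption about how bull exposes it:

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: records every connection lifecycle event the client
// emits, so it is visible whether the client ever leaves the "end" state.
function watchConnection(client, log = console.log) {
  const seen = [];
  for (const name of ['connect', 'ready', 'close', 'reconnecting', 'end']) {
    client.on(name, () => {
      seen.push(name);
      log(`redis connection event: ${name}`);
    });
  }
  return seen;
}

// Stand-in client so the sketch runs without a Redis server.
// With bull one would pass the queue's ioredis client instead
// (assumed here to be reachable as `queue.client`).
const fakeClient = new EventEmitter();
const events = watchConnection(fakeClient);
fakeClient.emit('close');
fakeClient.emit('end');
console.log(events);
```

If `end` shows up without a following `reconnecting` or `connect`, the client has stopped retrying, which would match the behavior described above.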

Logs when I restart the redis server during a waiting period (no task is being processed):

➜  queue git:(master) ✗ node test.js
start task
resolve task!
start task
resolve task!
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6379
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6379
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6379
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6379
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6379
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6379
Error in bull queue happend: ReplyError: ERR only (P)SUBSCRIBE / (P)UNSUBSCRIBE / PING / QUIT allowed in this context

Logs when I restart the redis server while a task is executing (processing time):

➜  queue git:(master) ✗ node test.js
start task
Error in bull queue happend: Error: read ECONNRESET
Error in bull queue happend: Error: read ECONNRESET
Error in bull queue happend: Error: read ECONNRESET
Error in bull queue happend: Error: read ECONNRESET
Error in bull queue happend: Error: read ECONNRESET
Error in bull queue happend: Error: read ECONNRESET
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6379
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6379
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6379
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6379
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6379
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6379
Error in bull queue happend: ReplyError: ERR only (P)SUBSCRIBE / (P)UNSUBSCRIBE / PING / QUIT allowed in this context
resolve task!
Error in bull queue happend: Error: Missing lock for job 1 failed
start task
resolve task!
start task
resolve task!

Look at the end of the output: new tasks are being processed. Everything looks fine, unlike in the previous example.

Test code to reproduce

const Promise = require('bluebird');
const Queue = require('bull');

const queue = new Queue('test', {
  redis: {
    host: 'localhost',
    port: 6379
  }
});

queue
  .on('error', function (error) {
    console.error(`Error in bull queue happend: ${error}`);
  })
  .on('failed', function (job, error) {
    console.error(`Task was failed with reason: ${error}`);
  });

queue.process('test_task', () => {
  console.log('start task');
  return Promise.delay(5000).then(() => {
    console.log('resolve task!');
  });
});

setInterval(function () {
  queue.add('test_task', {});
}, 10000);

Bull version

bull - 3.3.10, node - v6.10.2

@manast
Member

manast commented Mar 17, 2018

I have not tested further, but the signature of your process function is wrong: it can take either one argument, job, or two, job and done. If you specify the done callback, you need to call it!

queue.process('test_task', function (job) {
  console.log('start task');
  return Promise.delay(5000).then(() => {
    console.log('resolve task!');
  });
});
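
For illustration, the two handler styles described above can be sketched without a running queue. The function names are hypothetical, and bull is assumed here to choose between the styles based on the number of parameters the handler declares:

```javascript
// Promise style: declare only `job` and return a promise.
function promiseHandler(job) {
  return new Promise((resolve) => {
    setTimeout(() => resolve(`finished job ${job.id}`), 10);
  });
}

// Callback style: declare `job` and `done`, and always call `done`
// (with an error as the first argument on failure).
function callbackHandler(job, done) {
  setTimeout(() => done(null, `finished job ${job.id}`), 10);
}

// Problematic style: `done` is declared but never called. Going by the
// comment above, a job given to this handler would never be completed.
function brokenHandler(job, done) {
  return Promise.resolve(`finished job ${job.id}`);
}

// Registering a handler with bull needs a running Redis, so it stays a comment:
// queue.process('test_task', promiseHandler);

// Declared parameter counts of the three handlers.
console.log(promiseHandler.length, callbackHandler.length, brokenHandler.length);
```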

@manast
Member

manast commented Mar 17, 2018

Actually, after fixing the signature I could verify that the queue works well even with redis connection interruptions. Please verify and close if it works for you.

@ks-s-a
Author

ks-s-a commented Mar 19, 2018

@manast OK. I threw out my small wrappers and used only the vanilla library. The result is the same.

I added log output and cleaned up the code snippet (see the topic starter message).

@manast
Member

manast commented Mar 19, 2018

please try version 3.3.10

@ks-s-a
Author

ks-s-a commented Mar 19, 2018

@manast the same story. =(

I fixed the version in the description.

@ks-s-a
Author

ks-s-a commented Mar 21, 2018

@manast Is it reproducible on your machine?

@manast
Member

manast commented Mar 21, 2018

Yes, I could reproduce it. It seems like a bug in ioredis: it appears to handle blocking commands incorrectly when a disconnection occurs. I have not had time to report it as an issue or find a workaround.

@manast
Member

manast commented Mar 25, 2018

I did some tests where this redis option helped: {enableOfflineQueue: false}. It could be a temporary workaround until ioredis gets fixed.
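
A sketch of where that option would go, assuming bull passes the `redis` object through to ioredis unchanged. The helper name `queueOptions` is hypothetical:

```javascript
// Hypothetical helper that builds the options object for `new Queue(...)`.
function queueOptions(host, port) {
  return {
    redis: {
      host,
      port,
      // ioredis option: with the offline queue disabled, commands issued while
      // the connection is not ready are rejected instead of being buffered.
      enableOfflineQueue: false,
    },
  };
}

const options = queueOptions('localhost', 6379);
console.log(options);

// With bull installed and Redis running:
// const Queue = require('bull');
// const queue = new Queue('test', options);
```

With buffering switched off, a `queue.add` call made during an outage would be expected to reject, so callers probably need their own error handling around it.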

@ks-s-a
Author

ks-s-a commented Mar 26, 2018

@manast It didn't help me. The test result is exactly the same.

@ks-s-a
Author

ks-s-a commented Apr 4, 2018

@manast Could you add a link to the ioredis issue? It's important for me to track it. Maybe, if I have some time, I'll try to help them or investigate the problem.

@manast
Member

manast commented Apr 4, 2018

redis/ioredis#610

@lincoln-spiteri

Is there any new info about this? The ticket raised with the ioredis maintainer has not been updated. Are there any known workarounds for this issue?

@andrelaszlo

@lincoln-spiteri I tried the workaround suggested in ioredis#610: disabling the built-in retryStrategy and reconnecting manually. It's not great (resending seems broken, for example), but it stays connected. I'm also using it with another library, and that library throws a few exceptions every time I reconnect...

Also hoping for a better solution soon.
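
A rough sketch of that kind of workaround, not the exact code from ioredis#610. It relies on two documented ioredis behaviors: automatic reconnection stops when `retryStrategy` returns a non-number, and `connect()` can then be called manually. The helper name `reconnectManually` and the delay are hypothetical, and wiring such a client into bull (for example through its `createClient` option) is left out:

```javascript
const { EventEmitter } = require('events');

// ioredis stops reconnecting on its own when retryStrategy returns a non-number.
const redisOptions = { host: 'localhost', port: 6379, retryStrategy: () => null };

// Hypothetical helper: once the client gives up ("end"), wait and call
// connect() again. A failed attempt is assumed to end the connection again,
// which re-triggers this handler.
function reconnectManually(client, delayMs = 1000) {
  client.on('end', () => {
    setTimeout(() => {
      client.connect().catch(() => {
        // Ignored here; the next "end" event schedules another attempt.
      });
    }, delayMs);
  });
}

// Stand-in client so the sketch runs without Redis or ioredis installed.
const fakeClient = new EventEmitter();
fakeClient.connectCalls = 0;
fakeClient.connect = () => {
  fakeClient.connectCalls += 1;
  return Promise.resolve();
};
reconnectManually(fakeClient, 0);
fakeClient.emit('end');
setTimeout(() => console.log(`connect() calls: ${fakeClient.connectCalls}`), 20);

// With ioredis installed:
// const Redis = require('ioredis');
// reconnectManually(new Redis(redisOptions));
```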

@luin

luin commented May 1, 2018

Sorry for the late response. I tested locally using the code provided by @ks-s-a, but the result looks correct to me:

bull-test ❯ node index.js
start task
resolve task!
start task
resolve task!
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6381
start task
resolve task!
start task

I restarted the redis server multiple times at different stages (non-task processing time & processing time) but the result stayed the same. Did I miss something?

Tested with bull 3.3.10 & 3.4.1.

@kakasal

kakasal commented May 31, 2018

    at Object.exports._errnoException (util.js:1018:11)
    at exports._exceptionWithHostPort (util.js:1041:20)
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1086:14)

@adogor

adogor commented Oct 17, 2018

Hi,

We had a reconnection problem with our queues: whenever redis was stopped, the "listening" queues could not restart, and this message appeared each time:

ERR only (P)SUBSCRIBE / (P)UNSUBSCRIBE / PING / QUIT allowed in this context

We solved it.

The problem is that, when reconnecting, ioredis checks whether the server is ready. This check (https://github.com/luin/ioredis/blob/6dd3730617ece16f874e3ce3e5da0e113ac272d3/lib/redis.js#L413) uses the info command, which is not allowed in the queue context.

So we just had to disable the enableReadyCheck option in our configuration to stop this check. After that, our config looks like this:

Queue(config.queueName, {
  redis: {
    port: xxx,
    host: xxx,
    maxRetriesPerRequest: null,
    enableReadyCheck: false
  },
});

All the queues are now reconnecting as they should and processing the remaining jobs. 👍

We hope this option solves some of the connection problems in this thread (maybe not...). @ks-s-a, you might test this ;-)

@luin

luin commented Oct 17, 2018

@adogor Thank you for pointing this out. I'm wondering how to reproduce the error, since it works well on my machine with the code provided by @ks-s-a. The error is strange, since ioredis should handle the case where the ready check is enabled while a subscribe command is issued.

@leonluyong

leonluyong commented Jul 4, 2019

@luin I tested the same code provided by @ks-s-a and still have the reconnection problem, with the same output as @ks-s-a provided. Any updates on this issue?

Windows 10
Node v10.16.0 / npm 6.9.0
bull version 3.10.0
redis: Docker (Redis server v=5.0.5 sha=00000000:0 malloc=jemalloc-5.1.0 bits=64 build=7983a619928f1f2d)
D:\Code\nodetest>node test.js 
start task
resolve task!
start task
resolve task!
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6379
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6379
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6379
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6379
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6379
Error in bull queue happend: Error: connect ECONNREFUSED 127.0.0.1:6379
Error in bull queue happend: ReplyError: ERR only (P)SUBSCRIBE / (P)UNSUBSCRIBE / PING / QUIT allowed in this context

No more output appears, even after waiting a long time.

@zyf0330

zyf0330 commented Jan 15, 2020

If I change Promise.delay(5000) to Promise.delay(1), new jobs are not processed anymore. A new job is added every 10s but is never processed.
The enableReadyCheck: false option makes it recover from this situation.

@psychonetic

@adogor This does not work for me (3.10.0).
I get a different error, connect ETIMEDOUT, when my application tries to add new jobs. Before, it was the connection refused error.

Are there any updates on this?

@manast
Member

manast commented Oct 26, 2020

@psychonetic if you have a specific use case that reproduces your problem, I recommend posting it, because most issues in this thread are obsolete or specific to user code.

@gigantedocil

Hi, I'm having issues in two specific situations:

  • If I start the apps that try to create the queue and connect to Redis and Redis is not available at that time, the queues will never work even if Redis becomes available afterwards.
  • If I start the apps that try to create the queue and connect to Redis while Redis is available, but Redis later becomes unavailable and further down the line becomes available again, the queues cannot recover from the first disconnect and will not work from that moment onwards.

I have forked a very simple tutorial from Heroku and updated its dependencies. It is a very simple project that creates a dashboard and creates jobs with the use of Express.js and Bull that can be used to replicate both issues mentioned above.

I run Redis on a Docker container and I shut down the container and then start it again to simulate Redis availability.

I also tried adding enableReadyCheck: false to the Queue constructor, as mentioned in this thread, but both problems still persist.

@manast
Member

manast commented May 3, 2021

@gigantedocil I have tested this recently and I could not reproduce any connection issues. Could you just provide some code snippet (not a full repo), that isolates the issue so that I can test it?

@stale

stale bot commented Jul 12, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@killthekitten

Looks like @adogor's approach is the default one for bullmq. @manast, is there a reason this shouldn't be backported to bull?

@manast
Member

manast commented Feb 23, 2022

@killthekitten backport what exactly?

@killthekitten

killthekitten commented Feb 23, 2022

@manast I mean setting the default configuration of the ioredis instance to maxRetriesPerRequest: null, enableReadyCheck: false.

Sorry for not making it clear.

@manast
Member

manast commented Feb 23, 2022

Check this and let me know if it clarifies this issue: https://docs.bullmq.io/bull/patterns/persistent-connections
AFAIC there is nothing else to do in the library.

@killthekitten

killthekitten commented Feb 23, 2022

My mistake, I wasn't aware of the changes in this release and the ones above, and assumed it was only implemented for bullmq. Thanks for pointing at the doc and sorry for dragging you here!

@manast
Member

manast commented Feb 23, 2022

Ok. I will close this issue then.

@Shahtaj-Khalid

Hi, I'm observing a similar issue; details are added here: #2612

In my case, I'm adding jobs to the queue directly using the addJobs lua script, via a separate redis client, and the jobs are added properly without any issue, but they are not consumed by bull. Once I encounter an ECONNRESET error in the redis queue, the consumer starts consuming all the messages again.

Kindly let me know if I'm missing anything or how to fix this. It's really important. Thanks.
