
Blocking calls not working as expected in the case of disconnections #610

Open
manast opened this issue Mar 25, 2018 · 13 comments

@manast

manast commented Mar 25, 2018

We are having a serious issue in bull (OptimalBits/bull#890), where the queue stops processing commands in the event of disconnections. I have tracked it down to be an issue in ioredis. It seems that blocking commands are not handled properly in the case of disconnections. It is very easy to reproduce, but there are many cases to consider. Here I report the most obvious ones.

Code to reproduce:

const Redis = require('ioredis');

const redis = new Redis();

redis.brpoplpush('source', 'destination', 10).then(function(result){
  console.log(result)
}, function(err){
  console.error(err);
});

redis.on('error', function(err){
  // Commented out to avoid noise.
  // console.log('ERROR EVENT', err);
});

Case 1. Disconnected before calling the blocking command.

Behaviour
Dangling call; nothing ever happens.
Expected
An error, or at least a timeout after the given timeout.

Case 2. Connected before calling the blocking command, disconnected afterwards.

Behaviour
Dangling call; nothing ever happens.
Expected
An error, or at least a timeout after the given timeout.

Case 3. Connected before calling the blocking command, then disconnected and reconnected.

Behaviour
Dangling call; nothing ever happens.
Expected
An error, or at least a timeout after the given timeout.

Case 4. Disconnected before calling the blocking command, connected afterwards.

Behaviour
Times out 10 seconds after reconnection.

Expected
Works as expected?
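
One way to simulate Cases 2 and 3 without restarting the server is to kill the blocked connection from a second connection with CLIENT KILL. A minimal sketch, assuming Redis >= 5 for CLIENT ID; the variable names are illustrative only:

const Redis = require('ioredis');

const blocked = new Redis();
const admin = new Redis();

async function reproduce() {
  // Grab the client id before the connection becomes blocked by BRPOPLPUSH.
  const id = await blocked.client('id');

  blocked.brpoplpush('source', 'destination', 10).then(function(result){
    console.log('result', result);
  }, function(err){
    console.error('rejected', err);
  });

  // Give the command a moment to block, then drop that connection server-side.
  setTimeout(function(){
    admin.client('kill', 'id', id);
  }, 1000);
}

reproduce();

Stopping the Redis server (or its Docker container) at the corresponding moment should reproduce the same cases.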

Since the blocking command is not cancelable (#516), there is currently no workaround that I know of, and you may end up with a dangling client, so I think this issue is quite serious. But please, let's discuss it.

@luin

luin commented Mar 30, 2018

Hi @manast.
For case 4, I tested locally and the behavior works as expected (it logs the result when reconnected). It would be strange for ioredis to just time out when the source list has elements. Could you run LLEN source in redis-cli when reconnected to check whether there are elements in the source list?

ioredis keeps reconnecting to the server forever, so all commands will block while disconnected. This behavior makes sense for applications where the connection will recover shortly (<10s~1min).

Setting retryStrategy to null and handling reconnection manually in the close event may solve the problem (see the sketch after this list):

case 1: prints errors immediately.
case 2: prints errors immediately.
case 3: prints errors immediately when disconnected.
case 4: prints errors immediately.
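
A minimal sketch of that suggestion, assuming a fixed one-second manual reconnect delay (the createClient wrapper and the delay are illustrative, not part of the ioredis API):

const Redis = require('ioredis');

function createClient() {
  // A retryStrategy that returns a non-number tells ioredis to stop reconnecting,
  // so pending commands are rejected instead of waiting forever.
  const redis = new Redis({ retryStrategy: () => null });

  redis.on('error', function(err){
    console.error('ERROR EVENT', err);
  });

  // Handle reconnection manually once the connection closes.
  // Note: callers must re-issue any pending blocking calls on the new instance.
  redis.on('close', function(){
    setTimeout(createClient, 1000);
  });

  return redis;
}

const redis = createClient();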

@stale

stale bot commented Apr 29, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 7 days if no further activity occurs, but feel free to re-open a closed issue if needed.

@stale stale bot added the wontfix label Apr 29, 2018
@manast

manast commented Apr 29, 2018

bump

@stale stale bot removed the wontfix label Apr 29, 2018
@ks-s-a

ks-s-a commented May 4, 2018

@luin It's quite strange. I use a Docker image with Redis instead of a pure local Redis server. I tested launching a local redis-server and the bug doesn't exist in that case (external IP, protected-mode off).

My colleagues and I tested reconnection in a production-like environment and couldn't reproduce it. We use Kubernetes; maybe it somehow affects the bug.

I'm not sure that I can investigate the issue further. I tried different Redis options both in Docker and locally, with the same result: it happens in the Docker container, but not with a local Redis server.

@manast

manast commented May 4, 2018

Ok, I will try to test again with a reproducible environment.

@lavarsicious

We're definitely seeing this issue occur when using an Azure Redis instance. If I scale the service, Azure will disconnect any clients when it cuts over.

@stale

stale bot commented Jun 14, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 7 days if no further activity occurs, but feel free to re-open a closed issue if needed.

@stale stale bot added the wontfix label Jun 14, 2018
@manast

manast commented Jun 14, 2018

bump to avoid auto close

@stale stale bot removed the wontfix label Jun 14, 2018
@luin luin added the pinned label Jun 22, 2018
@carly

carly commented Jul 16, 2018

@manast have you been able to come up with a good workaround for this issue? I'm using bull for an internal app I'm building at work. Everything works as expected on my dev box, but I'm running into this issue when I configure my app to use Redis instances on different hosts.

@manast

manast commented Jul 17, 2018

@carly not yet. I need to provide better test code for @luin, but I haven't had enough time for it; I will try to prioritize it.

@elucidsoft

I use Kubernetes and have seen this issue. I think to re-create it, you need to establish a healthy connection to Redis, then kill your Redis server, send it a command (which causes an exception), and then start your Redis server back up. Non-blocking calls will connect successfully; blocking calls will throw an exception. I can also confirm that this behavior occurs even if you're using Sentinels: if you shut down all of your Sentinels, the behavior is exactly the same.

@d0x2f

d0x2f commented Jun 9, 2020

Is this still an active issue? It may explain problems we've been seeing (also in kubernetes).

@elucidsoft

elucidsoft commented Jun 10, 2020

> Is this still an active issue? It may explain problems we've been seeing (also in kubernetes).

I was able to resolve this, but it required a TON of tweaking of my Redis instances in Kubernetes, plus code hacks, so it's solvable with a lot of work. FWIW, I ended up dumping my custom Redis configuration and went with the Bitnami Helm chart. I made sure to set Sentinel.staticID: true, and I also made sure to use sysctlImage to set net.core.somaxconn=10000 and transparent_hugepage/enabled.

Doing those things appears to have fixed this issue entirely; I have not seen it happen in over 6 months. I also changed the Redis config options:

connectTimeout: 10000, sentinelRetryStrategy: () => Math.min(10 * 10, 1000)

In addition, based on my testing, even with those changes it still appears to happen if the Redis instance doesn't have enough memory or CPU resources, so I doubled those as well.
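
For context, a rough sketch of how those options would be passed to ioredis in a Sentinel setup; the sentinel address and master name below are placeholders, not values from this thread:

const Redis = require('ioredis');

const redis = new Redis({
  sentinels: [{ host: 'redis-sentinel.example', port: 26379 }], // placeholder address
  name: 'mymaster', // placeholder master name
  connectTimeout: 10000,
  // As quoted above; note this expression evaluates to a constant 100 ms delay.
  sentinelRetryStrategy: () => Math.min(10 * 10, 1000),
});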
