Problems with reliability of Pub/Sub subscriptions in different Redis clients #7855

adrianpasternak · 2020-09-28T09:21:02Z

Describe the bug

I'm not sure if this is the right place to report this issue, because it seems like a problem with Redis clients. But the same issue is present in all the clients I've checked (Lettuce, Redisson, Jedis, go-redis).

In the case of a sudden connection loss, Redis clients are not able to detect the network problem and will keep listening for Pub/Sub messages on a broken TCP connection for hours, making Pub/Sub unusable.

To reproduce

Start a Redis server on Host A.
Connect to Pub/Sub using one of the Redis clients from Host B.
On Host A, block all traffic to the Redis server using iptables or a similar tool.
The Redis client does not discover that the connection has been lost.
Now restart Redis on Host A and restore network traffic.
The Redis client keeps listening on a connection that no longer exists on the server side.

I've managed to reproduce this behavior using three different Java clients and go-redis. The Lettuce ticket has more details: redis/lettuce#1428

Expected behavior

Redis clients subscribed to a Pub/Sub should be able to detect a broken network connection, and reconnect when necessary.

Additional information

The undocumented workaround for this issue is to tweak the TCP keepalive parameters on the client's host: SO_KEEPALIVE, TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT.
This is similar to what redis-cli does in the application layer:

redis/src/redis-cli.c, line 908 (commit 1c71038):

anetKeepAlive(NULL, context->fd, REDIS_CLI_KEEPALIVE_INTERVAL);

redis/src/anet.c, line 95 (commit efb6495):

int anetKeepAlive(char *err, int fd, int interval)
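A minimal C sketch of what this workaround amounts to for a single socket, modeled on the Linux branch of anetKeepAlive: the first probe goes out after `interval` seconds of silence, further probes every interval/3 seconds, and the connection is declared dead after 3 unanswered probes. The function name set_tcp_keepalive is hypothetical, not part of any client library. TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are Linux option names, so each is guarded with #ifdef; where one is missing, the OS default applies.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: enable TCP keepalive on fd so that a silently dead
 * peer is detected after roughly 2 * interval seconds.
 * Returns 0 on success, -1 on failure (errno is set by setsockopt). */
int set_tcp_keepalive(int fd, int interval)
{
    int val = 1;

    /* Turn keepalive probing on for this socket. */
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &val, sizeof(val)) == -1)
        return -1;

#ifdef TCP_KEEPIDLE
    /* Idle time before the first probe (the Linux default is 7200 s). */
    val = interval;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &val, sizeof(val)) == -1)
        return -1;
#endif

#ifdef TCP_KEEPINTVL
    /* Gap between probes: interval / 3, but at least one second. */
    val = interval / 3;
    if (val == 0)
        val = 1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &val, sizeof(val)) == -1)
        return -1;
#endif

#ifdef TCP_KEEPCNT
    /* Unanswered probes before the kernel reports the connection as dead. */
    val = 3;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &val, sizeof(val)) == -1)
        return -1;
#endif

    return 0;
}
```

A client would call it right after connect(), e.g. set_tcp_keepalive(fd, 15), so that a dropped peer is noticed after about 30 seconds instead of the Linux default of more than two hours. Once the probes fail, the pending read on the subscriber socket returns an error and the client can reconnect.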

Is there any other way of making Pub/Sub subscriptions reliable without changing OS parameters?
Shouldn't all Redis clients set these socket options in the application layer, like redis-cli does?

oranagra · 2020-09-28T10:29:28Z

Since Redis (the server side) is no longer present, I don't think anything can be done on the server side to mitigate it. It must be handled on the client side, either by the OS or by the client library.
TCP keepalive seems like the right solution (that's exactly what it was designed for AFAIK).

@yossigo do you see anything that can be done on our side other than documenting it (which I'm not sure will help much)?

yossigo · 2020-09-29T07:17:22Z

@oranagra Theoretically, we could come up with an application-level keepalive mechanism where Redis periodically sends a heartbeat message. This would raise a lot of backwards-compatibility issues, and I am not sure there's a significant benefit that justifies it.

I think the best we can do is raise awareness of this issue with client maintainers, who should consider enabling TCP keepalive by default on Pub/Sub connections.

oranagra · 2020-09-29T07:33:35Z

Even if Redis were sending keepalive messages, it would still be the client's responsibility to detect that they stopped arriving, i.e. that the connection is dead.
Alternatively, the client could send a PING every so often and detect a write failure when the socket is dead.
But I don't see any advantage in all of that over TCP keepalive.
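A sketch of the PING-based alternative, assuming a raw socket fd that is already connected (and possibly subscribed); probe_with_ping is a hypothetical helper. PING is one of the commands Redis accepts on a subscribed connection. Note that a write to a silently dead peer usually succeeds, because the bytes just sit in the local send buffer until TCP's retransmissions give up, so the sketch relies on a reply deadline rather than on send() failing.

```c
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

#ifndef MSG_NOSIGNAL
#define MSG_NOSIGNAL 0 /* not available on every platform (e.g. macOS) */
#endif

/* Send a RESP-encoded PING on fd and wait up to timeout_ms for *any* bytes
 * from the server. Returns 1 if data arrived (the PING reply, or a Pub/Sub
 * message that was already in flight), 0 if nothing arrived before the
 * deadline, and -1 if sending failed or the peer closed the connection. */
int probe_with_ping(int fd, int timeout_ms)
{
    static const char ping[] = "*1\r\n$4\r\nPING\r\n";
    const size_t len = sizeof(ping) - 1; /* exclude the trailing NUL */

    /* Where supported, MSG_NOSIGNAL turns a write to a reset socket into
     * an EPIPE error instead of a SIGPIPE that would kill the process. */
    if (send(fd, ping, len, MSG_NOSIGNAL) != (ssize_t)len)
        return -1;

    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready = poll(&pfd, 1, timeout_ms);
    if (ready < 0)
        return -1;
    if (ready == 0)
        return 0; /* no reply in time: treat the connection as suspect */

    /* Peek instead of reading, so the client's normal reply parser still
     * sees these bytes. recv() returns 0 if the peer closed the socket. */
    char buf[64];
    ssize_t n = recv(fd, buf, sizeof(buf), MSG_PEEK);
    return n > 0 ? 1 : -1;
}
```

On a subscribed connection the PING reply does not have the usual +PONG shape, which is one reason the sketch accepts any incoming bytes as proof of life instead of parsing the reply.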

@itamarhaber do you know where something like this could be documented, and how to bring it to the attention of existing client maintainers?

tzickel · 2020-09-30T07:26:41Z

This is a general issue with long-lived, silent TCP connections, not specific to Redis or to Pub/Sub. Consider a blocking operation with an infinite timeout, like BLPOP: there you can't even send a PING, whereas on Pub/Sub you can.

It can happen in many ways. Think about a connection pool where one of the connections has stalled as described above: the client then sends a command on that connection and never receives a response (and what is a good timeout for that?).
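For the pooled-connection case, one way to at least bound the wait is a per-socket receive timeout. A minimal sketch using the standard SO_RCVTIMEO socket option; set_read_timeout and is_timeout_error are hypothetical helper names.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Make blocking recv() calls on fd give up after timeout_ms instead of
 * waiting forever. Returns 0 on success, -1 on failure. */
int set_read_timeout(int fd, int timeout_ms)
{
    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}

/* A recv() that hit the timeout returns -1 with errno set to EAGAIN or
 * EWOULDBLOCK; anything else is a genuine socket error. */
int is_timeout_error(int err)
{
    return err == EAGAIN || err == EWOULDBLOCK;
}
```

The catch is exactly the question raised above: the timeout has to exceed the longest legitimate wait, so it cannot be applied as-is to an idle Pub/Sub subscriber or to BLPOP with an infinite timeout, where long silences are normal.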
Clients should provide sensible ways to mitigate the variety of issues that can arise from this:

When taking a connection from the pool that has not been used in a while, try a PING before using it (redis-py has this, but it is disabled by default):

https://github.com/andymccurdy/redis-py/blob/master/redis/connection.py#L676
When possible (as with Pub/Sub), send software keepalive PINGs. The catch is how easily and portably a library can send a PING once in a while without involving its end user.
Make the OS-level keepalive settings easy to use (most clients expose them in a raw way that is neither easy nor portable). Compare redis-py, where you have to know the options for your OS:
https://github.com/andymccurdy/redis-py/blob/master/redis/connection.py#L590
with justredis, where you just give it the keepalive interval and it tries to be smart about it:
https://github.com/tzickel/justredis/blob/master/justredis/sync/environments/threaded.py#L27
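The first two suggestions (PING a pooled connection that has sat idle, and send periodic PINGs on a Pub/Sub connection) rest on the same bookkeeping: remember when data last arrived and whether a PING is outstanding. A sketch of that decision logic only; all names and thresholds are hypothetical, and the caller is assumed to do the actual I/O and to pass times in seconds from a monotonic clock (for example clock_gettime with CLOCK_MONOTONIC).

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-connection state for an application-level keepalive. */
struct conn_health {
    time_t last_rx;        /* when bytes were last received */
    bool ping_outstanding; /* a PING was sent and nothing has arrived since */
    time_t ping_sent_at;   /* when that PING was sent */
};

enum health_action { HEALTH_OK, HEALTH_SEND_PING, HEALTH_RECONNECT };

/* Call whenever any bytes arrive: a PING reply and a Pub/Sub message both
 * prove that the connection is alive. */
void on_data_received(struct conn_health *h, time_t now)
{
    h->last_rx = now;
    h->ping_outstanding = false;
}

/* Call right after a PING has been written to the socket. */
void on_ping_sent(struct conn_health *h, time_t now)
{
    h->ping_outstanding = true;
    h->ping_sent_at = now;
}

/* Decide what to do at time `now`:
 * - a PING has gone unanswered for reply_s seconds: reconnect;
 * - no PING outstanding and the connection has been quiet for idle_s
 *   seconds: send a PING;
 * - otherwise: nothing to do yet. */
enum health_action check_health(const struct conn_health *h, time_t now,
                                time_t idle_s, time_t reply_s)
{
    if (h->ping_outstanding)
        return (now - h->ping_sent_at >= reply_s) ? HEALTH_RECONNECT : HEALTH_OK;
    return (now - h->last_rx >= idle_s) ? HEALTH_SEND_PING : HEALTH_OK;
}
```

For a pool, the check would run when a connection is borrowed, with HEALTH_SEND_PING meaning "PING and wait for the reply before handing the connection out". For a subscriber, it would run on a timer in the client's event loop. It does not help with BLPOP, since no PING on that connection is answered until the blocking command returns.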

I had lots of strange issues in my code where some of the Redis connections would sometimes just hang for no good reason. It happened frequently enough that I ended up enabling client-side OS keepalive, which fixed the issue.

adrianpasternak mentioned this issue Sep 28, 2020

Lettuce cannot recover from connection problems redis/lettuce#1428

Closed

kamran-redis mentioned this issue Aug 4, 2021

Allow to set keepalive socket options for SentinelPool +switch-master subscriber connection redis/jedis#2614

Open

mjeffrey18 mentioned this issue Nov 10, 2021

Idle sub/sub connections jgaskins/redis#7

Open

adib0u mentioned this issue Jun 28, 2022

Add keepalive option for redis client CybercentreCanada/assemblyline-base#809

Merged

NomadXD mentioned this issue Feb 7, 2023

Add keepalive for redis client to make the connections reliable openresty/lua-resty-redis#263

Open
