[Bug] Clients running with connection pool configured cannot retry on transient network failures #1789

fullykubed · 2021-12-14T22:41:45Z

Version:

redis-py: 4.0.2
redis: Redis server v=6.2.6 sha=00000000:0 malloc=jemalloc-5.1.0 bits=64 build=7fdff9439e18e4a1

Platform:

Python 3.9.5 on Ubuntu 18.04 on MS Azure

Description:

When using the ConnectionPool class with the Redis client with a retry param, the client does not properly retry on network failures or redis server restarts.

This is because the call to pool.get_connection is not wrapped by retry.call_with_retry. As pool.get_connection calls connection.connect here, it will fail without a retry if the socket connection has been disrupted.

As using the client without a connection pool doesn't have this issue, I'd consider this a bug as I believe that the intent of providing a retry configuration to the client would be to avoid throwing exceptions for transient network failures even if running in pool mode.

It seems that this occurs because the retry object is bound only to the connection rather than the client or connection pool so the client cannot currently retry without first retrieving the connection from the pool.

Example Stack Trace

│ django_redis.exceptions.ConnectionInterrupted: Redis ConnectionError: Error 111 connecting to redis-cache-master.fusion.svc.cluster.local:6379. Connection refused.                                         │
│ During handling of the above exception, another exception occurred:                                                                                                                                         │
│ Traceback (most recent call last):                                                                                                                                                                          │
│   File "/usr/local/lib/python3.9/site-packages/django/core/handlers/exception.py", line 47, in inner                                                                                                        │
│     response = get_response(request)                                                                                                                                                                        │
│   File "/usr/local/lib/python3.9/site-packages/django/utils/deprecation.py", line 116, in __call__                                                                                                          │
│     response = self.process_request(request)                                                                                                                                                                │
│   File "/usr/local/lib/python3.9/site-packages/django/middleware/cache.py", line 145, in process_request                                                                                                    │
│     cache_key = get_cache_key(request, self.key_prefix, 'GET', cache=self.cache)                                                                                                                            │
│   File "/usr/local/lib/python3.9/site-packages/django/utils/cache.py", line 362, in get_cache_key                                                                                                           │
│     headerlist = cache.get(cache_key)                                                                                                                                                                       │
│   File "/usr/local/lib/python3.9/site-packages/django_redis/cache.py", line 91, in get                                                                                                                      │
│     value = self._get(key, default, version, client)                                                                                                                                                        │
│   File "/usr/local/lib/python3.9/site-packages/django_redis/cache.py", line 38, in _decorator                                                                                                               │
│     raise e.__cause__                                                                                                                                                                                       │
│   File "/usr/local/lib/python3.9/site-packages/django_redis/client/default.py", line 258, in get                                                                                                            │
│     value = client.get(key)                                                                                                                                                                                 │
│   File "/usr/local/lib/python3.9/site-packages/redis/commands/core.py", line 1023, in get                                                                                                                   │
│     return self.execute_command('GET', name)                                                                                                                                                                │
│   File "/usr/local/lib/python3.9/site-packages/redis/client.py", line 1068, in execute_command                                                                                                              │
│     conn = self.connection or pool.get_connection(command_name, **options)                                                                                                                                  │
│   File "/usr/local/lib/python3.9/site-packages/redis/connection.py", line 1173, in get_connection                                                                                                           │
│     connection.connect()                                                                                                                                                                                    │
│   File "/usr/local/lib/python3.9/site-packages/redis/connection.py", line 571, in connect                                                                                                                   │
│     raise ConnectionError(self._error_message(e))

Potentials Fixes:

Since the various connection classes never really appear to be using the retry internally (the client lifts the retry object from the connection in order to wrap connection method calls), perhaps the retry object should live on the client rather than on each individual connection. This would allow the clients to retry both for operations on the connection pools and the connections.
If it were preferred to keep the retry bound to the connections, then perhaps the connection pool should also wrap the connection.connect() call similarly to how the client does.
You could also bind a retry to the connection pool so that clients can extract the retry similar to how they extract the retry from each connection.

Any of the above solutions would appear to fix this issue and seem straightforward to implement. Just a matter a picking the preferred style. I am happy to contribute a fix if given a direction on which approach is preferred.

Related Issues:

Connection pool doesn't recover from aborts or resets #1772: It seems like this issue might need to be resolved prior to the retry working since retry expects particular error classes by default

The text was updated successfully, but these errors were encountered:

dlarsen5 · 2021-12-14T23:48:41Z

also experiencing this issue

ugatenio · 2022-01-31T07:43:21Z

The fix was merged: #1895

fullykubed · 2022-02-08T13:32:33Z

LGTM

fullykubed changed the title ~~Connection~~ [Bug] Clients running with connection pool configured cannot retry on transient network failures Dec 14, 2021

Andrew-Chen-Wang mentioned this issue Dec 21, 2021

ConnectionResetError: [Errno 104] Connection reset by peer aio-libs-abandoned/aioredis-py#778

Open

fullykubed closed this as completed Feb 8, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug] Clients running with connection pool configured cannot retry on transient network failures #1789

[Bug] Clients running with connection pool configured cannot retry on transient network failures #1789

fullykubed commented Dec 14, 2021

dlarsen5 commented Dec 14, 2021

ugatenio commented Jan 31, 2022

fullykubed commented Feb 8, 2022

[Bug] Clients running with connection pool configured cannot retry on transient network failures #1789

[Bug] Clients running with connection pool configured cannot retry on transient network failures #1789

Comments

fullykubed commented Dec 14, 2021

dlarsen5 commented Dec 14, 2021

ugatenio commented Jan 31, 2022

fullykubed commented Feb 8, 2022