Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pass socket_keepalive_options to redis client for result_backend #8297

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

awmackowiak
Copy link

@awmackowiak awmackowiak commented Jun 5, 2023

Note: Before submitting this pull request, please review our contributing
guidelines
.

Description

In result_backend_transport_options documentation celery have description:

A dict of additional options passed to the underlying transport.
See your transport user manual for supported options (if any).

so after this change we will have possibility to pass socket_keepalive_options defined in result_backend_transport_options to result backend redis client. Also set up max_connection to 0 with is responsible to close conenction to redis after sending task to it.

    app.conf.result_backend_transport_options = {
        'socket_keepalive_options': {
            socket.TCP_KEEPIDLE: 300,
            socket.TCP_KEEPCNT: 5,
            socket.TCP_KEEPINTVL: 30
        }
    }
    app.conf.max_commections = 0

This options are supported in redis-py client.

@awmackowiak awmackowiak force-pushed the resul_broker_transport_options branch from 7b6df98 to 1253610 Compare June 5, 2023 08:04
@auvipy auvipy self-requested a review June 5, 2023 08:05
@auvipy
Copy link
Member

auvipy commented Jun 5, 2023

what this change will let to achieve?

@awmackowiak awmackowiak force-pushed the resul_broker_transport_options branch from f6f9940 to 8d0b6f7 Compare June 5, 2023 08:08
@awmackowiak
Copy link
Author

You will have possibility to manipulate sending tcp keepalive packet to redis from celery client.
It needed when you wist to keep connection open do redis server if you have some proxy between them and on that proxy are defined shorten ttl to keep open connection.

@awmackowiak awmackowiak force-pushed the resul_broker_transport_options branch from 2103b0d to 667278c Compare June 5, 2023 08:20
@awmackowiak awmackowiak force-pushed the resul_broker_transport_options branch from d4c063f to b743579 Compare June 5, 2023 09:12
Copy link
Member

@auvipy auvipy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still not fully sure why we need this

@awmackowiak awmackowiak force-pushed the resul_broker_transport_options branch from 876d212 to 784981a Compare June 12, 2023 08:53
@awmackowiak awmackowiak force-pushed the resul_broker_transport_options branch from 117783b to 063bd57 Compare June 12, 2023 09:01
…support setup 0 max_connection in it as well
@awmackowiak awmackowiak force-pushed the resul_broker_transport_options branch from 08ae1d6 to 9b540d3 Compare June 14, 2023 11:00
@codecov
Copy link

codecov bot commented Jun 14, 2023

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (da1146a) 81.25% compared to head (5b6983d) 81.26%.

Files Patch % Lines
celery/backends/redis.py 88.88% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8297      +/-   ##
==========================================
+ Coverage   81.25%   81.26%   +0.01%     
==========================================
  Files         149      149              
  Lines       18553    18560       +7     
  Branches     3166     3169       +3     
==========================================
+ Hits        15075    15083       +8     
+ Misses       3191     3190       -1     
  Partials      287      287              
Flag Coverage Δ
unittests 81.24% <88.88%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@awmackowiak awmackowiak force-pushed the resul_broker_transport_options branch from 6a799e4 to 5f9c1d5 Compare June 14, 2023 11:14
@awmackowiak awmackowiak force-pushed the resul_broker_transport_options branch from 752ab5f to ce4580d Compare June 14, 2023 11:30
@awmackowiak awmackowiak requested a review from auvipy June 14, 2023 12:57
@auvipy auvipy requested a review from a team June 14, 2023 13:43
@@ -453,7 +458,7 @@ def apply_chord(self, header_result_args, body, **kwargs):
def _chord_zset(self):
return self._transport_options.get('result_chord_ordered', True)

@cached_property
@property
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you sure about this change?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I have this cached_property 4 test are doing to failure:

======================================================================= short test summary info ========================================================================
FAILED t/unit/backends/test_redis.py::test_RedisBackend_chords_simple::test_on_chord_part_return__unordered - AssertionError: assert 0
FAILED t/unit/backends/test_redis.py::test_RedisBackend_chords_simple::test_on_chord_part_return_no_expiry__unordered - AssertionError: assert 0
FAILED t/unit/backends/test_redis.py::test_RedisBackend_chords_simple::test_on_chord_part_return__ChordError__unordered - TypeError: '_ContextMock' object is not subscriptable
FAILED t/unit/backends/test_redis.py::test_RedisBackend_chords_simple::test_on_chord_part_return__other_error__unordered - TypeError: '_ContextMock' object is not subscriptable
=============================================================== 4 failed, 82 passed in 63.34s (0:01:03) ================================================================

Its looks like _transport_options is setup once and every test use it (doesn't update their custom setup).
Also I don't see reason that should be cached if:

  1. It is in settings object so all worker have their own copy.
  2. Get something form dict don't cost much for computation.

Maybe something I have missed or I haven't know and any advice how to fix will be welcome and I change if back :)

@auvipy
Copy link
Member

auvipy commented Jun 14, 2023

@pawl since you use redis, can you please try / review this PR too?

@pawl
Copy link
Contributor

pawl commented Jun 20, 2023

@auvipy I was unable to start using redis as a broker in production due to this issue: #7276 (reply in thread)

@auvipy
Copy link
Member

auvipy commented Jun 21, 2023

Which version you are using?

@pawl
Copy link
Contributor

pawl commented Jun 22, 2023

Which version you are using?

@auvipy here are the versions I'm using:

  • Celery 5.2.6
  • Kombu 5.2.3
  • redis-py 4.5.4
  • billiard 3.6.4.0

I wasn't seeing any fixes related to celery workers using Redis as a broker getting stuck in the 2 more recent releases.

I'm thinking the issues could be related to the worker_max_tasks_per_child setting which I had set to 1000.

I need to find some time to make progress on celery/kombu#1734 However, Kombu's redis transport code is very difficult to navigate without types and debug logging. I'm thinking it may be easiest to add types before proceeding.

At the moment, I'm not sure I'd recommend using Redis as a celery broker.

@awmackowiak
Copy link
Author

This change doesn't fix reset connection but possibility to configure keep_alive connection to results pub/sub redis.
I didn't see any reason why this change is still open :)
But about kombu and not connection reestablish have you (@pawl @auvipy) some test environments (dockers?) for recreation that issue?

@auvipy auvipy added this to the 5.4 milestone Nov 10, 2023
Copy link
Member

@auvipy auvipy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you rebase please to test with python 3.12 in CI?

self._ConnectionPool = connection_pool

socket_timeout = _get('redis_socket_timeout')
socket_connect_timeout = _get('redis_socket_connect_timeout')
retry_on_timeout = _get('redis_retry_on_timeout')
socket_keepalive = _get('redis_socket_keepalive')
socket_keepalive_options = self._transport_options.get('socket_keepalive_options', {})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we document this change?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have this description for those parameters and the rest parameters are in code they aren't described either.
So I haven't described it as well. So the question is: should I be highlighted?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at least New in version 5.4. annotation?

statefilename Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this file should be removed from the PR git commit

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Thanks for the clue!

@auvipy
Copy link
Member

auvipy commented Nov 22, 2023

================================== FAILURES ===================================
______________________ test_WorkController.test_statedb _______________________

self = <t.unit.worker.test_worker.test_WorkController object at 0x0000019DC8CCD280>

  def test_statedb(self):
      from celery.worker import state
      Persistent = state.Persistent
  
      state.Persistent = Mock()
      try:
      worker = self.create_worker(statedb='statefilename')

t\unit\worker\test_worker.py:1027:


t\unit\worker\test_worker.py:717: in create_worker
worker = self.app.WorkController(concurrency=1, loglevel=0, **kw)
celery\worker\worker.py:98: in init
self.setup_instance(**self.prepare_args(**kwargs))
celery\worker\worker.py:138: in setup_instance
self.blueprint.apply(self, **kwargs)
celery\bootsteps.py:211: in apply
step.include(parent)
celery\bootsteps.py:339: in include
return self._should_include(parent)[0]
celery\bootsteps.py:335: in _should_include
return True, self.create(parent)
celery\worker\components.py:211: in create
w._persistence = w.state.Persistent(w.state, w.statedb, w.app.clock)
celery\worker\state.py:211: in init
self.merge()
celery\worker\state.py:219: in merge
self._merge_with(self.db)
.tox\3.12-unit\Lib\site-packages\kombu\utils\objects.py:40: in get
return super().get(instance, owner)
C:\hostedtoolcache\windows\Python\3.12.0\x64\Lib\functools.py:995: in get
val = self.func(instance)
celery\worker\state.py:288: in db
return self.open()
celery\worker\state.py:214: in open
return self.storage.open(
C:\hostedtoolcache\windows\Python\3.12.0\x64\Lib\shelve.py:243: in open
return DbfilenameShelf(filename, flag, protocol, writeback)
C:\hostedtoolcache\windows\Python\3.12.0\x64\Lib\shelve.py:227: in init
Shelf.init(self, dbm.open(filename, flag), protocol, writeback)


file = 'statefilename', flag = 'c', mode = 438

???
E dbm.error: db type is dbm.gnu, but the module is not available

C:\hostedtoolcache\windows\Python\3.12.0\x64\Lib\dbm_init_.py:91: error

@awmackowiak
Copy link
Author

@auvipy can you rerun this one integration test. It looks like is non-deterministic.
I assume this because all tests are finished with positive results.

@auvipy
Copy link
Member

auvipy commented Nov 23, 2023

I rerun several time and it is passing this time!

self._ConnectionPool = connection_pool

socket_timeout = _get('redis_socket_timeout')
socket_connect_timeout = _get('redis_socket_connect_timeout')
retry_on_timeout = _get('redis_retry_on_timeout')
socket_keepalive = _get('redis_socket_keepalive')
socket_keepalive_options = self._transport_options.get('socket_keepalive_options', {})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at least New in version 5.4. annotation?

@auvipy auvipy requested a review from Nusnus November 23, 2023 13:21
@Nusnus
Copy link
Member

Nusnus commented Feb 19, 2024

@awmackowiak Hey there - is it possible to finalize this PR?

Thank you 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

None yet

4 participants