Can't log from within tpool.execute #432
Comments
@smerritt thank you for this sad information. Please try to replace your custom PipeMutex with a lock proxied through tpool. The actual problem is, of course, green locks not working across OS threads.
@temoto That produces correct results, but the performance is not good. My usual process is a WSGI server, so a bunch of greenthreads on the main OS thread, and a few other OS threads hidden inside tpool.
Sorry, I didn't think enough before saying that. Of course it will block all other greenthreads. You know, tpool itself used a pipe for some time and then switched to a local socket connection. I think with a little ugly creativity you could leverage that synchronisation.
It's still as slow as a pipe, but at least you don't have to add a crutch PipeMutex to the code base.
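For reference, the tpool-proxied lock being discussed looks roughly like the sketch below. This is an illustration under assumptions, not code from this thread: it uses `eventlet.patcher.original()` to get the unpatched `threading` module and wraps a native RLock in `tpool.Proxy`, so each acquire/release runs on a native tpool worker.

```python
import eventlet
eventlet.monkey_patch()

import logging

from eventlet import patcher, tpool

# Use the *native* RLock class, not the green one installed by monkey_patch().
native_threading = patcher.original('threading')

handler = logging.StreamHandler()
# Every acquire()/release() on the proxy is shipped to a tpool worker thread,
# so greenthreads and native threads serialize on the same OS-level lock.
handler.lock = tpool.Proxy(native_threading.RLock())
logging.getLogger().addHandler(handler)
```

As the surrounding comments note, this gives correct results but is slow, and it can deadlock once the tpool runs out of worker threads.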
The proxied lock only works as long as you never run out of tpool threads. Imagine you've got only two tpool threads, A and B, plus the main thread. Thread A calls acquire() and takes the lock. The next work item also needs the lock, so thread B calls acquire() and blocks. Thread A finishes with the lock and calls release(), but that call also has to go through the tpool, and with both tpool threads occupied it can never run, so nothing ever releases the lock.
Yes, it's wasting tpool threads, but they're cheap and easy to increase.
True, but you can exhaust any finite tpool. Thread A has the lock. B wants it and calls acquire, which is proxied to tpool thread C: now A has the lock, B waits on C, and C waits on the lock. Then D wants it: A has the lock, B waits on C, C waits on the lock, D waits on E, and E waits on the lock. You can fill up the tpool with these pretty quickly. All it takes is one thread holding the lock for a long time.
@smerritt dear Sam, I'm confused; I would imagine A eventually releases the lock and the whole thing continues. It seems only harmful as starvation against other usages of the tpool. But considering that the tpool-proxied lock was a dirty workaround in the first place, this hardly deserves our time? The real solution is to make eventlet work across OS threads. Today I was thinking about how to implement that, and general greenlet multiplexing across OS threads is probably a bit too complex for now, but special treatment of the synchronisation primitives seems doable.
If all the tpool threads are occupied, then the proxied call to release() will never happen, or at least that's what I thought. Perhaps if acquire() is proxied but release() is not, then it would all work. You are, of course, correct that the answer is to make eventlet semaphores work across OS threads. I'm afraid I don't currently have any useful ideas to offer up in that domain.
Since change I1f1d9c0d6e3f04f1ecd5ef7c5d813005ee116409 we are running parts of the backups on native threads, which due to an eventlet bug [1] have bad interactions with greenthreads, so we have to avoid any logging when executing code in a native thread. This patch removes the MD5 logging in the SwiftObjectWriter close method and adds comments and a docstring referring to this limitation. [1] eventlet/eventlet#432 Closes-Bug: #1745168 Change-Id: I0857cecd7d8ab0ee7e3e9bd6e15f4987ede4d653
This patch sets the log level for the cinder backup process to WARNING because of a bug in eventlet, as described here: eventlet/eventlet#432. Cinder volume doesn't have this problem because it uses tooz locks everywhere. Change-Id: I96c1e61c442d9fd3ff2e016ede1b3b19ab4ba171
As of now there is no solution to the issue where a thread gets stuck in eventlet. A few other similar incidents, also without a proper solution: https://bugs.launchpad.net/cinder/+bug/1694509 eventlet/eventlet#432 eventlet/eventlet#492 eventlet/eventlet#395 Change-Id: Ib278780ccb20b9cbef50f54ba1a1ad33761c8002 closes-bug: #1742729
As of now there is no solution to the issue where a thread gets stuck in eventlet. A few other similar incidents, also without a proper solution: https://bugs.launchpad.net/cinder/+bug/1694509 eventlet/eventlet#432 eventlet/eventlet#492 eventlet/eventlet#395 Originally taken from: https://review.openstack.org/#/c/613023/1 Change-Id: Ic924f0ef0cb632b2439dfb7d1092bebf54adb863 closes-bug: #1742729
Is this still an open issue?
The reproduction script still fails, yes.
* Update oslo.log from branch 'master' to 94b9dc32ec1f52a582adbd97fe2847f7c87d6c17 - Fix logging in eventlet native threads: There is a bug in eventlet where logging within a native thread can lead to a deadlock situation: eventlet/eventlet#432. When faced with this issue, some projects in OpenStack using oslo.log, e.g. Cinder, resolve it by removing any logging within native threads. There is actually a better approach. The Swift team came up with a solution a long time ago [1], and in this patch that fix is included as part of the setup method, but it will only be run if the eventlet library has already been loaded. This patch adds the eventlet library as a testing dependency for the PipeMutex unit tests. [1]: https://opendev.org/openstack/swift/commit/69c715c505cf9e5df29dc1dff2fa1a4847471cb6 Closes-Bug: #1983863 Change-Id: Iac1b0891ae584ce4b95964e6cdc0ff2483a4e57d
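For readers who haven't opened the Swift commit: the idea is a mutex whose token lives in a kernel pipe, so greenthreads and native threads can contend for it without sharing an eventlet semaphore. The class below is only a rough sketch of that idea (it is neither Swift's nor oslo.log's actual code) and assumes an eventlet-patched process; note the `hub_prevent_multiple_readers(False)` call, which becomes relevant to the asyncio-hub discussion further down this thread.

```python
import errno
import fcntl
import os

import eventlet.debug
import eventlet.hubs


class PipeMutexSketch:
    """Rough sketch of a pipe-backed mutex in the spirit of Swift's PipeMutex."""

    def __init__(self):
        self.rfd, self.wfd = os.pipe()
        # Non-blocking read end, so a waiting greenthread can yield to the hub
        # instead of blocking its whole OS thread.
        flags = fcntl.fcntl(self.rfd, fcntl.F_GETFL)
        fcntl.fcntl(self.rfd, fcntl.F_SETFL, flags | os.O_NONBLOCK)
        # Several greenthreads may end up waiting on the same read fd.
        eventlet.debug.hub_prevent_multiple_readers(False)
        os.write(self.wfd, b'-')  # one byte in the pipe == unlocked

    def acquire(self, blocking=True):
        while True:
            try:
                os.read(self.rfd, 1)  # whoever reads the byte owns the lock
                return True
            except OSError as err:
                if err.errno != errno.EAGAIN:
                    raise
                if not blocking:
                    return False
                # Wait until the pipe is readable again; each OS thread gets
                # its own hub, so native threads can park here as well.
                eventlet.hubs.trampoline(self.rfd, read=True)

    def release(self):
        os.write(self.wfd, b'-')  # put the byte back


# logging.Handler.acquire()/release() only ever need these two methods.
```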
Cinder services as deployed by the operator just hang and enter an unending loop of kill and restart due to the Liveness probes. What we see in the container logs differs between cinder-api and the other containers: in cinder-api we just see that it stops responding to requests, while in the other services we see this exception:

    Traceback (most recent call last):
      File "/usr/lib/python3.9/site-packages/eventlet/hubs/hub.py", line 476, in fire_timers
        timer()
      File "/usr/lib/python3.9/site-packages/eventlet/hubs/timer.py", line 59, in __call__
        cb(*args, **kw)
      File "/usr/lib/python3.9/site-packages/eventlet/semaphore.py", line 152, in _do_acquire
        waiter.switch()
    greenlet.error: cannot switch to a different thread

In both cases the issue is the same: there is some logging happening on a native thread and this is creating problems for eventlet, to the point where it hangs. This is a known bug in eventlet [1], one which I recently fixed in oslo.log [2]. Since this is not fixed in all OpenStack releases, certainly not the one this operator is currently using, we need to be careful with what we actually enable for logging.

The logging we currently have enables debugging for EVERYTHING (rabbit, sqlalchemy, oslo libraries, etc.), regardless of what we set in the `debug` option and `default_log_levels` in `cinder.conf`. This logging override is done via the `logging.conf` file and creates the problem of logging in native threads.

Using the `logging.conf` file also diverges from the approach we want for the Cinder Operator, where we try to keep the configuration of the Cinder services as close as possible to a manual Cinder service configuration. This patch removes the usage of the `logging.conf` file by the operator and uses the `cinder.conf` template to set the right logging configuration options.

We set `log_file = /dev/stdout` in `cinder.conf` instead of the usual empty `log_file =` because otherwise Cinder services would log to `stderr`, making `httpd` on the cinder-api container treat all Cinder-API logs as errors and prepend additional information to every single cinder log message, like this:

    [Thu Sep 08 08:21:36.404638 2022] [wsgi:error] [pid 15:tid 69] (sqlalchemy.orm.mapper.Mapper): 2022-09-08 08:21:36,404 INFO

References to the `logging.conf` file have been removed from CRD descriptions and other code locations.

[1]: eventlet/eventlet#432
[2]: https://review.opendev.org/c/openstack/oslo.log/+/852443
Still an issue. I'm running into this with loguru when running Flask under eventlet.
Fixes a hanging thread due to eventlet/eventlet#432, which may get fixed for oslo.log in 5.0.1 with openstack/oslo.log@94b9dc3 (at the time of writing, master in the antelope cycle is constrained to 5.0.0).
We are regularly running into this issue with different OpenStack components:
I understand that there was a fix done, at least to oslo.log. Apart from fixing the root cause (if that is even possible or attempted) in eventlet, or OpenStack migrating to asyncio (https://review.opendev.org/c/openstack/governance/+/902585), I was simply wondering:
a) Can the Python process be made to crash / exit properly? Currently a process that runs into this issue becomes somewhat of a zombie. This makes recognizing this condition and triggering a restart (systemd, container runtime, ...) much more difficult.
b) Are there any more details / logs that could be produced with this traceback to allow finding and fixing the calls that lead to the greenlet.error in the first place? I suppose there are more reasons this can happen than the one in oslo.log?
Hello @frittentheke. Concerning "b": in the short term, I think this kind of detail could be retrieved with the right tooling. As this problem feels like a race-condition issue, another short-term option would be to use ebpf/bcc. Concerning "a", for now I have no answer; I'll come back later if I have anything to share with you on that point.
Thanks @4383 for those ideas. While I understand the goal is to replace eventlet, it's still somewhat important to have it produce more debug information out of the box when it ends up in this "deadlock" state. How else would someone be able to fix a particular usage pattern if there is no indication of which code paths caused it?
That would be awesome. Having processes or whole components not fail cleanly is the worst in distributed systems :-)
See if you can start an eventlet backdoor: https://eventlet.readthedocs.io/en/latest/modules/backdoor.html. It would require a process restart, and unfortunately I think you will lose the context of the bug, but you can wait to see if you reproduce it and then jump into that backdoor for further investigation. Our new maintenance policy is not against adding some debug capabilities. If you find useful info and new debug opportunities, do not hesitate to propose a patch to share it with the community. We will be happy to review it and to provide it "out of the box". On my side, I'll try to find some spare time to play with the initial reproducer and see if it is possible to increase the debug details to help developers catch this kind of bug. But I should admit that's not my top priority for now.
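For anyone who hasn't used the backdoor module, enabling it looks roughly like this (host and port are just example choices, following the linked docs):

```python
import eventlet
from eventlet import backdoor

# Serve a Python REPL on a local TCP port from inside the running process;
# connect later with e.g. `telnet localhost 3000` to poke at hub state,
# greenthreads, and module globals.
eventlet.spawn(backdoor.backdoor_server, eventlet.listen(('localhost', 3000)))
```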
@4383 thanks again for your time and help!
I would if I knew more about how eventlet works.
That would be awesome. If you could kindly have a look at my comment https://bugs.launchpad.net/octavia/+bug/2039346/comments/14 about all the OpenStack daemons we have throwing these errors. That launchpad bug is about an issue in oslo.log which apparently should not even exist in our Yoga release installation.
@frittentheke: oh ok. So I didn't make the link between this oslo.log problem and your gthread problem. I don't know why oslo.log is not fixed on zed. Just to be sure, you observed this behavior on yoga, correct?
Well, I think OpenStack has several problems here:
In other words, you actually suffer from incomplete backports.
@frittentheke: I'd suggest you reach out to Daniel (damani) or Takashi (tkajinam) on the OpenStack oslo channel. I think they would be happy to help you finalize these incomplete backports.
Yes @4383, we run Yoga using Ubuntu Cloud Archive packages on 22.04 LTS. But according to Takashi in https://bugs.launchpad.net/octavia/+bug/2039346/comments/10 the issue should not exist on Zed? Or is he mistaken, and do these fixes actually have to be backported further?
I think the problem is that zed and yoga do not contain (at least) https://opendev.org/openstack/oslo.log/commit/94b9dc32ec1f52a582adbd97fe2847f7c87d6c17. The other patch is the fix for another issue introduced by https://opendev.org/openstack/oslo.log/commit/94b9dc32ec1f52a582adbd97fe2847f7c87d6c17 (the same patch). But in any case, IMO we need this fix and its follow-up fixes (the children).
So, in order, I think these patches should be backported to zed and yoga:
While that will work on the epolls hub, it won't work on the asyncio hub, as calling eventlet.debug.hub_prevent_multiple_readers(False) raises RuntimeError("Multiple readers are not yet supported by asyncio hub"). So while a backport could be useful for older releases like zed and yoga, the PipeMutex implementation in oslo.log is going to need to be updated for the new asyncio hub. You can see an example of the failure message you will get if you try to use both together.
From an eventlet perspective we won't support that "multiple readers" nonsense in the asyncio hub; OpenStack deliverables may have to consider using a different mechanism.
IMO this "multiple readers" feature comes from a bad design. If oslo.log is migrated is async design would be refactored:
Surely would allowing us to bypass this "multiple readers" things. |
I'd rather suggest that we rely on dup and socket.fromfd:
Or something like that.
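Purely as a hypothetical illustration of that direction (the helper names below are made up, and this is not a proposal from the thread): each waiter gets its own duplicated descriptor via `socket.fromfd()`, so no two readers ever register the same fd with the hub.

```python
import socket

# Token passed over a socketpair instead of a pipe. socket.fromfd() dup()s
# the fd, so every waiter watches its own copy of the receiving end.
_recv_sock, _send_sock = socket.socketpair()
_send_sock.send(b'-')  # one token in flight == unlocked


def acquire():
    # Each call waits on its own dup of the receiving end; under eventlet
    # monkey-patching this recv() is green and yields to the hub.
    waiter = socket.fromfd(_recv_sock.fileno(), socket.AF_UNIX,
                           socket.SOCK_STREAM)
    try:
        waiter.recv(1)  # whoever reads the token owns the lock
    finally:
        waiter.close()


def release():
    _send_sock.send(b'-')  # hand the token back
```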
Just keep in mind that the migration to asyncio will take 3-4 releases, and during that time we will need to support running with either hub. There has not been a community agreement to move to explicit async yet either. If using dup and socket.fromfd can be hidden within oslo.log, that is fine, but if it would require changes to the projects that use oslo.log, that's problematic.
Discussion about this is at https://review.opendev.org/c/openstack/governance/+/902585
Yep, but that has not been approved and it may be rejected. There is a lot of work that needs to be done to socialise that proposal and get buy-in from all the projects that currently use eventlet. It's unlikely that projects like nova will invest time in adopting explicit async in 2024.2 until we have time to consider the detailed implementation aspects for our project. I may do some PoCs, but one of the themes for this cycle's PTG is likely to be completing ongoing work from last cycle and focusing on maintenance and tech debt. Changing the threading model does not fit with that theme.
I think we are all aware that the migration will take a couple of OpenStack releases, possibly even more than 4. If the "multiple readers" hack comes from libs like oslo.log, then I think it should also be possible to remove that hack at the lib level. It would also be possible to implement a kind of log-feeder thread, as Dan suggested on IRC, hence allowing the top layers to enable the asyncio hub. This is the oslo.log use case. Otherwise, if the "multiple readers" hack is located at the service level, then this service could keep using the epolls hub, giving time to solve the problem at the service level. This is the swift use case.
Concerning the PTG, I won't be around during that period, so if a discussion happens I won't be able to join it. Feel free to trigger one; I could follow it asynchronously. Concerning myself, I'm not convinced that a face-to-face discussion would be better or more efficient than one made through write-ups and proposals. Written exchanges leave more room for better understanding and thinking. That's my point of view.
The writings remain, the words fly away...
If you try to log from within a function called by `tpool.execute`, there is a chance that the tpool thread never returns, and you see a stack trace ending in `greenlet.error: cannot switch to a different thread`. This is because many logging handlers have mutexes that are `threading._RLock` objects, and Eventlet's replacement for `thread.allocate_lock` returns an `eventlet.semaphore.Semaphore` object, which does not work across different hubs in different pthreads.
Here's a small script to reproduce the issue:
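The original script isn't reproduced here; the following is a minimal sketch in the same spirit, assuming a monkey-patched process in which greenthreads and `tpool.execute` workers both log through the same handler:

```python
import eventlet
eventlet.monkey_patch()

import logging

from eventlet import tpool

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("repro")


def work(i):
    # Handler.emit() takes the handler lock; after monkey-patching that lock
    # is an eventlet Semaphore, which is not safe to use from this native
    # tpool thread.
    log.info("working on item %d", i)
    return i


def main():
    pile = eventlet.GreenPile()
    for i in range(1000):
        pile.spawn(tpool.execute, work, i)
        log.info("spawned %d", i)  # log from the green side as well
    for _ in pile:
        pass


if __name__ == "__main__":
    main()
```

On affected versions this tends to hang, or to raise `greenlet.error: cannot switch to a different thread`.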
The bug was originally found in OpenStack Swift; there's a better writeup at https://bugs.launchpad.net/swift/+bug/1710328 and a commit at openstack/swift@6d16079 that fixes the problem, but only for Swift, and not in a general way.
I'd like to figure out how to fix this in general, but I'm not sure how to proceed. Using a pipe-based mutex for all `_RLock` objects would work, but would be a very expensive fix. Perhaps just the locks in logging handlers, because those are global?
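Sketching that last idea (patch only the logging handler locks): the hook below is illustrative rather than the eventual Swift/oslo.log code, and the mutex factory you pass in is assumed to be something cross-thread safe, such as the pipe-based mutex sketched earlier in this thread.

```python
import logging


def patch_logging_handler_locks(mutex_factory):
    """Make every logging handler use mutex_factory() instead of an RLock.

    mutex_factory must return an object with acquire()/release() that is
    safe across greenthreads and native threads.
    """
    def createLock(self):
        self.lock = mutex_factory()

    logging.Handler.createLock = createLock

    # Handlers created before the patch still hold green locks; swap them
    # too. (logging._handlerList is a private list of weakrefs to handlers.)
    for handler_ref in logging._handlerList:
        handler = handler_ref()
        if handler is not None:
            handler.createLock()
```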