Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cross-thread locks can starve if there is a hub #1744

Open
jamadden opened this issue Jan 13, 2021 · 1 comment
Open

Cross-thread locks can starve if there is a hub #1744

jamadden opened this issue Jan 13, 2021 · 1 comment
Labels
Type: Bug Identified as a bug; needs a code change to fix Type: Enhancement We can do better through adding this

Comments

@jamadden
Copy link
Member

jamadden commented Jan 13, 2021

(This came up while I was pondering and testing #1735.)

Consider this code, which has one native thread spinning on acquiring and releasing a lock, while another thread just wants to acquire it once. The spinning thread is started before the other thread.

import sys
from threading import Event
from threading import Thread
from threading import Lock

from gevent.lock import Semaphore
from gevent import get_hub

LockFactory = Semaphore
HUB_IN_BG = False
if '--hub' in sys.argv:
    HUB_IN_BG = True
if '--native' in sys.argv:
    LockFactory = Lock

lock = LockFactory()
bg_acquired = Event()
spin_running = Event()
spin_count = 0

def acquires():
    if HUB_IN_BG:
        get_hub()
    lock.acquire()
    bg_acquired.set()
    lock.release()


def spin():
    if HUB_IN_BG:
        get_hub()
    global spin_count
    with lock:
        spin_running.set()

    while 1:
        spin_count += 1
        with lock:

            if bg_acquired.is_set():
                break

spin_thread = Thread(target=spin)
spin_thread.daemon = True
spin_thread.start()
spin_running.wait()

bg_thread = Thread(target=acquires)
bg_thread.daemon = True
bg_thread.start()

bg_acquired.wait()

print("Bg got the lock after", spin_count, "iterations", lock)

When run directly, it works:

$ python /tmp/lockt.py
Bg got the lock after 26305 iterations <Semaphore at 0x1014bf450 counter=1 _links[0]>

When run using native locks, it works:

$ python /tmp/lockt.py --native
Bg got the lock after 34030 iterations <unlocked _thread.lock object at 0x1010e4e80>

However, as soon as one or both of the background threads gets a gevent hub, it fails; the spinning thread monopolizes the lock and the parked thread never gets a chance:

$ python /tmp/lockt.py —hub
<time passes>
^CTraceback (most recent call last):
  File "/tmp/lockt.py", line 58, in <module>
    bg_acquired.wait()
  File "//3.9/lib/python3.9/threading.py", line 574, in wait
    signaled = self._cond.wait(timeout)
  File "/3.9/lib/python3.9/threading.py", line 312, in wait
    waiter.acquire()
KeyboardInterrupt

(The vast majority of the time, it just spins forever. Sometimes, rarely, it actually does appear to work; I think this is a race condition in the startup code.)

This is due to how the thread parks itself when it has a hub, and this happens if just one thread (either one!) has a hub, or they both do.

With the work in #1743, we're probably close to being able to handle this.

@jamadden jamadden added Type: Bug Identified as a bug; needs a code change to fix Type: Enhancement We can do better through adding this labels Jan 13, 2021
@jamadden
Copy link
Member Author

jamadden commented Jan 13, 2021

The direct problem here is that spin() is never yielding to the hub, so the callback that was scheduled and which would serve to awaken the acquires() thread never gets a chance to run. The fact that it works if neither thread has a hub is nice, but was a red-herring that prevented me from noticing the obvious problem.

Obvious? I'm not sure, especially as it's also different from how the standard library behaves. It seems perfectly reasonable to assume that lock.release() might yield to the hub. It doesn't because Semaphore is optimized for gevent's real use-case of a single thread and greenlets that by definition must already be yielding to the hub at some point to make progress. Thus, release() is very simple in the common case. Even greenlets that repeatedly acquire and release the lock without yielding to the hub will block other greenlets in the same thread from making progress—monkey-patching the example to use greenlets still demonstrates the issue.

I feel like this has come up before but haven't searched the issues yet...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Bug Identified as a bug; needs a code change to fix Type: Enhancement We can do better through adding this
Projects
None yet
Development

No branches or pull requests

1 participant