Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential segmentation fault when subtyping greenlet #245

Closed
fygao-wish opened this issue Jul 1, 2021 · 3 comments
Closed

Potential segmentation fault when subtyping greenlet #245

fygao-wish opened this issue Jul 1, 2021 · 3 comments

Comments

@fygao-wish
Copy link
Contributor

fygao-wish commented Jul 1, 2021

Hi,

I recently ran into some weird errors when running an application with subtyped greenlet.
Some example error messages:

  • Fatal Python error: GC object already tracked
  • AttributeError: run
  • Segmentation fault

After some digging, I found the root cause is that when a greenlet is resurrected, it keeps the object alive, but the subtype is decrefed. Similar issue here: https://bugs.python.org/issue8212

Here's a snippet to repro the problem:

from greenlet import getcurrent, greenlet
from concurrent.futures import ThreadPoolExecutor
import sys
import time

class MyGreenlet(greenlet):
    def __init__(self, *args, **kwargs):
        super(MyGreenlet, self).__init__(*args, **kwargs)

    def switch(self):
        super(MyGreenlet, self).switch()

TPOOL = ThreadPoolExecutor(max_workers=1)
# In main thread, we start greenlets and save them in the glets array
# In order to trigger greenlet resurrect easily, we start another thread to clear the glets array
glets = []

def clean_greenlets():
    while True:
        time.sleep(1)
        if glets:
            glets.clear()

# start a thread to clear the glets list
TPOOL.submit(clean_greenlets)

def gr_run():
    while True:
        time.sleep(0.10)
        print("{}".format(sys.getrefcount(MyGreenlet)))
        getcurrent().parent.switch()

while True:
    glet = MyGreenlet(gr_run)
    glets.append(glet)
    glet.switch()
fygao-wish added a commit to fygao-wish/greenlet that referenced this issue Jul 4, 2021
@jamadden
Copy link
Contributor

This turns out to be quite a kettle of fish, and it's not exactly "resurrection" as considered from the Python level — you can't provoke this with __del__. Here's what's going on.

If the last reference to a greenlet goes away while the greenlet is still running, then a GreenletExit exception is raised in the greenlet so that it can do cleanup (execute finally blocks, or __exit__ of context managers, deallocate local variables, etc). So far so good.

However, if some other thread besides the thread the greenlet belongs to (runs in) is the thread that deletes that final reference, we can't raise the GreenletExit in the soon-to-die greenlet: Just as you can't switch() to a greenlet on another thread, you can't raise an exception in a greenlet on another thread.

So to workaround this, when we discover this situation, greenlet internally "resurrects" the dyeing greenlet into a thread-local list, local to the list running the greenlet and calls it good. It is no longer reachable from Python, and we'll actually handle the cleanup at sometime in the future.

That time in the future is when the dyeing greenlet's thread is again running, and somebody calls one of several greenlet APIs — getcurrent() being the one used here. Every time those APIs are called, greenlet checks the thread-local list to see if there are any suspended greenlets waiting to have their GreenletExit exception raised, and if so, raises that in what is now the correct thread.

This approach has some bugs and issues.

The first is the reference count issue with the type, as you discovered. Here's a simplified example that doesn't involve any nesting or indeterminate sleeps:

from __future__ import print_function
import sys

from threading import Thread
from threading import Event

from greenlet import getcurrent, greenlet


class MyGreenlet(greenlet):
    pass


glets = []
ref_cleared = Event()

def greenlet_main():
    getcurrent().parent.switch()

def thread_main(greenlet_running_event):
    mine = MyGreenlet(greenlet_main)
    glets.append(mine)
    # The greenlets being deleted must be active
    mine.switch()
    # Don't keep any reference to it in this thread
    del mine
    # Let main know we published our greenlet.
    greenlet_running_event.set()
    # Wait for main to let us know the references are
    # gone and the greenlet objects no longer reachable
    ref_cleared.wait()
    # The creating thread must call getcurrent() (or a few other
    # greenlet APIs) because that's when the thread-local list of dead
    # greenlets gets cleared.
    getcurrent()

# We start with 3 references to the subclass:
# - This module
# - Its __mro__
# - The __subclassess__ attribute of greenlet
# - (If we call gc.get_referents(), we find four entries, including
#   some other tuple ``(greenlet)`` that I'm not sure about but must be part
#   of the machinery.)
#
# On Python 3 it's enough to just run 3 threads; on Python 2.7,
# more threads are needed, and the results are still
# non-deterministic. Presumably the memory layouts are different
thread_ready_events = []
for _ in range(
        sys.getrefcount(MyGreenlet) + (
            15 if sys.version_info[0] < 3 else 0)
):
    event = Event()
    thread = Thread(target=thread_main, args=(event,))
    thread_ready_events.append(event)
    thread.start()


for done_event in thread_ready_events:
    done_event.wait()


del glets[:]
ref_cleared.set()
# Let any other thread run; it will crash the interpreter. A
# ``time.sleep()`` will also work.
with open('/dev/null', 'wb') as f:
    f.write(b'foobar')

The second is that there is an extra reference to that thread-local list, preventing it from ever being deallocated. This means that any time we'd need to delete a greenlet in another thread, we at least leak one list object when that thread exits. That's a straightforward bug to fix.

The third is that if the thread that the dying greenlet is running in doesn't call any of the greenlet APIs that perform this "garbage collection" before it exits (and after the last reference was cleared by some other thread), the to-be-killed greenlets are left in limbo and never deallocated, nor is anything they reference through their attributes, on their stacks, etc. They stay in that list indefinitely, and the list is leaked. This is true even if the above bug is fixed, because there is a reference cycle (thread locals → dead greenlet list → a greenlet → thread locals), and, even though greenlet objects participate in GC and visit the thread locals, they only do this when they are the actively running greenlet. The suspended greenlets in the dead greenlet list thus get skipped over.

@jamadden
Copy link
Contributor

jamadden commented Jul 22, 2021

The type reference counting issue can also be reproduce independently of the thread-local list, or indeed, threads at all. This causes a crash (reliably on Python 3.10, less reliably on earlier versions):

from greenlet import getcurrent
from greenlet import greenlet
from greenlet import GreenletExit

class Greenlet(greenlet):
    pass

glets = []

def greenlet_main():
    try:
        getcurrent().parent.switch()
    except GreenletExit:
        glets.append(getcurrent())


for _ in range(10):
    Greenlet(greenlet_main).switch()

del glets

jamadden pushed a commit that referenced this issue Aug 6, 2021
jamadden added a commit that referenced this issue Aug 6, 2021
jamadden added a commit that referenced this issue Aug 6, 2021
@jamadden
Copy link
Contributor

jamadden commented Aug 6, 2021

Thanks for opening this issue and the PR!

The crash is fixed. The remaining things this investigation revealed and which still need addressed have been moved to #251 and #252.

@jamadden jamadden closed this as completed Aug 6, 2021
kraj pushed a commit to YoeDistro/meta-openembedded that referenced this issue Oct 27, 2021
1.1.2 (2021-09-29)
==================

- Fix a potential crash due to a reference counting error when Python
  subclasses of ``greenlet.greenlet`` were deallocated. The crash
  became more common on Python 3.10; on earlier versions, silent
  memory corruption could result. See `issue 245
  <https://github.com/python-greenlet/greenlet/issues/245>`_. Patch by
  fygao-wish.
- Fix a leak of a list object when the last reference to a greenlet
  was deleted from some other thread than the one to which it
  belonged. For this to work correctly, you must call a greenlet API
  like ``getcurrent()`` before the thread owning the greenlet exits:
  this is a long-standing limitation that can also lead to the leak of
  a thread's main greenlet if not called; we hope to lift this
  limitation. Note that in some cases this may also fix leaks of
  greenlet objects themselves. See `issue 251
  <https://github.com/python-greenlet/greenlet/issues/251>`_.
- Python 3.10: Tracing or profiling into a spawned greenlet didn't
  work as expected. See `issue 256
  <https://github.com/python-greenlet/greenlet/issues/256>`_, reported
  by Joe Rickerby.

Signed-off-by: Zheng Ruoqin <zhengrq.fnst@fujitsu.com>
Signed-off-by: Khem Raj <raj.khem@gmail.com>
kraj pushed a commit to YoeDistro/meta-openembedded that referenced this issue Oct 27, 2021
1.1.2 (2021-09-29)
==================

- Fix a potential crash due to a reference counting error when Python
  subclasses of ``greenlet.greenlet`` were deallocated. The crash
  became more common on Python 3.10; on earlier versions, silent
  memory corruption could result. See `issue 245
  <https://github.com/python-greenlet/greenlet/issues/245>`_. Patch by
  fygao-wish.
- Fix a leak of a list object when the last reference to a greenlet
  was deleted from some other thread than the one to which it
  belonged. For this to work correctly, you must call a greenlet API
  like ``getcurrent()`` before the thread owning the greenlet exits:
  this is a long-standing limitation that can also lead to the leak of
  a thread's main greenlet if not called; we hope to lift this
  limitation. Note that in some cases this may also fix leaks of
  greenlet objects themselves. See `issue 251
  <https://github.com/python-greenlet/greenlet/issues/251>`_.
- Python 3.10: Tracing or profiling into a spawned greenlet didn't
  work as expected. See `issue 256
  <https://github.com/python-greenlet/greenlet/issues/256>`_, reported
  by Joe Rickerby.

Signed-off-by: Zheng Ruoqin <zhengrq.fnst@fujitsu.com>
Signed-off-by: Khem Raj <raj.khem@gmail.com>
halstead pushed a commit to openembedded/meta-openembedded that referenced this issue Oct 28, 2021
1.1.2 (2021-09-29)
==================

- Fix a potential crash due to a reference counting error when Python
  subclasses of ``greenlet.greenlet`` were deallocated. The crash
  became more common on Python 3.10; on earlier versions, silent
  memory corruption could result. See `issue 245
  <https://github.com/python-greenlet/greenlet/issues/245>`_. Patch by
  fygao-wish.
- Fix a leak of a list object when the last reference to a greenlet
  was deleted from some other thread than the one to which it
  belonged. For this to work correctly, you must call a greenlet API
  like ``getcurrent()`` before the thread owning the greenlet exits:
  this is a long-standing limitation that can also lead to the leak of
  a thread's main greenlet if not called; we hope to lift this
  limitation. Note that in some cases this may also fix leaks of
  greenlet objects themselves. See `issue 251
  <https://github.com/python-greenlet/greenlet/issues/251>`_.
- Python 3.10: Tracing or profiling into a spawned greenlet didn't
  work as expected. See `issue 256
  <https://github.com/python-greenlet/greenlet/issues/256>`_, reported
  by Joe Rickerby.

Signed-off-by: Zheng Ruoqin <zhengrq.fnst@fujitsu.com>
Signed-off-by: Khem Raj <raj.khem@gmail.com>
Signed-off-by: Trevor Gamblin <trevor.gamblin@windriver.com>
HerrMuellerluedenscheid pushed a commit to HerrMuellerluedenscheid/meta-openembedded that referenced this issue Jan 16, 2022
1.1.2 (2021-09-29)
==================

- Fix a potential crash due to a reference counting error when Python
  subclasses of ``greenlet.greenlet`` were deallocated. The crash
  became more common on Python 3.10; on earlier versions, silent
  memory corruption could result. See `issue 245
  <https://github.com/python-greenlet/greenlet/issues/245>`_. Patch by
  fygao-wish.
- Fix a leak of a list object when the last reference to a greenlet
  was deleted from some other thread than the one to which it
  belonged. For this to work correctly, you must call a greenlet API
  like ``getcurrent()`` before the thread owning the greenlet exits:
  this is a long-standing limitation that can also lead to the leak of
  a thread's main greenlet if not called; we hope to lift this
  limitation. Note that in some cases this may also fix leaks of
  greenlet objects themselves. See `issue 251
  <https://github.com/python-greenlet/greenlet/issues/251>`_.
- Python 3.10: Tracing or profiling into a spawned greenlet didn't
  work as expected. See `issue 256
  <https://github.com/python-greenlet/greenlet/issues/256>`_, reported
  by Joe Rickerby.

Signed-off-by: Zheng Ruoqin <zhengrq.fnst@fujitsu.com>
Signed-off-by: Khem Raj <raj.khem@gmail.com>
Signed-off-by: Trevor Gamblin <trevor.gamblin@windriver.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants