Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #392: Port to Python 3.13 #396

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

vstinner
Copy link
Contributor

  • Replace C_RECURSION_LIMIT with Py_C_RECURSION_LIMIT.
  • Add Py_C_RECURSION_LIMIT for Python 3.12 and older.
  • Disable GREENLET_USE_CFRAME on Python 3.13.
  • Define Py_BUILD_CORE to include pycore_frame.h.

* Replace C_RECURSION_LIMIT with Py_C_RECURSION_LIMIT.
* Add Py_C_RECURSION_LIMIT for Python 3.12 and older.
* Disable GREENLET_USE_CFRAME on Python 3.13.
* Define Py_BUILD_CORE to include pycore_frame.h.
@vstinner
Copy link
Contributor Author

The second commit adds a Python 3.13 CI to GitHub Actions. It's to prove that my change works as expected. I can easily remove that change if you want to drop the 3.13 test for now.

Well, IMO it's good to test as soon as possible ;-)

@vstinner
Copy link
Contributor Author

Ah. While "tox -e py313" pass locally, it fails on the GitHub Action:

test_break_ctxvars (greenlet.tests.test_contextvars.ContextVarsTests.test_break_ctxvars) ... Fatal Python error: Segmentation fault
Current thread 0x00007fd2a2a61b80 (most recent call first):
  File "/home/runner/work/greenlet/greenlet/src/greenlet/tests/leakcheck.py", line 212 in _run_test
  File "/home/runner/work/greenlet/greenlet/src/greenlet/tests/leakcheck.py", line 295 in __call__
  File "/home/runner/work/greenlet/greenlet/src/greenlet/tests/leakcheck.py", line 317 in wrapper
  File "/opt/hostedtoolcache/Python/3.13.0-alpha.3/x64/lib/python3.13/unittest/case.py", line 589 in _callTestMethod
  File "/opt/hostedtoolcache/Python/3.13.0-alpha.3/x64/lib/python3.13/unittest/case.py", line 636 in run
  File "/opt/hostedtoolcache/Python/3.13.0-alpha.3/x64/lib/python3.13/unittest/case.py", line 692 in __call__
  File "/opt/hostedtoolcache/Python/3.13.0-alpha.3/x64/lib/python3.13/unittest/suite.py", line 122 in run
  File "/opt/hostedtoolcache/Python/3.13.0-alpha.3/x64/lib/python3.13/unittest/suite.py", line 84 in __call__
  File "/opt/hostedtoolcache/Python/3.13.0-alpha.3/x64/lib/python3.13/unittest/suite.py", line 122 in run
  File "/opt/hostedtoolcache/Python/3.13.0-alpha.3/x64/lib/python3.13/unittest/suite.py", line 84 in __call__
  File "/opt/hostedtoolcache/Python/3.13.0-alpha.3/x64/lib/python3.13/unittest/suite.py", line 122 in run
  File "/opt/hostedtoolcache/Python/3.13.0-alpha.3/x64/lib/python3.13/unittest/suite.py", line 84 in __call__
  File "/opt/hostedtoolcache/Python/3.13.0-alpha.3/x64/lib/python3.13/unittest/runner.py", line 240 in run
  File "/opt/hostedtoolcache/Python/3.13.0-alpha.3/x64/lib/python3.13/unittest/main.py", line 270 in runTests
  File "/opt/hostedtoolcache/Python/3.13.0-alpha.3/x64/lib/python3.13/unittest/main.py", line 104 in __init__
  File "/opt/hostedtoolcache/Python/3.13.0-alpha.3/x64/lib/python3.13/unittest/__main__.py", line 18 in <module>
  File "<frozen runpy>", line 88 in _run_code
  File "<frozen runpy>", line 198 in _run_module_as_main
Extension modules: greenlet._greenlet, psutil._psutil_linux, psutil._psutil_posix, greenlet.tests._test_extension_cpp, greenlet.tests._test_extension (total: 5)
/home/runner/work/_temp/7f62c942-1552-419a-bf75-ca42301e321a.sh: line 3:  1982 Segmentation fault      (core dumped) python -m unittest discover -v greenlet.tests
Error: Process completed with exit code 139.

I'm not sure about this change:

+  #if GREENLET_USE_CFRAME
     tstate->cframe->current_frame = this->current_frame;
+  #endif

We should save/restore the current_frame, but I'm not sure how.

@jamadden
Copy link
Contributor

Thanks for looking into this @vstinner ! I haven't had any time to keep up with Python 3.13 changes, but I can at least offer some insight into that crash. This is because that same test crashes immediately for me on my M2 Mac. Assuming the crash is the same, here's the backtrace:

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       KERN_INVALID_ADDRESS at 0xdddddddddddddde5 -> 0xffffdddddddddde5 (possible pointer authentication failure)
Exception Codes:       0x0000000000000001, 0xdddddddddddddde5

Termination Reason:    Namespace SIGNAL, Code 11 Segmentation fault: 11
Terminating Process:   Python [25867]

VM Region Info: 0xffffdddddddddde5 is not in any region.  
      REGION TYPE                    START - END         [ VSIZE] PRT/MAX SHRMOD  REGION DETAIL
      UNUSED SPACE AT START
--->  
      UNUSED SPACE AT END

Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib        	       0x1877c60dc __pthread_kill + 8
1   libsystem_pthread.dylib       	       0x1877fdcc0 pthread_kill + 288
2   libsystem_c.dylib             	       0x1876d2540 raise + 32
3   Python                        	       0x103a15410 faulthandler_fatal_error + 412
4   libsystem_platform.dylib      	       0x18782da24 _sigtramp + 56
5   _greenlet.cpython-313-darwin.so	       0x1037cf910 greenlet::PythonState::operator>>(_ts*) + 56 (TPythonState.cpp:205) [inlined]
6   _greenlet.cpython-313-darwin.so	       0x1037cf910 greenlet::Greenlet::g_switchstack_success() + 88 (TGreenlet.cpp:133)
7   _greenlet.cpython-313-darwin.so	       0x1037cb0c4 greenlet::Greenlet::g_switchstack() + 288 (TGreenlet.cpp:210)
8   _greenlet.cpython-313-darwin.so	       0x1037cfaa8 greenlet::MainGreenlet::g_switch() + 48 (TMainGreenlet.cpp:100)
9   _greenlet.cpython-313-darwin.so	       0x1037d0cc4 green_switch(_greenlet*, _object*, _object*) + 268 (greenlet.cpp:530)
10  Python                        	       0x1038e27b8 cfunction_call + 72
11  Python                        	       0x103890124 _PyObject_MakeTpCall + 128
12  Python                        	       0x1039b5f80 context_run + 104
13  Python                        	       0x1038e1ee8 cfunction_vectorcall_FASTCALL_KEYWORDS.llvm.10847317658053313408 + 92
14  _greenlet.cpython-313-darwin.so	       0x1037cce54 greenlet::UserGreenlet::inner_bootstrap(_greenlet*, _object*) + 264 (TUserGreenlet.cpp:464)
15  _greenlet.cpython-313-darwin.so	       0x1037cc6c8 greenlet::UserGreenlet::g_initialstub(void*) + 1364 (TUserGreenlet.cpp:311)
16  _greenlet.cpython-313-darwin.so	       0x1037cf298 greenlet::UserGreenlet::g_switch() + 136 (TUserGreenlet.cpp:179)

The invalid address causing the segfault is interesting...

The crashing code in question:

#if GREENLET_PY311
  #if GREENLET_PY312
    tstate->py_recursion_remaining = tstate->py_recursion_limit - this->py_recursion_depth;
    tstate->c_recursion_remaining = Py_C_RECURSION_LIMIT - this->c_recursion_depth;
    this->unexpose_frames(); // <- CRASH; line 205
  #else // \/ 3.11

unexpose_frames was new for 3.12:

#if GREENLET_PY312
void GREENLET_NOINLINE(PythonState::unexpose_frames)()
{
    if (!this->top_frame()) {
        return;
    }

    // See GreenletState::expose_frames() and the comment on frames_were_exposed
    // for more information about this logic.
    _PyInterpreterFrame *iframe = this->_top_frame->f_frame;
    while (iframe != nullptr) {
        _PyInterpreterFrame *prev_exposed = iframe->previous;
        assert(iframe->frame_obj);
        memcpy(&iframe->previous, &iframe->frame_obj->_f_frame_data[0],
               sizeof(void *));
        iframe = prev_exposed;
    }
}
#else
void PythonState::unexpose_frames()
{}
#endif      

IIRC, this all has to do with the way frames are "lazily" allocated now. We have to turn them into real frames so that we can get the complete stacks of suspended greenlets; that was added in greenlet 3.0.3 (previously, I'd just punted on that problem).

I didn't write much of expose_frames; it was generously contributed in b5f9d23

If I disable that functionality by ifdefing out expose/unexpose_frames I get a little farther, but still crash; this time in ContextVarsTests.test_context_not_propagated:

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       KERN_INVALID_ADDRESS at 0x000000010489c020
Exception Codes:       0x0000000000000001, 0x000000010489c020

Termination Reason:    Namespace SIGNAL, Code 11 Segmentation fault: 11
Terminating Process:   exc handler [26401]

VM Region Info: 0x10489c020 is not in any region.  Bytes after previous region: 33  Bytes before following region: 32736
      REGION TYPE                    START - END         [ VSIZE] PRT/MAX SHRMOD  REGION DETAIL
      shared memory               104898000-10489c000    [   16K] r--/r-- SM=SHM  
--->  GAP OF 0x8000 BYTES
      VM_ALLOCATE                 1048a4000-1048ac000    [   32K] rw-/rwx SM=PRV  

Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   Python                        	       0x105209420 dump_frame + 24
1   Python                        	       0x105208a04 dump_traceback + 116
2   Python                        	       0x105208b88 _Py_DumpTracebackThreads + 324
3   Python                        	       0x105219498 faulthandler_dump_traceback + 84
4   Python                        	       0x1052193f4 faulthandler_fatal_error + 384
5   libsystem_platform.dylib      	       0x18782da24 _sigtramp + 56
6   _greenlet.cpython-313-darwin.so	       0x104fcf3dc greenlet::PythonState::operator<<(_ts const*) + 104 (TPythonState.cpp:144)
7   _greenlet.cpython-313-darwin.so	       0x104fcf3dc greenlet::PythonState::operator<<(_ts const*) + 104 (TPythonState.cpp:144)
8   _greenlet.cpython-313-darwin.so	       0x104fcf25c greenlet::Greenlet::g_switchstack() + 144 (TGreenlet.cpp:167)
9   _greenlet.cpython-313-darwin.so	       0x104fd3ac8 greenlet::MainGreenlet::g_switch() + 48 (TMainGreenlet.cpp:100)
10  _greenlet.cpython-313-darwin.so	       0x104fd1234 greenlet::UserGreenlet::inner_bootstrap(_greenlet*, _object*) + 1396 (TUserGreenlet.cpp:553)
11  _greenlet.cpython-313-darwin.so	       0x104fd063c greenlet::UserGreenlet::g_initialstub(void*) + 1364 (TUserGreenlet.cpp:311)
12  _greenlet.cpython-313-darwin.so	       0x104fd32c8 greenlet::UserGreenlet::g_switch() + 136 (TUserGreenlet.cpp:179)
13  _greenlet.cpython-313-darwin.so	       0x104fd4ce4 green_switch(_greenlet*, _object*, _object*) + 268 (greenlet.cpp:530)

@vstinner
Copy link
Contributor Author

PythonState::unexpose_frames() is called with this->_top_frame->f_frame = 0xdddddddddddddddd: the struct _frame memory was freed, so reading a member gives 0xdddddddddddddddd pointer. this->_top_frame doesn't "own" the memory. It's just a raw pointer to something managed internally by Python.

I don't know how to move forward on this issue :-(

@vstinner
Copy link
Contributor Author

The following code of void PythonState::operator<<(const PyThreadState *const tstate) noexcept is scary:

    PyFrameObject *frame = PyThreadState_GetFrame((PyThreadState *)tstate);
    Py_XDECREF(frame);  // PyThreadState_GetFrame gives us a new
                        // reference.
    this->_top_frame.steal(frame);

Why does it call Py_XDECREF(): we have a nice strong reference which is safe to use. Calling Py_XDECREF() converts it to a borrowed reference and so the frame can be deallocated and we end up with a dangling pointer and likely a crash.

Exception Codes:       KERN_INVALID_ADDRESS at 0xdddddddddddddde5 -> 0xffffdddddddddde5 (possible pointer authentication failure)

Exactly: 0xdddddddddd... pointer is a dangling pointer.

@vstinner
Copy link
Contributor Author

I hacked greenlet to hold a strong reference to the frame, but then another assertion failed...

test_break_ctxvars (greenlet.tests.test_contextvars.ContextVarsTests.test_break_ctxvars) ...

python: Python/ceval.c:894: _PyEval_EvalFrameDefault: Assertion `STACK_LEVEL() == 0' failed.

Program received signal SIGABRT, Aborted.

@jamadden
Copy link
Contributor

Why does it call Py_XDECREF(): ... Calling Py_XDECREF() converts it to a borrowed reference and so the frame can be deallocated

Exactly. IIRC, in most places we keep borrowed references, it's specifically so they can be deallocated. greenlets aren't required to ever finish or get switched back to before they get deallocated (this is surprisingly easy to arrange to happen). That means that any code on the C call stack that was going to decref something like a frame may never get a chance to run. So, while a greenlet is switched out, we essentially transfer ownership out of the C stack and into us. If we kept an extra strong reference, and the C stack code never runs again, we leak.

This exposes implicit assumptions about the objects like that: we assume that "someone else" is keeping a strong reference to it, someone who may not run again to decref; or, put another way, the code that originally "owned" these objects won't run again to decref them until we switch the greenlet back in. It seems like something in this area has changed; for example, we're getting back fresh (transient) objects that have no other owner, or the place in the C stack that was keeping a strong reference is now keeping a borrowed reference.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants