Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python 3.10 hang at exit in drop_gil() (due to resource warning at exit?) #91414

Open
rpurdie mannequin opened this issue Apr 8, 2022 · 6 comments
Open

Python 3.10 hang at exit in drop_gil() (due to resource warning at exit?) #91414

rpurdie mannequin opened this issue Apr 8, 2022 · 6 comments
Labels
3.10 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

@rpurdie
Copy link
Mannequin

rpurdie mannequin commented Apr 8, 2022

BPO 47258
Nosy @rpurdie

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2022-04-08.10:02:05.768>
labels = ['interpreter-core', '3.10']
title = 'Python 3.10 hang at exit in drop_gil() (due to resource warning at exit?)'
updated_at = <Date 2022-04-08.10:02:05.768>
user = 'https://github.com/rpurdie'

bugs.python.org fields:

activity = <Date 2022-04-08.10:02:05.768>
actor = 'rpurdie'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Interpreter Core']
creation = <Date 2022-04-08.10:02:05.768>
creator = 'rpurdie'
dependencies = []
files = []
hgrepos = []
issue_num = 47258
keywords = []
message_count = 1.0
messages = ['416968']
nosy_count = 1.0
nosy_names = ['rpurdie']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = None
url = 'https://bugs.python.org/issue47258'
versions = ['Python 3.10']

@rpurdie
Copy link
Mannequin Author

rpurdie mannequin commented Apr 8, 2022

We had a python hang at shutdown. The gdb python backtrace and C backtraces are below. It is hung in the COND_WAIT(gil->switch_cond, gil->switch_mutex) call in drop_gil().

Py_FinalizeEx -> handle_system_exit() -> PyGC_Collect -> handle_weakrefs -> drop_gil

I think from the stack trace it may have been printing the warning:

sys:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='/home/pokybuild/yocto-worker/oe-selftest-fedora/build/build-st-1560250/bitbake-cookerdaemon.log' mode='a+' encoding='UTF-8'>

however I'm not sure if it was that or trying to show a different exception. Even if we have a resource leak, it shouldn't really hang!

(gdb) py-bt
Traceback (most recent call first):
  File "/usr/lib64/python3.10/weakref.py", line 106, in remove
    def remove(wr, selfref=ref(self), _atomic_removal=_remove_dead_weakref):
  Garbage-collecting
#0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7f0f7bd54b20 <_PyRuntime+512>) at futex-internal.c:57
#1  __futex_abstimed_wait_common (futex_word=futex_word@entry=0x7f0f7bd54b20 <_PyRuntime+512>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, 
    private=private@entry=0, cancel=cancel@entry=true) at futex-internal.c:87
#2  0x00007f0f7b88979f in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7f0f7bd54b20 <_PyRuntime+512>, expected=expected@entry=0, clockid=clockid@entry=0, 
    abstime=abstime@entry=0x0, private=private@entry=0) at futex-internal.c:139
#3  0x00007f0f7b88beb0 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7f0f7bd54b28 <_PyRuntime+520>, cond=0x7f0f7bd54af8 <_PyRuntime+472>) at pthread_cond_wait.c:504
#4  ___pthread_cond_wait (cond=cond@entry=0x7f0f7bd54af8 <_PyRuntime+472>, mutex=mutex@entry=0x7f0f7bd54b28 <_PyRuntime+520>) at pthread_cond_wait.c:619
#5  0x00007f0f7bb388d8 in drop_gil (ceval=0x7f0f7bd54a78 <_PyRuntime+344>, ceval2=<optimized out>, tstate=0x558744ef7c10)
    at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Python/ceval_gil.h:182
#6  0x00007f0f7bb223e8 in eval_frame_handle_pending (tstate=<optimized out>) at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Python/ceval.c:1185
#7  _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Python/ceval.c:1775
#8  0x00007f0f7bb19600 in _PyEval_EvalFrame (throwflag=0, 
    f=Frame 0x7f0f7a0c8a60, for file /usr/lib64/python3.10/weakref.py, line 106, in remove (wr=<KeyedRef at remote 0x7f0f7a2f5fd0>, selfref=<weakref at remote 0x7f0f7a2bb880>, _atomic_removal=<built-in method _remove_dead_weakref of module object at remote 0x7f0f7b67aac0>), tstate=0x558744ef7c10)
    at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Include/internal/pycore_ceval.h:46
#9  _PyEval_Vector (tstate=<optimized out>, con=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=1, kwnames=<optimized out>)
    at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Python/ceval.c:5065
#10 0x00007f0f7bb989a8 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=9223372036854775809, args=0x7fff8b815bc8, callable=<function at remote 0x7f0f7a2ab520>, 
    tstate=0x558744ef7c10) at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Include/cpython/abstract.h:114
#11 PyObject_CallOneArg (func=<function at remote 0x7f0f7a2ab520>, arg=<optimized out>) at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Include/cpython/abstract.h:184
#12 0x00007f0f7bb0fce1 in handle_weakrefs (old=0x558744edbd30, unreachable=0x7fff8b815c70) at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Modules/gcmodule.c:887
#13 gc_collect_main (tstate=0x558744ef7c10, generation=2, n_collected=0x7fff8b815d50, n_uncollectable=0x7fff8b815d48, nofail=0)
    at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Modules/gcmodule.c:1281
#14 0x00007f0f7bb9194e in gc_collect_with_callback (tstate=tstate@entry=0x558744ef7c10, generation=generation@entry=2)
    at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Modules/gcmodule.c:1413
#15 0x00007f0f7bbc827e in PyGC_Collect () at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Modules/gcmodule.c:2099
#16 0x00007f0f7bbc7bc2 in Py_FinalizeEx () at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Python/pylifecycle.c:1781
#17 0x00007f0f7bbc7d7c in Py_Exit (sts=0) at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Python/pylifecycle.c:2858
#18 0x00007f0f7bbc4fbb in handle_system_exit () at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Python/pythonrun.c:775
#19 0x00007f0f7bbc4f3d in _PyErr_PrintEx (set_sys_last_vars=1, tstate=0x558744ef7c10) at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Python/pythonrun.c:785
#20 PyErr_PrintEx (set_sys_last_vars=1) at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Python/pythonrun.c:880
#21 0x00007f0f7bbbcece in PyErr_Print () at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Python/pythonrun.c:886
#22 _PyRun_SimpleFileObject (fp=<optimized out>, filename=<optimized out>, closeit=1, flags=0x7fff8b815f18) at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Python/pythonrun.c:462
#23 0x00007f0f7bbbcc57 in _PyRun_AnyFileObject (fp=0x558744ed9370, filename='/home/pokybuild/yocto-worker/oe-selftest-fedora/build/bitbake/bin/bitbake', closeit=1, 
    flags=0x7fff8b815f18) at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Python/pythonrun.c:90
#24 0x00007f0f7bbb9e44 in pymain_run_file_obj (skip_source_first_line=0, filename='/home/pokybuild/yocto-worker/oe-selftest-fedora/build/bitbake/bin/bitbake', program_name='python3')
    at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Modules/main.c:353
#25 pymain_run_file (config=0x558744edbe30) at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Modules/main.c:372
#26 pymain_run_python (exitcode=0x7fff8b815f10) at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Modules/main.c:587
#27 Py_RunMain () at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Modules/main.c:666
#28 0x00007f0f7bb8670d in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Modules/main.c:720
#29 0x00007f0f7b82c560 in __libc_start_call_main (main=main@entry=0x558743bfd160 <main>, argc=argc@entry=5, argv=argv@entry=0x7fff8b816168)
    at ../sysdeps/nptl/libc_start_call_main.h:58
#30 0x00007f0f7b82c60c in __libc_start_main_impl (main=0x558743bfd160 <main>, argc=5, argv=0x7fff8b816168, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7fff8b816158) at ../csu/libc-start.c:409
#31 0x0000558743bfd095 in _start ()

@rpurdie rpurdie mannequin added 3.10 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Apr 8, 2022
@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@itachaaa
Copy link

itachaaa commented Aug 22, 2022

Similar problems are found in Python 3.9.11. The stack is as follows;We have six threads, five of which are daemon threads, two of which periodically wait for the thread lock, and two of which periodically release the lock. When the lock is hanged, both the thread and the main thread that wait for the lock in the two periods are suspended.

(gdb)
  Id   Target Id           Frame
* 1    LWP 237849 "python" 0x0000ffff87f352a8 in pthread_cond_wait () from /lib64/libpthread.so.0
  2    LWP 238085 "python" 0x0000ffff87f37e00 in ?? () from /lib64/libpthread.so.0
  3    LWP 238092 "python" 0x0000ffff87f37e00 in ?? () from /lib64/libpthread.so.0
(gdb) py-bt
Traceback (most recent call first):
  File "/usr1/python/python/target/checkout/package/python/python-22.50.19/lib/python3.9/socket.py", line 243, in __repr__
  Garbage-collecting
(gdb) bt
#0  0x0000ffff87f352a8 in pthread_cond_wait () from /lib64/libpthread.so.0
#1  0x0000aaaab1538184 in drop_gil (ceval=0xaaaab1b79820 <_PyRuntime+344>, ceval@entry=0xaaaab14dce9c <PyObject_Str+44>, ceval2=ceval2@entry=0xaaaab98e79f0,
    tstate=tstate@entry=0xaaaab98ea760) at Python/ceval_gil.h:178
#2  0x0000aaaab149b5c4 in eval_frame_handle_pending (tstate=0xaaaab98ea760) at Python/ceval.c:882
#3  _PyEval_EvalFrameDefault (tstate=<optimized out>,
    f=Frame 0xffff7833e610, for file /usr1/python/python/target/checkout/package/python/python-22.50.19/lib/python3.9/socket.py, line 243, in __repr__ (self=<SSLSoc   ket(_context=<SSLContext at remote 0xffff783d37c0>, _session=None, _sslobj=<_ssl._SSLSocket at remote 0xffff7838ca50>, server_side=False, server_hostname=None, do_h   andshake_on_connect=True, suppress_ragged_eofs=True, _connected=True) at remote 0xffff7838cac0>), throwflag=<optimized out>) at Python/ceval.c:1402
#4  0x0000aaaab1499e1c in _PyEval_EvalFrame (throwflag=0,
    f=Frame 0xffff7833e610, for file /usr1/python/python/target/checkout/package/python/python-22.50.19/lib/python3.9/socket.py, line 243, in __repr__ (self=<SSLSoc   ket(_context=<SSLContext at remote 0xffff783d37c0>, _session=None, _sslobj=<_ssl._SSLSocket at remote 0xffff7838ca50>, server_side=False, server_hostname=None, do_h   andshake_on_connect=True, suppress_ragged_eofs=True, _connected=True) at remote 0xffff7838cac0>), tstate=0xaaaab98ea760) at ./Include/internal/pycore_ceval.h:40
#5  function_code_fastcall (tstate=0xaaaab98ea760, co=<optimized out>, args=<optimized out>, nargs=1, globals=<optimized out>) at Objects/call.c:330
#6  0x0000aaaab14f6360 in _PyObject_VectorcallTstate (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>,
    tstate=<optimized out>) at ./Include/cpython/abstract.h:118
#7  _PyObject_CallNoArg (func=<optimized out>) at ./Include/cpython/abstract.h:172
#8  call_unbound_noarg (self=<optimized out>, func=<function at remote 0xffff7ac07dc0>, unbound=<optimized out>) at Objects/typeobject.c:1515
#9  slot_tp_repr (self=<optimized out>) at Objects/typeobject.c:6630
#10 0x0000aaaab14dcd44 in PyObject_Repr (
    v=<SSLSocket(_context=<SSLContext at remote 0xffff783d37c0>, _session=None, _sslobj=<_ssl._SSLSocket at remote 0xffff7838ca50>, server_side=False, server_hostna   me=None, do_handshake_on_connect=True, suppress_ragged_eofs=True, _connected=True) at remote 0xffff7838cac0>) at Objects/object.c:420
#11 0x0000aaaab152006c in unicode_fromformat_arg (vargs=0xffffd4603ff8, f=<optimized out>, writer=0xffffd4603fb8) at Objects/unicodeobject.c:2989
#12 PyUnicode_FromFormatV (format=<optimized out>, vargs=...) at Objects/unicodeobject.c:3053
#13 0x0000aaaab1536368 in _PyErr_WarnFormatV (
    source=<SSLSocket(_context=<SSLContext at remote 0xffff783d37c0>, _session=None, _sslobj=<_ssl._SSLSocket at remote 0xffff7838ca50>, server_side=False, server_h   ostname=None, do_handshake_on_connect=True, suppress_ragged_eofs=True, _connected=True) at remote 0xffff7838cac0>, category=<type at remote 0xaaaab1b20db0>,
    stack_level=1, format=<optimized out>, vargs=<error reading variable: Cannot access memory at address 0xffffffbb>) at Python/_warnings.c:1124
#14 0x0000aaaab1536998 in PyErr_ResourceWarning (source=<optimized out>, stack_level=<optimized out>, format=<optimized out>) at Python/_warnings.c:1179
#15 0x0000ffff7abb9e54 in ?? ()
   from /opt/cloud/envs/Product-OSSToolService/222.170.102/rtsp/python/lib/python3.9/lib-dynload/_socket.cpython-39-aarch64-linux-gnu.so
#16 0x0000aaaab158a4b8 in finalize_garbage (tstate=0xaaaab98ea760, collectable=0xffffd4604280) at Modules/gcmodule.c:969
#17 collect (tstate=0xaaaab98ea760, generation=generation@entry=2, n_collected=n_collected@entry=0x0, n_uncollectable=n_uncollectable@entry=0x0,
    nofail=nofail@entry=1) at Modules/gcmodule.c:1260
#18 0x0000aaaab158b474 in _PyGC_CollectNoFail () at Modules/gcmodule.c:2097
#19 0x0000aaaab1554840 in _PyImport_Cleanup (tstate=tstate@entry=0xaaaab98ea760) at Python/import.c:572
#20 0x0000aaaab156963c in Py_FinalizeEx () at Python/pylifecycle.c:1426
#21 0x0000aaaab14a4110 in Py_RunMain () at Modules/main.c:679
#22 0x0000aaaab14a49f8 in pymain_main (args=args@entry=0xffffd4604568) at Modules/main.c:707
#23 0x0000aaaab14a4ad4 in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:731
#24 0x0000ffff87c9ebec in __libc_start_main () from /lib64/libc.so.6
#25 0x0000aaaab14a37bc in _start ()
--Type <RET> for more, q to quit, c to continue without paging--
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

@kumaraditya303
Copy link
Contributor

@itachaaa Can you provide a minimal reproducer? Also can you try main branch?

@itachaaa
Copy link

The difficulty is that it does not always occur. The code is abstracted as follows, but it cannot be reproduced.Does the main branch refer to the master branch?

import gc
import socket
import sys
import threading
import time


class MyQueue(object):

    def __init__(self):
        self.event = threading.Event()
        self.i = 0

    def set_(self,):
        self.event.set()
        time.sleep(1)
        gc.collect()
        if self.i > 2:
            sys.exit()

    def wait(self, name):
        self.event.wait()
        print('waiting')
        self.event.clear()
        self.i += 1
        with open('log.txt', 'a') as fp:
            fp.write(str(self.i) + name)
            fp.write('\n')

    def start(self, daemon_, name):
        threading.Thread(
            target=loop_wrap(self.wait),
            args=(name, ),
            name="wait",
            daemon=daemon_
        ).start()
        
        threading.Thread(
            target=loop_wrap(self.set_),
            name="set",
            daemon=True
        ).start()


def loop_wrap(func):
    def wrapper(*args):
        while True:
            try:
                func(*args)
            except Exception:
                time.sleep(1)
    return wrapper


if __name__ == '__main__':
    queue1 = MyQueue()
    queue2 = MyQueue()
    queue1.start(True, 'queue1')
    queue2.start(True, 'queue2')
    # time.sleep(1)
    print('exit main')

@kumaraditya303
Copy link
Contributor

Yes main means master branch.

@Gigioliva
Copy link

Any news on this? I have the same problem but can't create an MWE. I have a periodic task that runs ~300 containers per day with the same code and randomly ~4 of them hang in this state

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.10 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs)
Projects
None yet
Development

No branches or pull requests

3 participants