segfault on python3.11 under coverage #977

Open
graingert opened this issue Dec 8, 2022 · 9 comments

Comments

@graingert
Contributor

repro:

import sys
import os
import tempfile

import tables

def main():
    items2 = [
         "/data00",
         "/data01",
         "/data03",
         "/data05",
         "/data04",
         "/data10",
         "/data06",
         "/data07",
         "/data08",
    ]

    class MyTable(tables.IsDescription):
        index = tables.Float64Col()
        values_block_0 = tables.StringCol(1)
        values_block_1 = tables.Int64Col()

    with tempfile.TemporaryDirectory() as tmp_path:
        fn = os.path.join(tmp_path, "demo.h5")
        with tables.open_file(fn, mode="w") as handle:
            for key in items2:
                group = handle.create_group("/", key[1:], "")
                table = handle.create_table(group, "table", MyTable)
                table.cols.index.create_index(6)

if __name__ == "__main__":
    sys.exit(main())

$ PYTHONFAULTHANDLER=True coverage run demo.py
Fatal Python error: Segmentation fault

Current thread 0x00007fbc5b200740 (most recent call first):
  File "/home/graingert/anaconda3/envs/dask-311/lib/python3.11/site-packages/tables/file.py", line 387 in cache_node
  File "/home/graingert/anaconda3/envs/dask-311/lib/python3.11/site-packages/tables/node.py", line 372 in _g_set_location
  File "/home/graingert/anaconda3/envs/dask-311/lib/python3.11/site-packages/tables/node.py", line 241 in __init__
  File "/home/graingert/anaconda3/envs/dask-311/lib/python3.11/site-packages/tables/leaf.py", line 259 in __init__
  File "/home/graingert/anaconda3/envs/dask-311/lib/python3.11/site-packages/tables/carray.py", line 200 in __init__
  File "/home/graingert/anaconda3/envs/dask-311/lib/python3.11/site-packages/tables/earray.py", line 143 in __init__
  File "/home/graingert/anaconda3/envs/dask-311/lib/python3.11/site-packages/tables/index.py", line 493 in _g_post_init_hook
  File "/home/graingert/anaconda3/envs/dask-311/lib/python3.11/site-packages/tables/node.py", line 258 in __init__
  File "/home/graingert/anaconda3/envs/dask-311/lib/python3.11/site-packages/tables/group.py", line 221 in __init__
  File "/home/graingert/anaconda3/envs/dask-311/lib/python3.11/site-packages/tables/index.py", line 381 in __init__
  File "/home/graingert/anaconda3/envs/dask-311/lib/python3.11/site-packages/tables/table.py", line 284 in _column__create_index
  File "/home/graingert/anaconda3/envs/dask-311/lib/python3.11/site-packages/tables/table.py", line 3564 in create_index
  File "/home/graingert/projects/dask/demo.py", line 31 in main
  File "/home/graingert/projects/dask/demo.py", line 39 in <module>
  File "/home/graingert/anaconda3/envs/dask-311/lib/python3.11/site-packages/coverage/execfile.py", line 199 in run
  File "/home/graingert/anaconda3/envs/dask-311/lib/python3.11/site-packages/coverage/cmdline.py", line 830 in do_run
  File "/home/graingert/anaconda3/envs/dask-311/lib/python3.11/site-packages/coverage/cmdline.py", line 659 in command_line
  File "/home/graingert/anaconda3/envs/dask-311/lib/python3.11/site-packages/coverage/cmdline.py", line 943 in main
  File "/home/graingert/anaconda3/envs/dask-311/bin/coverage", line 11 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, tables._comp_lzo, tables._comp_bzip2, tables.utilsextension, numexpr.interpreter, tables.hdf5extension, tables.linkextension, tables.lrucacheextension, tables.tableextension, tables.indexesextension (total: 22)
[1]    879063 segmentation fault (core dumped)  PYTHONFAULTHANDLER=True coverage run demo.py
@FrancescAlted
Member

That's strange. We have been testing PyTables quite a lot with Python 3.11 lately, and the test suite is passing flawlessly (e.g. https://github.com/PyTables/PyTables/actions/runs/3639270655). Does your example pass with 3.10? Can you tell us how you compiled PyTables locally for 3.11?

@graingert
Contributor Author

graingert commented Dec 8, 2022

this is with pytables compiled from conda-forge, and yes it passes with 3.10

@graingert
Contributor Author

I can also reproduce this by compiling the code in master in a regular Python virtual environment (without conda). The problem only happens when running under coverage.

@graingert graingert changed the title seegfault on python3.11 under coverage segfault on python3.11 under coverage Dec 8, 2022
@FrancescAlted
Member

I have done some research, and I can reproduce the segfault with Python 3.11 under coverage. For what it's worth, everything seems to work well under Python 3.10, and valgrind does not issue significant warnings:

$ PYTHONPATH=. PYTHONFAULTHANDLER=True valgrind /home/faltet/miniconda3/envs/pytables/bin/coverage run bug-report.py
==665545== Memcheck, a memory error detector
==665545== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==665545== Using Valgrind-3.20.0 and LibVEX; rerun with -h for copyright info
==665545== Command: /home/faltet/miniconda3/envs/pytables/bin/coverage run bug-report.py
==665545==
==665678== Warning: invalid file descriptor 1024 in syscall close()
==665678== Warning: invalid file descriptor 1025 in syscall close()
==665678== Warning: invalid file descriptor 1026 in syscall close()
==665678== Warning: invalid file descriptor 1027 in syscall close()
==665678==    Use --log-fd=<number> to select an alternative log fd.
==665678== Warning: invalid file descriptor 1028 in syscall close()
==665678== Warning: invalid file descriptor 1029 in syscall close()
==665679== Warning: invalid file descriptor 1024 in syscall close()
==665679== Warning: invalid file descriptor 1025 in syscall close()
==665679== Warning: invalid file descriptor 1026 in syscall close()
==665679== Warning: invalid file descriptor 1027 in syscall close()
==665679==    Use --log-fd=<number> to select an alternative log fd.
==665679== Warning: invalid file descriptor 1028 in syscall close()
==665679== Warning: invalid file descriptor 1029 in syscall close()
==665545==
==665545== HEAP SUMMARY:
==665545==     in use at exit: 3,876,518 bytes in 2,973 blocks
==665545==   total heap usage: 100,866 allocs, 97,893 frees, 49,666,167 bytes allocated
==665545==
==665545== LEAK SUMMARY:
==665545==    definitely lost: 192 bytes in 3 blocks
==665545==    indirectly lost: 0 bytes in 0 blocks
==665545==      possibly lost: 487,116 bytes in 419 blocks
==665545==    still reachable: 3,389,210 bytes in 2,551 blocks
==665545==         suppressed: 0 bytes in 0 blocks
==665545== Rerun with --leak-check=full to see details of leaked memory
==665545==
==665545== For lists of detected and suppressed errors, rerun with: -s
==665545== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Whereas for Python 3.11 and coverage:

$ LD_LIBRARY_PATH=/usr/local/lib PYTHONPATH=. PYTHONFAULTHANDLER=True valgrind coverage run bug-report.py
==663946== Memcheck, a memory error detector
==663946== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==663946== Using Valgrind-3.20.0 and LibVEX; rerun with -h for copyright info
==663946== Command: /usr/bin/coverage run bug-report.py
==663946==
==663946== Invalid read of size 1
==663946==    at 0x357BC8: frame_dealloc (frameobject.c:864)
==663946==    by 0x3EB878: _PyTrash_thread_destroy_chain (object.c:2275)
==663946==    by 0x3B0C6E: UnknownInlinedFun (object.h:538)
==663946==    by 0x3B0C6E: UnknownInlinedFun (object.h:602)
==663946==    by 0x3B0C6E: list_ass_slice (listobject.c:729)
==663946==    by 0x56C02B15: __Pyx_DelItemInt_Fast (lrucacheextension.c:13862)
==663946==    by 0x56C02B15: __pyx_f_6tables_17lrucacheextension_9NodeCache_setitem (lrucacheextension.c:2740)
==663946==    by 0x56C0273E: __pyx_pf_6tables_17lrucacheextension_9NodeCache_4__setitem__ (lrucacheextension.c:2639)
==663946==    by 0x56C0273E: __pyx_pw_6tables_17lrucacheextension_9NodeCache_5__setitem__ (lrucacheextension.c:2616)
==663946==    by 0x56C0273E: __pyx_mp_ass_subscript_6tables_17lrucacheextension_NodeCache (lrucacheextension.c:11650)
==663946==    by 0x333B1C: _PyEval_EvalFrameDefault (ceval.c:2308)
==663946==    by 0x35147B: UnknownInlinedFun (pycore_ceval.h:73)
==663946==    by 0x35147B: UnknownInlinedFun (ceval.c:6428)
==663946==    by 0x35147B: _PyFunction_Vectorcall (call.c:393)
==663946==    by 0x3270CC: _PyObject_FastCallDictTstate (call.c:152)
==663946==    by 0x357526: UnknownInlinedFun (call.c:482)
==663946==    by 0x357526: slot_tp_init (typeobject.c:7861)
==663946==    by 0x322438: UnknownInlinedFun (typeobject.c:1112)
==663946==    by 0x322438: _PyObject_MakeTpCall (call.c:214)
==663946==    by 0x20EC6E: UnknownInlinedFun (ceval.c:7308)
==663946==    by 0x20EC6E: _PyEval_EvalFrameDefault.cold (ceval.c:4767)
==663946==    by 0x35147B: UnknownInlinedFun (pycore_ceval.h:73)
==663946==    by 0x35147B: UnknownInlinedFun (ceval.c:6428)
==663946==    by 0x35147B: _PyFunction_Vectorcall (call.c:393)
==663946==  Address 0x5d71dad is not stack'd, malloc'd or (recently) free'd
==663946==
Fatal Python error: Segmentation fault

Current thread 0x0000000004b6c740 (most recent call first):
  File "/home/faltet/software/PyTables/tables/file.py", line 387 in cache_node
  File "/home/faltet/software/PyTables/tables/node.py", line 372 in _g_set_location
  File "/home/faltet/software/PyTables/tables/node.py", line 241 in __init__
  File "/home/faltet/software/PyTables/tables/leaf.py", line 260 in __init__
  File "/home/faltet/software/PyTables/tables/carray.py", line 200 in __init__
  File "/home/faltet/software/PyTables/tables/earray.py", line 143 in __init__
  File "/home/faltet/software/PyTables/tables/index.py", line 493 in _g_post_init_hook
  File "/home/faltet/software/PyTables/tables/node.py", line 258 in __init__
  File "/home/faltet/software/PyTables/tables/group.py", line 221 in __init__
  File "/home/faltet/software/PyTables/tables/index.py", line 381 in __init__
  File "/home/faltet/software/PyTables/tables/table.py", line 284 in _column__create_index
  File "/home/faltet/software/PyTables/tables/table.py", line 3570 in create_index
  File "/home/faltet/software/PyTables/bug-report.py", line 32 in main
  File "/home/faltet/software/PyTables/bug-report.py", line 35 in <module>
  File "/usr/lib/python3.11/site-packages/coverage/execfile.py", line 199 in run
  File "/usr/lib/python3.11/site-packages/coverage/cmdline.py", line 830 in do_run
  File "/usr/lib/python3.11/site-packages/coverage/cmdline.py", line 659 in command_line
  File "/usr/lib/python3.11/site-packages/coverage/cmdline.py", line 943 in main
  File "/usr/bin/coverage", line 33 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, tables._comp_lzo, tables._comp_bzip2, tables.utilsextension, numexpr.interpreter, tables.hdf5extension, tables.linkextension, tables.lrucacheextension, tables.tableextension, tables.indexesextension (total: 22)
==663946==
==663946== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==663946==    at 0x4A007B9: __pthread_kill_implementation (pthread_kill.c:44)
==663946==    by 0x4A007B9: __pthread_kill_internal (pthread_kill.c:78)
==663946==    by 0x4A007B9: pthread_kill@@GLIBC_2.34 (pthread_kill.c:89)
==663946==    by 0x49A8BB1: raise (raise.c:26)
==663946==    by 0x49A8C4F: ??? (in /usr/lib64/glibc-hwcaps/x86-64-v3/libc.so.6)
==663946==    by 0x357BC7: frame_dealloc (frameobject.c:864)
==663946==
==663946== HEAP SUMMARY:
==663946==     in use at exit: 13,326,169 bytes in 23,298 blocks
==663946==   total heap usage: 45,228 allocs, 21,930 frees, 64,295,562 bytes allocated
==663946==
==663946== LEAK SUMMARY:
==663946==    definitely lost: 93,224 bytes in 71 blocks
==663946==    indirectly lost: 370,944 bytes in 255 blocks
==663946==      possibly lost: 235,812 bytes in 1,209 blocks
==663946==    still reachable: 12,626,189 bytes in 21,763 blocks
==663946==                       of which reachable via heuristic:
==663946==                         length64           : 188,107 bytes in 781 blocks
==663946==                         newarray           : 2,240 bytes in 118 blocks
==663946==         suppressed: 0 bytes in 0 blocks
==663946== Rerun with --leak-check=full to see details of leaked memory

Curiously enough, without coverage, valgrind does not finish normally with Python 3.11:

$ LD_LIBRARY_PATH=/usr/local/lib PYTHONPATH=. PYTHONFAULTHANDLER=True valgrind /usr/bin/python bug-report.py
==666420== Memcheck, a memory error detector
==666420== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==666420== Using Valgrind-3.20.0 and LibVEX; rerun with -h for copyright info
==666420== Command: /usr/bin/python bug-report.py
==666420==

Even more interestingly, when python3.11 is specified explicitly, valgrind seems happier:

$ LD_LIBRARY_PATH=/usr/local/lib PYTHONPATH=. PYTHONFAULTHANDLER=True valgrind /usr/bin/python3.11 bug-report.py
==667315== Memcheck, a memory error detector
==667315== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==667315== Using Valgrind-3.20.0 and LibVEX; rerun with -h for copyright info
==667315== Command: /usr/bin/python3.11 bug-report.py
==667315==
==667315==
==667315== HEAP SUMMARY:
==667315==     in use at exit: 1,311,328 bytes in 797 blocks
==667315==   total heap usage: 50,839 allocs, 50,042 frees, 52,848,052 bytes allocated
==667315==
==667315== LEAK SUMMARY:
==667315==    definitely lost: 128 bytes in 2 blocks
==667315==    indirectly lost: 0 bytes in 0 blocks
==667315==      possibly lost: 183,998 bytes in 104 blocks
==667315==    still reachable: 1,127,202 bytes in 691 blocks
==667315==         suppressed: 0 bytes in 0 blocks
==667315== Rerun with --leak-check=full to see details of leaked memory
==667315==
==667315== For lists of detected and suppressed errors, rerun with: -s
==667315== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

And funnily enough, both commands launch exactly the same interpreter:

(base) faltet@beast~/software/PyTables (direct-chunking-blosc2) $ /usr/bin/python
Python 3.11.0 (main, Oct 28 2022, 15:47:52) [GCC 12.2.1 20221024 releases/gcc-12.2.0-164-g1ccec25cf0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
(base) faltet@beast~/software/PyTables (direct-chunking-blosc2) $
(base) faltet@beast~/software/PyTables (direct-chunking-blosc2) $ /usr/bin/python3.11
Python 3.11.0 (main, Oct 28 2022, 15:47:52) [GCC 12.2.1 20221024 releases/gcc-12.2.0-164-g1ccec25cf0] on linux
Type "help", "copyright", "credits" or "license" for more information.

Now, if I remove the existing /usr/bin/python and replace it with a symbolic link to python3:

(base) faltet@beast~/software/PyTables (direct-chunking-blosc2) $ sudo mv /usr/bin/python /usr/bin/python.bck
Password:
(base) faltet@beast~/software/PyTables (direct-chunking-blosc2) $ sudo ln -s /usr/bin/python3 /usr/bin/python
(base) faltet@beast~/software/PyTables (direct-chunking-blosc2) $ ls -lh /usr/bin/python*
lrwxrwxrwx 1 root root   16 Dec  8 19:47 /usr/bin/python -> /usr/bin/python3
lrwxrwxrwx 1 root root   10 Feb  1  2019 /usr/bin/python3 -> python3.11
-rwxr-xr-x 2 root root 5.3M Feb  1  2019 /usr/bin/python3.11
-rwxr-xr-x 1 root root 3.4K Nov 25 12:01 /usr/bin/python3.11-config
lrwxrwxrwx 1 root root   17 Feb  1  2019 /usr/bin/python3-config -> python3.11-config
-rwxr-xr-x 1 root root  15K Oct  1  2017 /usr/bin/python.bck
(base) faltet@beast~/software/PyTables (direct-chunking-blosc2) $ LD_LIBRARY_PATH=/usr/local/lib PYTHONPATH=. PYTHONFAULTHANDLER=True valgrind /usr/bin/python bug-report.py
==669491== Memcheck, a memory error detector
==669491== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==669491== Using Valgrind-3.20.0 and LibVEX; rerun with -h for copyright info
==669491== Command: /usr/bin/python bug-report.py
==669491==
==669491==
==669491== HEAP SUMMARY:
==669491==     in use at exit: 1,311,280 bytes in 797 blocks
==669491==   total heap usage: 50,845 allocs, 50,048 frees, 52,848,053 bytes allocated
==669491==
==669491== LEAK SUMMARY:
==669491==    definitely lost: 128 bytes in 2 blocks
==669491==    indirectly lost: 0 bytes in 0 blocks
==669491==      possibly lost: 183,998 bytes in 104 blocks
==669491==    still reachable: 1,127,154 bytes in 691 blocks
==669491==         suppressed: 0 bytes in 0 blocks
==669491== Rerun with --leak-check=full to see details of leaked memory
==669491==
==669491== For lists of detected and suppressed errors, rerun with: -s
==669491== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

So, back to normality. Now, using coverage, I am getting the issue again:

(base) faltet@beast~/software/PyTables (direct-chunking-blosc2) $ LD_LIBRARY_PATH=/usr/local/lib PYTHONPATH=. PYTHONFAULTHANDLER=True valgrind coverage run bug-report.py
==670048== Memcheck, a memory error detector
==670048== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==670048== Using Valgrind-3.20.0 and LibVEX; rerun with -h for copyright info
==670048== Command: /usr/bin/coverage run bug-report.py
==670048==
==670048== Invalid read of size 1
==670048==    at 0x357BC8: frame_dealloc (frameobject.c:864)
==670048==    by 0x3EB878: _PyTrash_thread_destroy_chain (object.c:2275)
==670048==    by 0x3B0C6E: UnknownInlinedFun (object.h:538)
==670048==    by 0x3B0C6E: UnknownInlinedFun (object.h:602)
==670048==    by 0x3B0C6E: list_ass_slice (listobject.c:729)
==670048==    by 0x56C02B15: __Pyx_DelItemInt_Fast (lrucacheextension.c:13862)
==670048==    by 0x56C02B15: __pyx_f_6tables_17lrucacheextension_9NodeCache_setitem (lrucacheextension.c:2740)
==670048==    by 0x56C0273E: __pyx_pf_6tables_17lrucacheextension_9NodeCache_4__setitem__ (lrucacheextension.c:2639)
==670048==    by 0x56C0273E: __pyx_pw_6tables_17lrucacheextension_9NodeCache_5__setitem__ (lrucacheextension.c:2616)
==670048==    by 0x56C0273E: __pyx_mp_ass_subscript_6tables_17lrucacheextension_NodeCache (lrucacheextension.c:11650)
==670048==    by 0x333B1C: _PyEval_EvalFrameDefault (ceval.c:2308)
==670048==    by 0x35147B: UnknownInlinedFun (pycore_ceval.h:73)
==670048==    by 0x35147B: UnknownInlinedFun (ceval.c:6428)
==670048==    by 0x35147B: _PyFunction_Vectorcall (call.c:393)
==670048==    by 0x3270CC: _PyObject_FastCallDictTstate (call.c:152)
==670048==    by 0x357526: UnknownInlinedFun (call.c:482)
==670048==    by 0x357526: slot_tp_init (typeobject.c:7861)
==670048==    by 0x322438: UnknownInlinedFun (typeobject.c:1112)
==670048==    by 0x322438: _PyObject_MakeTpCall (call.c:214)
==670048==    by 0x20EC6E: UnknownInlinedFun (ceval.c:7308)
==670048==    by 0x20EC6E: _PyEval_EvalFrameDefault.cold (ceval.c:4767)
==670048==    by 0x35147B: UnknownInlinedFun (pycore_ceval.h:73)
==670048==    by 0x35147B: UnknownInlinedFun (ceval.c:6428)
==670048==    by 0x35147B: _PyFunction_Vectorcall (call.c:393)
==670048==  Address 0x5d71dad is not stack'd, malloc'd or (recently) free'd
==670048==
Fatal Python error: Segmentation fault

Current thread 0x0000000004b6c740 (most recent call first):
  File "/home/faltet/software/PyTables/tables/file.py", line 387 in cache_node
  File "/home/faltet/software/PyTables/tables/node.py", line 372 in _g_set_location
  File "/home/faltet/software/PyTables/tables/node.py", line 241 in __init__
  File "/home/faltet/software/PyTables/tables/leaf.py", line 260 in __init__
  File "/home/faltet/software/PyTables/tables/carray.py", line 200 in __init__
  File "/home/faltet/software/PyTables/tables/earray.py", line 143 in __init__
  File "/home/faltet/software/PyTables/tables/index.py", line 493 in _g_post_init_hook
  File "/home/faltet/software/PyTables/tables/node.py", line 258 in __init__
  File "/home/faltet/software/PyTables/tables/group.py", line 221 in __init__
  File "/home/faltet/software/PyTables/tables/index.py", line 381 in __init__
  File "/home/faltet/software/PyTables/tables/table.py", line 284 in _column__create_index
  File "/home/faltet/software/PyTables/tables/table.py", line 3570 in create_index
  File "/home/faltet/software/PyTables/bug-report.py", line 32 in main
  File "/home/faltet/software/PyTables/bug-report.py", line 35 in <module>
  File "/usr/lib/python3.11/site-packages/coverage/execfile.py", line 199 in run
  File "/usr/lib/python3.11/site-packages/coverage/cmdline.py", line 830 in do_run
  File "/usr/lib/python3.11/site-packages/coverage/cmdline.py", line 659 in command_line
  File "/usr/lib/python3.11/site-packages/coverage/cmdline.py", line 943 in main
  File "/usr/bin/coverage", line 33 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, tables._comp_lzo, tables._comp_bzip2, tables.utilsextension, numexpr.interpreter, tables.hdf5extension, tables.linkextension, tables.lrucacheextension, tables.tableextension, tables.indexesextension (total: 22)
==670048==
==670048== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==670048==    at 0x4A007B9: __pthread_kill_implementation (pthread_kill.c:44)
==670048==    by 0x4A007B9: __pthread_kill_internal (pthread_kill.c:78)
==670048==    by 0x4A007B9: pthread_kill@@GLIBC_2.34 (pthread_kill.c:89)
==670048==    by 0x49A8BB1: raise (raise.c:26)
==670048==    by 0x49A8C4F: ??? (in /usr/lib64/glibc-hwcaps/x86-64-v3/libc.so.6)
==670048==    by 0x357BC7: frame_dealloc (frameobject.c:864)
==670048==
==670048== HEAP SUMMARY:
==670048==     in use at exit: 13,325,569 bytes in 23,297 blocks
==670048==   total heap usage: 45,227 allocs, 21,930 frees, 64,291,430 bytes allocated
==670048==
==670048== LEAK SUMMARY:
==670048==    definitely lost: 90,656 bytes in 70 blocks
==670048==    indirectly lost: 360,208 bytes in 249 blocks
==670048==      possibly lost: 249,116 bytes in 1,216 blocks
==670048==    still reachable: 12,625,589 bytes in 21,762 blocks
==670048==                       of which reachable via heuristic:
==670048==                         length64           : 188,107 bytes in 781 blocks
==670048==                         newarray           : 2,240 bytes in 118 blocks
==670048==         suppressed: 0 bytes in 0 blocks
==670048== Rerun with --leak-check=full to see details of leaked memory
==670048==
==670048== For lists of detected and suppressed errors, rerun with: -s
==670048== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)

All of this seems to indicate that running Python 3.11 through a wrapper makes the issue appear. Given that we cannot reproduce the issue with Python 3.10 and that 3.11 is very young, my bet is that 3.11.0 is the culprit here (unless PyTables has an issue that has gone unnoticed since Python 2.x). At any rate, I'd be curious to check with 3.11.1 as soon as it is available.

@FrancescAlted
Member

FWIW, Python 3.11.1 was actually released 2 days ago, so I gave it a try, and I still get the same issue:

$ LD_LIBRARY_PATH=/usr/local/lib PYTHONPATH=. PYTHONFAULTHANDLER=True valgrind /home/faltet/.local/bin/coverage run bug-report.py
==781750== Memcheck, a memory error detector
==781750== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==781750== Using Valgrind-3.20.0 and LibVEX; rerun with -h for copyright info
==781750== Command: /home/faltet/.local/bin/coverage run bug-report.py
==781750==
==781750== Invalid read of size 1
==781750==    at 0x280E37: frame_dealloc (frameobject.c:865)
==781750==    by 0x2B8560: _PyTrash_thread_destroy_chain (object.c:2276)
==781750==    by 0x2B8560: _PyTrash_end (object.c:2302)
==781750==    by 0x28A800: Py_DECREF (object.h:538)
==781750==    by 0x28A800: Py_XDECREF (object.h:602)
==781750==    by 0x28A800: list_ass_slice (listobject.c:729)
==781750==    by 0x57F40B15: __Pyx_DelItemInt_Fast (lrucacheextension.c:13862)
==781750==    by 0x57F40B15: __pyx_f_6tables_17lrucacheextension_9NodeCache_setitem (lrucacheextension.c:2740)
==781750==    by 0x57F4073E: __pyx_pf_6tables_17lrucacheextension_9NodeCache_4__setitem__ (lrucacheextension.c:2639)
==781750==    by 0x57F4073E: __pyx_pw_6tables_17lrucacheextension_9NodeCache_5__setitem__ (lrucacheextension.c:2616)
==781750==    by 0x57F4073E: __pyx_mp_ass_subscript_6tables_17lrucacheextension_NodeCache (lrucacheextension.c:11650)
==781750==    by 0x200479: _PyEval_EvalFrameDefault (ceval.c:2301)
==781750==    by 0x368CEF: _PyEval_EvalFrame (pycore_ceval.h:73)
==781750==    by 0x368CEF: _PyEval_Vector (ceval.c:6435)
==781750==    by 0x25F2BD: _PyObject_FastCallDictTstate (call.c:152)
==781750==    by 0x25F2BD: _PyObject_Call_Prepend (call.c:482)
==781750==    by 0x2D5170: slot_tp_init (typeobject.c:7861)
==781750==    by 0x2CD015: type_call (typeobject.c:1112)
==781750==    by 0x25DA9C: _PyObject_MakeTpCall (call.c:214)
==781750==    by 0x1FD6F0: _PyEval_EvalFrameDefault (ceval.c:4772)
==781750==  Address 0x5971d35 is not stack'd, malloc'd or (recently) free'd
==781750==
Fatal Python error: Segmentation fault

Current thread 0x0000000004b6c740 (most recent call first):
  File "/home/faltet/software/PyTables/tables/file.py", line 387 in cache_node
  File "/home/faltet/software/PyTables/tables/node.py", line 372 in _g_set_location
  File "/home/faltet/software/PyTables/tables/node.py", line 241 in __init__
  File "/home/faltet/software/PyTables/tables/leaf.py", line 260 in __init__
  File "/home/faltet/software/PyTables/tables/carray.py", line 200 in __init__
  File "/home/faltet/software/PyTables/tables/earray.py", line 143 in __init__
  File "/home/faltet/software/PyTables/tables/index.py", line 493 in _g_post_init_hook
  File "/home/faltet/software/PyTables/tables/node.py", line 258 in __init__
  File "/home/faltet/software/PyTables/tables/group.py", line 221 in __init__
  File "/home/faltet/software/PyTables/tables/index.py", line 381 in __init__
  File "/home/faltet/software/PyTables/tables/table.py", line 284 in _column__create_index
  File "/home/faltet/software/PyTables/tables/table.py", line 3570 in create_index
  File "/home/faltet/software/PyTables/bug-report.py", line 32 in main
  File "/home/faltet/software/PyTables/bug-report.py", line 35 in <module>
  File "/home/faltet/.local/lib/python3.11/site-packages/coverage/execfile.py", line 199 in run
  File "/home/faltet/.local/lib/python3.11/site-packages/coverage/cmdline.py", line 830 in do_run
  File "/home/faltet/.local/lib/python3.11/site-packages/coverage/cmdline.py", line 659 in command_line
  File "/home/faltet/.local/lib/python3.11/site-packages/coverage/cmdline.py", line 943 in main
  File "/home/faltet/.local/bin/coverage", line 8 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, tables._comp_lzo, tables._comp_bzip2, tables.utilsextension, numexpr.interpreter, tables.hdf5extension, tables.linkextension, tables.lrucacheextension, tables.tableextension, tables.indexesextension (total: 22)
==781750==
==781750== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==781750==    at 0x4A007B9: __pthread_kill_implementation (pthread_kill.c:44)
==781750==    by 0x4A007B9: __pthread_kill_internal (pthread_kill.c:78)
==781750==    by 0x4A007B9: pthread_kill@@GLIBC_2.34 (pthread_kill.c:89)
==781750==    by 0x49A8BB1: raise (raise.c:26)
==781750==    by 0x49A8C4F: ??? (in /usr/lib64/glibc-hwcaps/x86-64-v3/libc.so.6)
==781750==    by 0x280E36: frame_dealloc (frameobject.c:862)
==781750==
==781750== HEAP SUMMARY:
==781750==     in use at exit: 12,129,148 bytes in 22,811 blocks
==781750==   total heap usage: 41,377 allocs, 18,566 frees, 44,667,808 bytes allocated
==781750==
==781750== LEAK SUMMARY:
==781750==    definitely lost: 98,384 bytes in 74 blocks
==781750==    indirectly lost: 404,766 bytes in 281 blocks
==781750==      possibly lost: 227,210 bytes in 1,205 blocks
==781750==    still reachable: 11,398,788 bytes in 21,251 blocks
==781750==                       of which reachable via heuristic:
==781750==                         length64           : 188,107 bytes in 781 blocks
==781750==                         newarray           : 2,240 bytes in 118 blocks
==781750==         suppressed: 0 bytes in 0 blocks
==781750== Rerun with --leak-check=full to see details of leaked memory
==781750==
==781750== For lists of detected and suppressed errors, rerun with: -s
==781750== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)

@FrancescAlted
Member

Until a better solution arises, I have found a workaround: setting NODE_CACHE_SLOTS to 1 (i.e. disabling node caching). That is, use:

       with tables.open_file(fn, mode="w", NODE_CACHE_SLOTS=1) as handle:

Alternatively, you can override the default value of NODE_CACHE_SLOTS in the system-wide tables/parameters.py.

At any rate, after experimenting with different values of NODE_CACHE_SLOTS, I see that using 32 does not segfault for the example above. As using 32 instead of 64 does not seem a big deal, perhaps I'll change this value prior to the next release (to happen very soon now). If anyone figures out a better solution, shout!
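
The experiment with different NODE_CACHE_SLOTS values can be scripted so that each trial runs in its own process and a crash does not take down the driver. What follows is only a minimal sketch, assuming a POSIX system (where a child killed by a signal reports a negative return code). The helpers `classify_run` and `sweep` are hypothetical names, and `sweep` assumes the repro script has been adapted to read the slot count from its first command-line argument, which the demo.py above does not do.

```python
import signal
import subprocess
import sys


def classify_run(cmd, timeout=300):
    """Run cmd in a child process and report how it ended.

    Returns "ok", "segfault", or "failed (<returncode>)".
    On POSIX, subprocess reports death-by-signal as a negative return code.
    """
    proc = subprocess.run(cmd, capture_output=True, timeout=timeout)
    if proc.returncode == 0:
        return "ok"
    if proc.returncode == -signal.SIGSEGV:
        return "segfault"
    return f"failed ({proc.returncode})"


def sweep(script, slot_values=(1, 16, 32, 64)):
    """Run `coverage run <script> <slots>` once per value (hypothetical CLI).

    Assumes the script interprets its first argument as NODE_CACHE_SLOTS.
    """
    results = {}
    for slots in slot_values:
        cmd = [sys.executable, "-m", "coverage", "run", script, str(slots)]
        results[slots] = classify_run(cmd)
    return results
```

Calling `sweep("demo.py")` would then return a dictionary mapping each slot count to the outcome of its run, which makes it easy to see which values trigger the crash on a given interpreter.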

@avalentino
Member

@FrancescAlted I wonder how much difference it would make to use a pure Python implementation of the LRU cache for the node cache. The Python standard library now has nice tools to implement it.
Do you remember the numbers?
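
For reference, a pure Python LRU cache of the kind suggested here can be sketched with `collections.OrderedDict` from the standard library. This is only an illustration under stated assumptions: the class name `SimpleLRUCache` and its methods are hypothetical, and it does not try to reproduce the interface or the eviction details of the existing `NodeCache` in `lrucacheextension`.

```python
from collections import OrderedDict


class SimpleLRUCache:
    """Hypothetical fixed-size LRU mapping; not the PyTables NodeCache API."""

    def __init__(self, nslots):
        if nslots < 1:
            raise ValueError("nslots must be at least 1")
        self.nslots = nslots
        self._data = OrderedDict()

    def __contains__(self, key):
        return key in self._data

    def __len__(self):
        return len(self._data)

    def __setitem__(self, key, value):
        # Re-inserting an existing key refreshes its position.
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        # Evict least recently used entries while over capacity.
        while len(self._data) > self.nslots:
            self._data.popitem(last=False)

    def __getitem__(self, key):
        value = self._data[key]       # raises KeyError on a miss
        self._data.move_to_end(key)   # mark as most recently used
        return value

    def pop(self, key):
        # Remove and return an entry, e.g. when a node is closed.
        return self._data.pop(key)
```

With two slots, storing `/data00` and `/data01`, reading `/data00`, and then storing `/data03` would evict `/data01`, because it is the entry that was used least recently.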

@FrancescAlted
Member

FrancescAlted commented Dec 9, 2022

No, I don't remember well. What I do remember is that, when I was implementing the LRU cache, I was kind of a performance zealot and I wanted to get all the performance I could from extensions. But I bet now that using the regular standard library would lead to similar numbers.

@avalentino
Member

Thanks.
I see that we have a couple of scripts in the benchmark folder to measure the node cache performance. Maybe I could try to understand what the impact of changing the LRU cache implementation would be in terms of coding effort.
Then we could reconsider whether it makes sense to change anything in this area.
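
As a rough idea of what such a measurement could look like, here is a small timing harness built on `timeit`. It is not one of the scripts in the benchmark folder: the workload generator, the function names and the default sizes are all assumptions made for illustration, and it only times an `OrderedDict`-based LRU loop written in pure Python.

```python
import random
import timeit
from collections import OrderedDict


def make_workload(nkeys=1000, nops=20_000, seed=0):
    """Build a reproducible list of node-path-like keys (hypothetical shape)."""
    rng = random.Random(seed)
    return [f"/group{rng.randrange(nkeys):04d}/table" for _ in range(nops)]


def run_ordereddict_lru(keys, nslots=64):
    """Replay the keys against an OrderedDict LRU and count cache hits."""
    cache = OrderedDict()
    hits = 0
    for key in keys:
        if key in cache:
            cache.move_to_end(key)  # mark as most recently used
            hits += 1
        else:
            cache[key] = object()   # stand-in for a loaded node
            if len(cache) > nslots:
                cache.popitem(last=False)  # evict least recently used
    return hits


def benchmark(nslots=64, repeat=5):
    """Return the best wall-clock time in seconds over `repeat` replays."""
    keys = make_workload()
    timings = timeit.repeat(
        lambda: run_ordereddict_lru(keys, nslots), number=1, repeat=repeat
    )
    return min(timings)
```

Replaying the same key sequence against the compiled cache would give a number that can be compared directly with the one returned by `benchmark()`.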

3 participants