Segfault after running test suite with hdf5 1.14.4.2 #2419

Open
antonio-rojas opened this issue May 2, 2024 · 8 comments

Comments

@antonio-rojas

After running the test suite against HDF5 1.14.4.2, we get the segfault below. All tests pass; the segfault happens only after the test suite has finished running.

  • Operating System: Arch Linux x86_64
  • Python version: 3.12.3
  • Where Python was acquired: system
  • h5py version: 3.11.0
  • HDF5 version: 1.14.4.2
  • Full stack trace:
#0  0x00007613d3646da8 in H5T__unlock_cb (_dt=0x5b1cc4fe5920, id=216172782113784908, _udata=0x7ffee4b314ec)
   at /usr/src/debug/hdf5/hdf5-1.14.4-2/src/H5T.c:1648
#1  0x00007613d357db2a in H5I__iterate_cb (_key=0x0, _udata=<synthetic pointer>, _item=0x5b1cc51c8270)
   at /usr/src/debug/hdf5/hdf5-1.14.4-2/src/H5Iint.c:276
#2  H5I_iterate (type=H5I_DATATYPE, func=0x7613d3646da0 <H5T__unlock_cb>, udata=0x7ffee4b314ec, app_ref=false)
   at /usr/src/debug/hdf5/hdf5-1.14.4-2/src/H5Iint.c:1584
#3  0x00007613d3647195 in H5T_top_term_package () at /usr/src/debug/hdf5/hdf5-1.14.4-2/src/H5T.c:1703
#4  0x00007613d3446ca5 in H5_term_library () at /usr/src/debug/hdf5/hdf5-1.14.4-2/src/H5.c:456
#5  0x00007613d5a5cb36 in __run_exit_handlers (status=status@entry=0, listp=0x7613d5bf6680 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, 
   run_dtors=run_dtors@entry=true) at exit.c:108
#6  0x00007613d5a5cc80 in __GI_exit (status=status@entry=0) at exit.c:138
#7  0x00007613d5e7f49e in Py_Exit (sts=0) at Python/pylifecycle.c:3060
#8  0x00007613d5e74dfb in handle_system_exit () at Python/pythonrun.c:756
#9  0x00007613d5e74747 in _PyErr_PrintEx (tstate=0x7613d6222ae8 <_PyRuntime+459656>, set_sys_last_vars=1) at Python/pythonrun.c:765
#10 0x00007613d5e743db in PyErr_PrintEx (set_sys_last_vars=1) at Python/pythonrun.c:845
#11 PyErr_Print () at Python/pythonrun.c:851
#12 _PyRun_SimpleFileObject (fp=<optimized out>, filename=<optimized out>, closeit=<optimized out>, flags=0x7ffee4b31e10) at Python/pythonrun.c:439
#13 0x00007613d5e73f88 in _PyRun_AnyFileObject (fp=0x5b1cc4601590, filename=0x7613d592e170, closeit=1, flags=0x7ffee4b31e10) at Python/pythonrun.c:78
#14 0x00007613d5e6cc67 in pymain_run_file_obj (skip_source_first_line=0, filename=0x7613d592e170, program_name=0x7613d592e130) at Modules/main.c:360
#15 pymain_run_file (config=0x7613d61c56c8 <_PyRuntime+77672>) at Modules/main.c:379
#16 pymain_run_python (exitcode=0x7ffee4b31de4) at Modules/main.c:629
#17 Py_RunMain () at Modules/main.c:709
#18 0x00007613d5e28fab in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:763
#19 0x00007613d5a43cd0 in __libc_start_call_main (main=main@entry=0x5b1cc299b120 <main>, argc=argc@entry=3, argv=argv@entry=0x7ffee4b32068)
   at ../sysdeps/nptl/libc_start_call_main.h:58
#20 0x00007613d5a43d8a in __libc_start_main_impl (main=0x5b1cc299b120 <main>, argc=3, argv=0x7ffee4b32068, init=<optimized out>, fini=<optimized out>, 
   rtld_fini=<optimized out>, stack_end=0x7ffee4b32058) at ../csu/libc-start.c:360
#21 0x00005b1cc299b045 in _start ()
@jhendersonHDF

Hi @antonio-rojas, this is likely an issue with HDF5 rather than h5py, but I'll let the h5py folks chime in in case they've seen similar issues before. It appears to be a problem with a partially-initialized library-internal datatype, and it could be related to the Float16 support added for the 1.14.4 release, though if that were the case I'd expect the library to fail early during initialization. Are you building HDF5 from source for use with h5py? If so, you could try passing the "--disable-nonstandard-feature-float16" configure option (for Autotools) or "-DHDF5_ENABLE_NONSTANDARD_FEATURE_FLOAT16=OFF" option (for CMake) and see if the segfault still occurs.

I'll see about fixing the segfault issue, but would also like to determine what datatype is only getting partially initialized.

@antonio-rojas

> Are you building HDF5 from source for use with h5py? If so, you could try passing the "--disable-nonstandard-feature-float16" configure option (for Autotools) or "-DHDF5_ENABLE_NONSTANDARD_FEATURE_FLOAT16=OFF" option (for CMake) and see if the segfault still occurs.

With that configure option, the h5py tests don't even run

============================= test session starts ==============================
platform linux -- Python 3.12.3, pytest-8.1.2, pluggy-1.4.0
rootdir: /build/python-h5py/src
plugins: typeguard-4.2.1, mpi-0.6
collected 0 items / 1 error

==================================== ERRORS ====================================
__ ERROR collecting h5py-3.11.0/build/lib.linux-x86_64-cpython-312/h5py/tests __
/usr/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1387: in _gcd_import
    ???
<frozen importlib._bootstrap>:1360: in _find_and_load
    ???
<frozen importlib._bootstrap>:1310: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:488: in _call_with_frames_removed
    ???
<frozen importlib._bootstrap>:1387: in _gcd_import
    ???
<frozen importlib._bootstrap>:1360: in _find_and_load
    ???
<frozen importlib._bootstrap>:1310: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:488: in _call_with_frames_removed
    ???
<frozen importlib._bootstrap>:1387: in _gcd_import
    ???
<frozen importlib._bootstrap>:1360: in _find_and_load
    ???
<frozen importlib._bootstrap>:1331: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:935: in _load_unlocked
    ???
<frozen importlib._bootstrap_external>:995: in exec_module
    ???
<frozen importlib._bootstrap>:488: in _call_with_frames_removed
    ???
h5py-3.11.0/build/lib.linux-x86_64-cpython-312/h5py/__init__.py:37: in <module>
    from ._conv import register_converters as _register_converters, \
h5py/_conv.pyx:1: in init h5py._conv
    ???
h5py/h5r.pyx:1: in init h5py.h5r
    ???
h5py/h5p.pyx:1: in init h5py.h5p
    ???
h5py/h5t.pyx:235: in init h5py.h5t
    ???
h5py/h5t.pyx:80: in h5py.h5t.lockid
    ???
h5py/h5t.pyx:49: in h5py.h5t.typewrap
    ???
E   ValueError: Not a datatype (not a datatype)
!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!

@jhendersonHDF

> With that configure option, the h5py tests don't even run

I see, thanks for testing! I'd guess that's due to the changes added in https://github.com/h5py/h5py/pull/2406/files to support the Float16 work. It looks like h5py always attempts to lock the H5T_NATIVE_FLOAT16 datatype, but that macro will point to an invalid datatype if _Float16 support is disabled or if the platform doesn't natively support the _Float16 C datatype. @ajelenak @takluyver I think h5py would need to check whether the H5T_NATIVE_FLOAT16 macro maps to H5I_INVALID_HID before locking it, using it later on, or unlocking it.
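A minimal Python sketch of the guard suggested above, for illustration only; it does not use h5py's real internals. `INVALID_HID`, `apply_to_valid_types`, and the datatype IDs are hypothetical stand-ins, with `INVALID_HID = -1` mirroring the C value of `H5I_INVALID_HID`. The same helper covers both the lock and unlock paths by taking the action as a callback.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for HDF5's H5I_INVALID_HID, which the C headers define as -1.
INVALID_HID = -1


def apply_to_valid_types(type_ids: Dict[str, int],
                         action: Callable[[int], None]) -> List[str]:
    """Call action (e.g. a lock or unlock routine) on each valid datatype ID.

    IDs equal to INVALID_HID, such as NATIVE_FLOAT16 in a build without
    _Float16 support, are skipped. Returns the names that were processed.
    """
    processed = []
    for name, hid in type_ids.items():
        if hid == INVALID_HID:
            continue  # Type not available in this HDF5 build; leave it alone.
        action(hid)
        processed.append(name)
    return processed


if __name__ == "__main__":
    # Hypothetical IDs: NATIVE_FLOAT16 is unavailable in this simulated build.
    ids = {"NATIVE_FLOAT": 101, "NATIVE_FLOAT16": INVALID_HID}
    locked = []
    print(apply_to_valid_types(ids, locked.append))  # ['NATIVE_FLOAT']
    print(locked)  # [101]
```

Passing the action in keeps the validity check in one place, so locking at import time and unlocking at shutdown cannot drift apart.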

FWIW, HDFGroup/hdf5#4459 should fix the segfault issue here and simply skip the case of trying to unlock a partially-initialized datatype.
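A rough Python model of the approach described for HDFGroup/hdf5#4459, not HDF5's actual C code: during library shutdown, every registered datatype is visited, and any entry whose internal state was never set up is skipped instead of being accessed. `Datatype`, `SharedInfo`, `unlock_all`, and the `shared` field are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SharedInfo:
    # Hypothetical: internal state that only a fully initialized datatype has.
    locked: bool = True


@dataclass
class Datatype:
    name: str
    # None models a partially initialized datatype whose internals were never set up.
    shared: Optional[SharedInfo] = None


def unlock_all(registry: Dict[int, Datatype]) -> List[str]:
    """Unlock every fully initialized datatype at shutdown.

    Partially initialized entries are skipped rather than accessed.
    Returns the names of the skipped entries.
    """
    skipped = []
    for dt in registry.values():
        if dt.shared is None:
            skipped.append(dt.name)
            continue
        dt.shared.locked = False
    return skipped


if __name__ == "__main__":
    registry = {1: Datatype("NATIVE_FLOAT", SharedInfo()), 2: Datatype("NATIVE_FLOAT16")}
    print(unlock_all(registry))       # ['NATIVE_FLOAT16']
    print(registry[1].shared.locked)  # False
```

Per the follow-up comments, the first revision of the fix still crashed in `H5T_close`, which suggests a check like this is needed on every shutdown path that touches a datatype (closing as well as unlocking).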

@antonio-rojas

antonio-rojas commented May 2, 2024

> FWIW, HDFGroup/hdf5#4459 should fix the segfault issue here and simply skip the case of trying to unlock a partially-initialized datatype.

Looks like the fix is incomplete, it's still crashing at H5T_close now

#0  0x0000714812853a0e in H5T_close (dt=0x5c21921a0980) at /usr/src/debug/hdf5/hdf5-hdf5_1.14.4.2/src/H5T.c:4207
#1  0x000071481284774b in H5T__close_cb (dt=0x5c21921a0980, request=<optimized out>) at /usr/src/debug/hdf5/hdf5-hdf5_1.14.4.2/src/H5T.c:1881
#2  0x0000714812776a19 in H5I__mark_node (key=0x0, _udata=<synthetic pointer>, _info=0x5c21922dea60)
    at /usr/src/debug/hdf5/hdf5-hdf5_1.14.4.2/src/H5Iint.c:388
#3  H5I_clear_type (type=<optimized out>, force=false, app_ref=<optimized out>) at /usr/src/debug/hdf5/hdf5-hdf5_1.14.4.2/src/H5Iint.c:323
#4  0x00007148128476c7 in H5T_top_term_package () at /usr/src/debug/hdf5/hdf5-hdf5_1.14.4.2/src/H5T.c:1706
#5  0x0000714812646ca5 in H5_term_library () at /usr/src/debug/hdf5/hdf5-hdf5_1.14.4.2/src/H5.c:456
#6  0x0000714814c5b2e6 in __run_exit_handlers (status=0, listp=<optimized out>, run_list_atexit=run_list_atexit@entry=true, 
    run_dtors=run_dtors@entry=true) at exit.c:108
#7  0x0000714814c5b42e in __GI_exit (status=<optimized out>) at exit.c:138
#8  0x0000714814c41d51 in __libc_start_call_main (main=main@entry=0x5c219066a120 <main>, argc=argc@entry=7, argv=argv@entry=0x7fff72a637f8)
    at ../sysdeps/nptl/libc_start_call_main.h:74
#9  0x0000714814c41e0c in __libc_start_main_impl (main=0x5c219066a120 <main>, argc=7, argv=0x7fff72a637f8, init=<optimized out>, fini=<optimized out>, 
    rtld_fini=<optimized out>, stack_end=0x7fff72a637e8) at ../csu/libc-start.c:360
#10 0x00005c219066a045 in _start ()

@jhendersonHDF

> Looks like the fix is incomplete, it's still crashing at H5T_close now

Thanks, it looks like there are more bad assumptions regarding partially-initialized datatypes. I'll do some more thorough testing and revise.

@antonio-rojas

The latest version of HDFGroup/hdf5#4459 fixes the issues for me.

@jhendersonHDF

> The latest version of HDFGroup/hdf5#4459 fixes the issues for me.

Thanks for testing, @antonio-rojas! This should be merged soon and will be in HDF5 1.14.5, which is currently targeted for a fall release. Unfortunately, that means anyone wishing to use HDF5 1.14.4.2 with h5py will need to apply that PR as a patch.

@jhendersonHDF

FYI, we've decided to do a patch release of 1.14.4 with the fix in HDFGroup/hdf5#4459 so it can be used with h5py.
