
Build libhdf5 with the --enable-threadsafe flag #776

Closed
ZanSara opened this issue Oct 28, 2019 · 17 comments · Fixed by #930


ZanSara commented Oct 28, 2019

Hello,

It seems to me that PyTables relies on the version of libhdf5.so found in the system rather than building its own, and that most systems have versions of libhdf5.so that are compiled without the --enable-threadsafe flag (please correct me if I'm wrong). This causes some annoying concurrency issues even while reading files (probably issues #700 and #593, Pandas issue #12236, and duplicates).

Rebuilding the HDF5 library with the proper flag and building PyTables over it seems to solve most of these issues, at least in the tests I've done so far.

Do you think it is possible to bundle a version of libhdf5.so compiled with that flag, or to build it when installing PyTables? Unfortunately I am not an expert in this matter. For now I am doing the whole process from a bash script, but it would be amazing to have it done somehow when installing PyTables.

The exact command I use for the build is ./configure --prefix=/usr/local/hdf5 --disable-hl --enable-threadsafe
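Since the whole process is being scripted by hand anyway, here is a minimal sketch of such a build script, assuming HDF5 1.10.4. The download URL and the HDF5_DIR variable honored by PyTables' setup.py are assumptions here; note also that HDF5's configure refuses --enable-threadsafe together with the high-level library unless --disable-hl (or --enable-unsupported) is given.

#!/bin/bash
set -e

# Fetch and unpack HDF5 (version and URL are assumptions; adjust as needed)
wget https://support.hdfgroup.org/ftp/HDF5/releases/hdf5-1.10/hdf5-1.10.4/src/hdf5-1.10.4.tar.gz
tar xzf hdf5-1.10.4.tar.gz
cd hdf5-1.10.4

# Thread safety is incompatible with the high-level library
# unless --enable-unsupported is also passed
./configure --prefix=/usr/local/hdf5 --disable-hl --enable-threadsafe
make -j"$(nproc)"
sudo make install

# Build PyTables from source against the freshly built library
export LD_LIBRARY_PATH=/usr/local/hdf5/lib:$LD_LIBRARY_PATH
HDF5_DIR=/usr/local/hdf5 pip install --no-binary=tables tables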

@tomkooij
Contributor

@ZanSara: Good point. I agree that --enable-threadsafe is a sound option.

I guess pip install tables on Linux/Mac will usually install PyTables from a wheel with a vendored HDF5 library, compiled from source when the wheels are built.

I will add --enable-threadsafe in the wheel-builder repo: MacPython/pytables-wheels

For reference, the HDF5 config of the current wheels is:

	    SUMMARY OF THE HDF5 CONFIGURATION
	    =================================

General Information:
-------------------
                   HDF5 Version: 1.10.4
                  Configured on: Mon Oct 28 15:56:36 UTC 2019
                  Configured by: root@eb93ce3c980d
                    Host system: x86_64-unknown-linux-gnu
              Uname information: Linux eb93ce3c980d 4.4.0-101-generic #124~14.04.1-Ubuntu SMP Fri Nov 10 19:05:36 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
                       Byte sex: little-endian
             Installation point: /usr/local

Compiling Options:
------------------
                     Build Mode: production
              Debugging Symbols: no
                        Asserts: no
                      Profiling: no
             Optimization Level: high

Linking Options:
----------------
                      Libraries: static, shared
  Statically Linked Executables: 
                        LDFLAGS: 
                     H5_LDFLAGS: 
                     AM_LDFLAGS:  -L/usr/local/lib
                Extra libraries: -lrt -lsz -lz -ldl -lm 
                       Archiver: ar
                       AR_FLAGS: cr
                         Ranlib: ranlib

Languages:
----------
                              C: yes
                     C Compiler: /opt/rh/devtoolset-2/root/usr/bin/gcc ( gcc (GCC) 4.8.2 20140120 )
                       CPPFLAGS: -I/usr/local/include 
                    H5_CPPFLAGS: -D_GNU_SOURCE -D_POSIX_C_SOURCE=200112L   -DNDEBUG -UH5_DEBUG_API
                    AM_CPPFLAGS:  -I/usr/local/include
                        C Flags: -Wl,-strip-all
                     H5 C Flags:  -std=c99  -pedantic -Wall -Wextra -Wbad-function-cast -Wc++-compat -Wcast-align -Wcast-qual -Wconversion -Wdeclaration-after-statement -Wdisabled-optimization -Wfloat-equal -Wformat=2 -Winit-self -Winvalid-pch -Wmissing-declarations -Wmissing-include-dirs -Wmissing-prototypes -Wnested-externs -Wold-style-definition -Wpacked -Wpointer-arith -Wredundant-decls -Wshadow -Wstrict-prototypes -Wswitch-default -Wswitch-enum -Wundef -Wunused-macros -Wunsafe-loop-optimizations -Wwrite-strings -Wlogical-op -Wlarger-than=2048 -Wvla -Wsync-nand -Wframe-larger-than=16384 -Wpacked-bitfield-compat -Wstrict-overflow=5 -Wjump-misses-init -Wdouble-promotion -Wtrampolines -Wstack-usage=8192 -Wvector-operation-performance  -s -Wno-inline -Wno-aggregate-return -Wno-missing-format-attribute -Wno-missing-noreturn -Wno-suggest-attribute=const -Wno-suggest-attribute=pure -Wno-suggest-attribute=noreturn -Wno-suggest-attribute=format -O3
                     AM C Flags: 
               Shared C Library: yes
               Static C Library: yes


                        Fortran: no

                            C++: no

                           Java: no


Features:
---------
                   Parallel HDF5: no
Parallel Filtered Dataset Writes: no
              Large Parallel I/O: no
              High-level library: yes
                    Threadsafety: no
             Default API mapping: v110
  With deprecated public symbols: yes
          I/O filters (external): deflate(zlib),szip(encoder)
                             MPE: no
                      Direct VFD: no
                         dmalloc: no
  Packages w/ extra debug output: none
                     API tracing: no
            Using memory checker: no
 Memory allocation sanity checks: no
             Metadata trace file: no
          Function stack tracing: no
       Strict file format checks: no
    Optimization instrumentation: no
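Note the "Threadsafety: no" line above. For anyone who wants to verify this at runtime rather than by reading a build summary, here is a minimal sketch using ctypes and HDF5's H5is_library_threadsafe call (available since HDF5 1.8.16); how the shared library is located below is an assumption and may need adjusting per system:

import ctypes
import ctypes.util

# Locate the HDF5 shared library; find_library may return None, in which
# case you may need the explicit path to the libhdf5.so your wheel vendors.
libpath = ctypes.util.find_library("hdf5")
hdf5 = ctypes.CDLL(libpath or "libhdf5.so")

# herr_t H5is_library_threadsafe(hbool_t *is_ts); a negative return means error.
is_ts = ctypes.c_bool(False)
if hdf5.H5is_library_threadsafe(ctypes.byref(is_ts)) >= 0:
    print("HDF5 threadsafety:", "yes" if is_ts.value else "no")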


ZanSara commented Oct 29, 2019

Thank you for your help!

matthew-brett added a commit to multi-build/multibuild that referenced this issue Oct 29, 2019
MRG: Build HDF5 with thread safety enabled

Build HDF5 with the --enable-threadsafe flag.

This was brought to my attention by @ZanSara over at PyTables/PyTables#776

It seems a good idea to enable thread safety for the HDF5 library.

This makes a lot of sense, as the conda-forge [hdf5 package is also built](https://github.com/conda-forge/hdf5-feedstock/blob/master/recipe/build.sh) with this flag.
@tomkooij
Contributor

Once this gets merged, I'll rebuild the wheels for 3.6.1 and let you know here.

@tomkooij tomkooij mentioned this issue Oct 30, 2019

tdagnino commented Nov 5, 2019

Hi,

I'm sorry if I'm posting my question in the wrong way, but I've been really struggling with this problem for a few days now and came across this bug fix, which really got my hopes up. However, I've had no success with the 3.6.1 version I installed with pip.

I'm having the same problems with multiple threads with release 3.6.1 on Windows.

My program is very simple. Actually, I've ultra-simplified it and still have issues: open a different HDF5 file in each thread and just retrieve the objects at the keys.

import pandas as pd

# file_name points at the HDF5 file this thread reads
store = pd.HDFStore(file_name, mode="r")
keys = store.keys()
for key in keys:
    print(key)
    store.get(key)
store.close()

Was the problem only fixed for Linux and Mac in release 3.6.1?

Thank you for your help.


tomkooij commented Nov 5, 2019

@tdagnino: The current wheels vendor an HDF5 library that is not yet compiled with the --enable-threadsafe flag.

Still working on this over at matthew-brett/multibuild#277

Will report here when finished.


tomkooij commented Nov 6, 2019

@ZanSara threadsafe wheels (3.6.1-2) are on test.pypi.org. Can you test a wheel before I upload them to the real PyPI and break everybody's builds?

You can install from testpypi using:

$ pip install --index-url https://test.pypi.org/simple/ tables

This should install a manylinux wheel (on Linux) with HDF5 compiled with --enable-threadsafe.

Please let me know if you are able to test it.
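A quick way to confirm which PyTables/HDF5 combination actually got installed is PyTables' own version dump (the exact output layout may vary between versions):

import tables

# Prints the PyTables version and the version of the HDF5 library it was
# built against, i.e. the HDF5 vendored by the wheel you just installed.
tables.print_versions()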


ZanSara commented Nov 6, 2019

I'll have a try now and let you know soon 👍


ZanSara commented Nov 6, 2019

My original test code was written with Pandas, not directly with PyTables, and with it, recompiling the library solved the bug just fine. However, installing the test version of tables you linked (before installing Pandas, of course) doesn't fix the Pandas bug.

For reference, my test code is taken from one of the issues I reference above:

import pandas as pd
import numpy as np
from multiprocessing import Pool
import warnings

# To avoid natural name warnings
warnings.filterwarnings('ignore')

def init(hdf_store):
    global hdf_buff
    hdf_buff = hdf_store

def reader(name):
    df = hdf_buff[name]
    return (name, df)

def main():
    # Creating the store
    with pd.HDFStore('storage.h5', 'w') as store:
        for i in range(100):
            df = pd.DataFrame(np.random.rand(5,3), columns=list('abc'))
            store.append(str(i), df, index=False, expectedrows=5)
    # Reading concurrently with one connection
    with pd.HDFStore('storage.h5', 'r') as store:
        with Pool(4, initializer=init, initargs=(store,)) as p:
            ret = pd.concat(dict(p.map(reader, [str(i) for i in range(100)])))

if __name__ == '__main__':
    main()

For reference, on a vanilla version of Pandas the code crashes with one of these two error messages at random:

Traceback (most recent call last):
  File "read-concurrently-pandas.py", line 29, in <module>
    main()
  File "read-concurrently-pandas.py", line 26, in main
    ret = pd.concat(dict(p.map(reader, [str(i) for i in range(100)])))
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
KeyError: 'No object named 21 in the file'

or

tables.exceptions.HDF5ExtError: HDF5 error back trace

  File "H5Dio.c", line 199, in H5Dread
    can't read data
  File "H5Dio.c", line 601, in H5D__read
    can't read data
  File "H5Dchunk.c", line 2201, in H5D__chunk_read
    error looking up chunk address
  File "H5Dchunk.c", line 2931, in H5D__chunk_lookup
    can't query chunk address
  File "H5Dbtree.c", line 1049, in H5D__btree_idx_get_addr
    can't get chunk info
  File "H5B.c", line 335, in H5B_find
    unable to load B-tree node
  File "H5AC.c", line 1625, in H5AC_protect
    H5C_protect() failed
  File "H5C.c", line 2362, in H5C_protect
    can't load entry
  File "H5C.c", line 6726, in H5C_load_entry
    Can't deserialize image
  File "H5Bcache.c", line 181, in H5B__cache_deserialize
    wrong B-tree signature

End of HDF5 error back trace

With the recompiled libraries, instead, it works fine.

I am now writing a simple test that uses tables directly to see if the problem persists. Maybe it's also a problem with my setup. I'll keep you updated.


ZanSara commented Nov 7, 2019

Ok, here is the modified test (I post it so you can double-check for mistakes):

import tables
import numpy as np
from multiprocessing import Pool
import warnings

# To avoid natural name warnings
warnings.filterwarnings('ignore')

class Particle(tables.IsDescription):
    name = tables.StringCol(16)     # 16-character String
    idnumber = tables.Int64Col()    # Signed 64-bit integer

def init(hdf_store):
    global hdf_buff
    hdf_buff = hdf_store

def reader(name):
    table = hdf_buff.root.readout
    return (name, table)

def main():
    # Create test file
    with tables.open_file("storage.h5", mode="w", title="Test file") as store:
        table = store.create_table("/", 'readout', Particle, "Readout example")

        particle = table.row
        for i in range(100):
            particle['name'] = 'Particle: %6d' % (i)
            particle['idnumber'] = i * (2 ** 34)
            particle.append()
        table.flush()
    
    # Simply read the table - no problem
    with tables.open_file("storage.h5", mode="r", title="Test file") as store:
        cols = []
        init(store)
        for i in range(100):
            cols.append( reader(str(i)) ) 
        ret = np.column_stack(cols)

    # Reading concurrently with one connection - fails
    with tables.open_file("storage.h5", mode="r", title="Test file") as store:
        with Pool(4, initializer=init, initargs=(store,)) as p:
            ret = np.column_stack( dict(p.map(reader, [str(i) for i in range(100)])).values() )
            print(ret)

if __name__ == '__main__':
    main()

If you comment out the concurrent read code, the script above works fine. If you run it all, it will crash with one of these two errors at random:

multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "read-concurrently-tables.py", line 18, in reader
    table = hdf_buff.root.readout
  File "/home/szanzott/Projects/hdf5_compile_trials/pytables-new-wheels/old-pytables/lib/python3.6/site-packages/tables/group.py", line 836, in __getattr__
    return self._f_get_child(name)
  File "/home/szanzott/Projects/hdf5_compile_trials/pytables-new-wheels/old-pytables/lib/python3.6/site-packages/tables/group.py", line 708, in _f_get_child
    self._g_check_has_child(childname)
  File "/home/szanzott/Projects/hdf5_compile_trials/pytables-new-wheels/old-pytables/lib/python3.6/site-packages/tables/group.py", line 395, in _g_check_has_child
    % (self._v_pathname, name))
tables.exceptions.NoSuchNodeError: group ``/`` does not have a child named ``readout``
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "read-concurrently-tables.py", line 49, in <module>
    main()
  File "read-concurrently-tables.py", line 44, in main
    ret = np.column_stack( dict(p.map(reader, [str(i) for i in range(100)])).values() )
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
tables.exceptions.NoSuchNodeError: group ``/`` does not have a child named ``readout``

or

Traceback (most recent call last):
  File "read-concurrently-tables.py", line 49, in <module>
    main()
  File "read-concurrently-tables.py", line 44, in main
    ret = np.column_stack( dict(p.map(reader, [str(i) for i in range(100)])).values() )
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
multiprocessing.pool.MaybeEncodingError: Error sending result: '[('14', /readout (Table(100,)) 'Readout example'
  description := {
  "idnumber": Int64Col(shape=(), dflt=0, pos=0),
  "name": StringCol(itemsize=16, shape=(), dflt=b'', pos=1)}
  byteorder := 'little'
  chunkshape := (2730,)), ('15', /readout (Table(100,)) 'Readout example'
  description := {
  "idnumber": Int64Col(shape=(), dflt=0, pos=0),
  "name": StringCol(itemsize=16, shape=(), dflt=b'', pos=1)}
  byteorder := 'little'
  chunkshape := (2730,)), ('16', /readout (Table(100,)) 'Readout example'
  description := {
  "idnumber": Int64Col(shape=(), dflt=0, pos=0),
  "name": StringCol(itemsize=16, shape=(), dflt=b'', pos=1)}
  byteorder := 'little'
  chunkshape := (2730,)), ('17', /readout (Table(100,)) 'Readout example'
  description := {
  "idnumber": Int64Col(shape=(), dflt=0, pos=0),
  "name": StringCol(itemsize=16, shape=(), dflt=b'', pos=1)}
  byteorder := 'little'
  chunkshape := (2730,)), ('18', /readout (Table(100,)) 'Readout example'
  description := {
  "idnumber": Int64Col(shape=(), dflt=0, pos=0),
  "name": StringCol(itemsize=16, shape=(), dflt=b'', pos=1)}
  byteorder := 'little'
  chunkshape := (2730,)), ('19', /readout (Table(100,)) 'Readout example'
  description := {
  "idnumber": Int64Col(shape=(), dflt=0, pos=0),
  "name": StringCol(itemsize=16, shape=(), dflt=b'', pos=1)}
  byteorder := 'little'
  chunkshape := (2730,)), ('20', /readout (Table(100,)) 'Readout example'
  description := {
  "idnumber": Int64Col(shape=(), dflt=0, pos=0),
  "name": StringCol(itemsize=16, shape=(), dflt=b'', pos=1)}
  byteorder := 'little'
  chunkshape := (2730,))]'. Reason: 'TypeError('self.dims,self.wbuf cannot be converted to a Python object for pickling',)'

This is what happened with the current PyTables, so not with your test wheels yet. (The MaybeEncodingError presumably comes from reader returning the open Table node, which cannot be pickled back to the parent process, as the error message itself says.) I'll now set up your version and let you know.


ZanSara commented Nov 7, 2019

I can confirm that the new test version does not fix the bug.
I also re-tested with my hand-built environment, and the concurrent read works fine there.
If there is a way for me to help you debug this further, let me know :)


tomkooij commented Nov 7, 2019

@ZanSara thanks, this helps a lot! I will look into it and get back to you (probably asking for more help).

@tomkooij tomkooij added this to the 3.6.2 milestone Jan 19, 2020
@eriniocentric

Hi, I am still dealing with this exact issue.

I access H5 files for read access using Python's multiprocessing in two separate environments. It works fine in one but fails in the other, and I can't pinpoint why. Any tips for solving this?

@avalentino
Member

@eriniocentric if you are using multiprocessing it is probably not a threading issue, IMHO: --enable-threadsafe only serializes calls from multiple threads within a single process, so it would not help worker processes that share an open file handle.

@eriniocentric

Any idea what these errors are telling me? They don't happen when I run the command alone, but they do when I use multiprocessing's Pool class. I am basically doing a read_where in each worker on the same H5 file table.

Exception ignored in: 'tables.indexesextension.IndexArray._g_read_sorted_slice'
tables.exceptions.HDF5ExtError: Problems reading the array data.
Exception ignored in: 'tables.tableextension.Table._read_chunk'
tables.exceptions.HDF5ExtError: HDF5 error back trace

File "H5Dio.c", line 199, in H5Dread
can't read data
File "H5Dio.c", line 601, in H5D__read
can't read data
File "H5Dchunk.c", line 2201, in H5D__chunk_read
error looking up chunk address
File "H5Dchunk.c", line 2931, in H5D__chunk_lookup
can't query chunk address
File "H5Dbtree.c", line 1049, in H5D__btree_idx_get_addr
can't get chunk info
File "H5B.c", line 357, in H5B_find
can't lookup key in subtree
File "H5B.c", line 357, in H5B_find
can't lookup key in subtree
File "H5B.c", line 335, in H5B_find
unable to load B-tree node
File "H5AC.c", line 1625, in H5AC_protect
H5C_protect() failed
File "H5C.c", line 2362, in H5C_protect
can't load entry
File "H5C.c", line 6726, in H5C_load_entry
Can't deserialize image
File "H5Bcache.c", line 181, in H5B__cache_deserialize
wrong B-tree signature

End of HDF5 error back trace

Problems reading chunk records.

@avalentino
Member

It is hard to say what the problem could be.
Do you open the HDF5 file in the worker processes or in the main one?
The first option is probably safer, but I'm not sure.
Also, I assume that all worker processes use the HDF5 file in read-only mode; is that correct?
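For reference, a minimal sketch of the per-worker-open pattern suggested above, reusing the file and node names from the test script earlier in this thread: each worker opens (and closes) its own read-only handle instead of inheriting one from the main process, and returns plain data rather than an open node, so nothing unpicklable crosses process boundaries.

import tables
from multiprocessing import Pool

def reader(name):
    # Each worker opens its own handle, so no HDF5 state is shared across processes.
    with tables.open_file("storage.h5", mode="r") as store:
        # .read() returns a plain numpy array, which pickles cleanly
        return (name, store.root.readout.read())

if __name__ == "__main__":
    with Pool(4) as p:
        results = p.map(reader, [str(i) for i in range(100)])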

@muraleee

Did --enable-threadsafe get added at all?

@avalentino
Member

HDF5 for the wheels is built using https://github.com/PyTables/PyTables/blob/master/ci/github/get_hdf5.sh
It seems that the --enable-threadsafe option is still not used.
