
Build libhdf5 with the --enable-threadsafe flag #776

Closed
ZanSara opened this issue Oct 28, 2019 · 17 comments · Fixed by #930


ZanSara commented Oct 28, 2019

Hello,

It seems to me that PyTables relies on the version of libhdf5.so found in the system rather than building its own, and that most systems have versions of libhdf5.so that are compiled without the --enable-threadsafe flag (please correct me if I'm wrong). This causes some annoying concurrency issues even while reading files (probably issues #700 and #593, Pandas issue #12236, and duplicates).

Rebuilding the HDF5 library with the proper flag and building PyTables over it seems to solve most of these issues, at least in the tests I've done so far.

Do you think it is possible to bundle a version of libhdf5.so compiled with that flag, or to build it when installing PyTables? Unfortunately I am not an expert in this matter. For now I am doing the whole process from a bash script, but it would be amazing to have it done somehow when installing PyTables.

The exact command I use for the build is ./configure --prefix=/usr/local/hdf5 --disable-hl --enable-threadsafe
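Since the whole process is being scripted by hand anyway, here is a minimal sketch of such a build script, assuming HDF5 1.10.4. The download URL and the HDF5_DIR variable honored by PyTables' setup.py are assumptions here; note also that HDF5's configure refuses --enable-threadsafe together with the high-level library unless --disable-hl (or --enable-unsupported) is given.

#!/bin/bash
set -e

# Fetch and unpack HDF5 (version and URL are assumptions; adjust as needed)
wget https://support.hdfgroup.org/ftp/HDF5/releases/hdf5-1.10/hdf5-1.10.4/src/hdf5-1.10.4.tar.gz
tar xzf hdf5-1.10.4.tar.gz
cd hdf5-1.10.4

# Thread safety is incompatible with the high-level library
# unless --enable-unsupported is also passed
./configure --prefix=/usr/local/hdf5 --disable-hl --enable-threadsafe
make -j"$(nproc)"
sudo make install

# Build PyTables from source against the freshly built library
export LD_LIBRARY_PATH=/usr/local/hdf5/lib:$LD_LIBRARY_PATH
HDF5_DIR=/usr/local/hdf5 pip install --no-binary=tables tables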

@tomkooij
Contributor

@ZanSara: Good point. I agree that --enable-threadsafe is a sound option.

I guess pip install tables on Linux/Mac will usually install PyTables from a wheel with a vendored HDF5 library, compiled from source when the wheels are built.

I will add --enable-threadsafe in the wheel-builder repo: MacPython/pytables-wheels

For reference, the HDF5 config of the current wheels is:

	    SUMMARY OF THE HDF5 CONFIGURATION
	    =================================

General Information:
-------------------
                   HDF5 Version: 1.10.4
                  Configured on: Mon Oct 28 15:56:36 UTC 2019
                  Configured by: root@eb93ce3c980d
                    Host system: x86_64-unknown-linux-gnu
              Uname information: Linux eb93ce3c980d 4.4.0-101-generic #124~14.04.1-Ubuntu SMP Fri Nov 10 19:05:36 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
                       Byte sex: little-endian
             Installation point: /usr/local

Compiling Options:
------------------
                     Build Mode: production
              Debugging Symbols: no
                        Asserts: no
                      Profiling: no
             Optimization Level: high

Linking Options:
----------------
                      Libraries: static, shared
  Statically Linked Executables: 
                        LDFLAGS: 
                     H5_LDFLAGS: 
                     AM_LDFLAGS:  -L/usr/local/lib
                Extra libraries: -lrt -lsz -lz -ldl -lm 
                       Archiver: ar
                       AR_FLAGS: cr
                         Ranlib: ranlib

Languages:
----------
                              C: yes
                     C Compiler: /opt/rh/devtoolset-2/root/usr/bin/gcc ( gcc (GCC) 4.8.2 20140120 )
                       CPPFLAGS: -I/usr/local/include 
                    H5_CPPFLAGS: -D_GNU_SOURCE -D_POSIX_C_SOURCE=200112L   -DNDEBUG -UH5_DEBUG_API
                    AM_CPPFLAGS:  -I/usr/local/include
                        C Flags: -Wl,-strip-all
                     H5 C Flags:  -std=c99  -pedantic -Wall -Wextra -Wbad-function-cast -Wc++-compat -Wcast-align -Wcast-qual -Wconversion -Wdeclaration-after-statement -Wdisabled-optimization -Wfloat-equal -Wformat=2 -Winit-self -Winvalid-pch -Wmissing-declarations -Wmissing-include-dirs -Wmissing-prototypes -Wnested-externs -Wold-style-definition -Wpacked -Wpointer-arith -Wredundant-decls -Wshadow -Wstrict-prototypes -Wswitch-default -Wswitch-enum -Wundef -Wunused-macros -Wunsafe-loop-optimizations -Wwrite-strings -Wlogical-op -Wlarger-than=2048 -Wvla -Wsync-nand -Wframe-larger-than=16384 -Wpacked-bitfield-compat -Wstrict-overflow=5 -Wjump-misses-init -Wdouble-promotion -Wtrampolines -Wstack-usage=8192 -Wvector-operation-performance  -s -Wno-inline -Wno-aggregate-return -Wno-missing-format-attribute -Wno-missing-noreturn -Wno-suggest-attribute=const -Wno-suggest-attribute=pure -Wno-suggest-attribute=noreturn -Wno-suggest-attribute=format -O3
                     AM C Flags: 
               Shared C Library: yes
               Static C Library: yes


                        Fortran: no

                            C++: no

                           Java: no


Features:
---------
                   Parallel HDF5: no
Parallel Filtered Dataset Writes: no
              Large Parallel I/O: no
              High-level library: yes
                    Threadsafety: no
             Default API mapping: v110
  With deprecated public symbols: yes
          I/O filters (external): deflate(zlib),szip(encoder)
                             MPE: no
                      Direct VFD: no
                         dmalloc: no
  Packages w/ extra debug output: none
                     API tracing: no
            Using memory checker: no
 Memory allocation sanity checks: no
             Metadata trace file: no
          Function stack tracing: no
       Strict file format checks: no
    Optimization instrumentation: no
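Note the "Threadsafety: no" line above. For anyone who wants to verify this at runtime rather than by reading a build summary, here is a minimal sketch using ctypes and HDF5's H5is_library_threadsafe call (available since HDF5 1.8.16); how the shared library is located below is an assumption and may need adjusting per system:

import ctypes
import ctypes.util

# Locate the HDF5 shared library; find_library may return None, in which
# case you may need the explicit path to the libhdf5.so your wheel vendors.
libpath = ctypes.util.find_library("hdf5")
hdf5 = ctypes.CDLL(libpath or "libhdf5.so")

# herr_t H5is_library_threadsafe(hbool_t *is_ts); a negative return means error.
is_ts = ctypes.c_bool(False)
if hdf5.H5is_library_threadsafe(ctypes.byref(is_ts)) >= 0:
    print("HDF5 threadsafety:", "yes" if is_ts.value else "no")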


ZanSara commented Oct 29, 2019

Thank you for your help!

matthew-brett added a commit to multi-build/multibuild that referenced this issue Oct 29, 2019
MRG: Build HDF5 with thread safety enabled

Build HDF5 with the --enable-threadsafe flag.

This was brought to my attention by @ZanSara over at PyTables/PyTables#776

It seems a good idea to enable thread safety for the HDF5 library.

This makes a lot of sense, as the conda-forge [hdf5 package is also built](https://github.com/conda-forge/hdf5-feedstock/blob/master/recipe/build.sh) with this flag.
@tomkooij
Contributor

Once this gets merged, I'll rebuild the wheels for 3.6.1 and let you know here.

@tomkooij tomkooij mentioned this issue Oct 30, 2019

tdagnino commented Nov 5, 2019

Hi,

I'm sorry if I'm posting my question in the wrong way, but I've been really struggling with this problem for a few days now and came across this bug fix, which really got my hopes up. However, I've had no success with the 3.6.1 version I installed with pip.

I'm having the same problems with multiple threads with release 3.6.1 on Windows.

My program is very simple. Actually, I've ultra-simplified it and still have issues: open a different HDF5 file in each thread and just retrieve the objects at the keys.

import pandas as pd

# file_name points at the HDF5 file this thread reads
store = pd.HDFStore(file_name, mode="r")
keys = store.keys()
for key in keys:
    print(key)
    store.get(key)
store.close()

Was the problem only fixed for Linux and Mac in release 3.6.1?

Thank you for your help.


tomkooij commented Nov 5, 2019

@tdagnino: The current wheels vendor an HDF5 library that is not yet compiled with the --enable-threadsafe flag.

Still working on this over at matthew-brett/multibuild#277

Will report here when finished.


tomkooij commented Nov 6, 2019

@ZanSara threadsafe wheels (3.6.1-2) are on test.pypi.org. Can you test a wheel before I upload them to the real PyPI and break everybody's builds?

You can install from testpypi using:

$ pip install --index-url https://test.pypi.org/simple/ tables

This should install a manylinux wheel (on Linux) with HDF5 compiled with --enable-threadsafe.

Please let me know if you are able to test it.
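A quick way to confirm which PyTables/HDF5 combination actually got installed is PyTables' own version dump (the exact output layout may vary between versions):

import tables

# Prints the PyTables version and the version of the HDF5 library it was
# built against, i.e. the HDF5 vendored by the wheel you just installed.
tables.print_versions()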


ZanSara commented Nov 6, 2019

I'll have a try now and let you know soon 👍


ZanSara commented Nov 6, 2019

My original test code was written with Pandas, not directly with PyTables, and with it, recompiling the library solved the bug just fine. However, installing the test version of tables you linked (before installing Pandas, of course) doesn't fix the Pandas bug.

For reference, my test code is taken from one of the issues I reference above:

import pandas as pd
import numpy as np
from multiprocessing import Pool
import warnings

# To avoid natural name warnings
warnings.filterwarnings('ignore')

def init(hdf_store):
    global hdf_buff
    hdf_buff = hdf_store

def reader(name):
    df = hdf_buff[name]
    return (name, df)

def main():
    # Creating the store
    with pd.HDFStore('storage.h5', 'w') as store:
        for i in range(100):
            df = pd.DataFrame(np.random.rand(5,3), columns=list('abc'))
            store.append(str(i), df, index=False, expectedrows=5)
    # Reading concurrently with one connection
    with pd.HDFStore('storage.h5', 'r') as store:
        with Pool(4, initializer=init, initargs=(store,)) as p:
            ret = pd.concat(dict(p.map(reader, [str(i) for i in range(100)])))

if __name__ == '__main__':
    main()

For reference, on a vanilla version of Pandas the code crashes with one of these two error messages at random:

Traceback (most recent call last):
  File "read-concurrently-pandas.py", line 29, in <module>
    main()
  File "read-concurrently-pandas.py", line 26, in main
    ret = pd.concat(dict(p.map(reader, [str(i) for i in range(100)])))
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
KeyError: 'No object named 21 in the file'

or

tables.exceptions.HDF5ExtError: HDF5 error back trace

  File "H5Dio.c", line 199, in H5Dread
    can't read data
  File "H5Dio.c", line 601, in H5D__read
    can't read data
  File "H5Dchunk.c", line 2201, in H5D__chunk_read
    error looking up chunk address
  File "H5Dchunk.c", line 2931, in H5D__chunk_lookup
    can't query chunk address
  File "H5Dbtree.c", line 1049, in H5D__btree_idx_get_addr
    can't get chunk info
  File "H5B.c", line 335, in H5B_find
    unable to load B-tree node
  File "H5AC.c", line 1625, in H5AC_protect
    H5C_protect() failed
  File "H5C.c", line 2362, in H5C_protect
    can't load entry
  File "H5C.c", line 6726, in H5C_load_entry
    Can't deserialize image
  File "H5Bcache.c", line 181, in H5B__cache_deserialize
    wrong B-tree signature

End of HDF5 error back trace

With the recompiled libraries, instead, it works fine.

I am now writing a simple test that uses tables directly to see if the problem persists. Maybe it's also a problem with my setup. I'll keep you updated.


ZanSara commented Nov 7, 2019

Ok, here is the modified test (I post it so you can double-check for mistakes):

import tables
import numpy as np
from multiprocessing import Pool
import warnings

# To avoid natural name warnings
warnings.filterwarnings('ignore')

class Particle(tables.IsDescription):
    name = tables.StringCol(16)     # 16-character String
    idnumber = tables.Int64Col()    # Signed 64-bit integer

def init(hdf_store):
    global hdf_buff
    hdf_buff = hdf_store

def reader(name):
    table = hdf_buff.root.readout
    return (name, table)

def main():
    # Create test file
    with tables.open_file("storage.h5", mode="w", title="Test file") as store:
        table = store.create_table("/", 'readout', Particle, "Readout example")

        particle = table.row
        for i in range(100):
            particle['name'] = 'Particle: %6d' % (i)
            particle['idnumber'] = i * (2 ** 34)
            particle.append()
        table.flush()
    
    # Simply read the table - no problem
    with tables.open_file("storage.h5", mode="r", title="Test file") as store:
        cols = []
        init(store)
        for i in range(100):
            cols.append( reader(str(i)) ) 
        ret = np.column_stack(cols)

    # Reading concurrently with one connection - fails
    with tables.open_file("storage.h5", mode="r", title="Test file") as store:
        with Pool(4, initializer=init, initargs=(store,)) as p:
            ret = np.column_stack( dict(p.map(reader, [str(i) for i in range(100)])).values() )
            print(ret)

if __name__ == '__main__':
    main()

If you comment out the concurrent read code, the script above works fine. If you run it all, it will crash with one of these two errors at random:

multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "read-concurrently-tables.py", line 18, in reader
    table = hdf_buff.root.readout
  File "/home/szanzott/Projects/hdf5_compile_trials/pytables-new-wheels/old-pytables/lib/python3.6/site-packages/tables/group.py", line 836, in __getattr__
    return self._f_get_child(name)
  File "/home/szanzott/Projects/hdf5_compile_trials/pytables-new-wheels/old-pytables/lib/python3.6/site-packages/tables/group.py", line 708, in _f_get_child
    self._g_check_has_child(childname)
  File "/home/szanzott/Projects/hdf5_compile_trials/pytables-new-wheels/old-pytables/lib/python3.6/site-packages/tables/group.py", line 395, in _g_check_has_child
    % (self._v_pathname, name))
tables.exceptions.NoSuchNodeError: group ``/`` does not have a child named ``readout``
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "read-concurrently-tables.py", line 49, in <module>
    main()
  File "read-concurrently-tables.py", line 44, in main
    ret = np.column_stack( dict(p.map(reader, [str(i) for i in range(100)])).values() )
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
tables.exceptions.NoSuchNodeError: group ``/`` does not have a child named ``readout``

or

Traceback (most recent call last):
  File "read-concurrently-tables.py", line 49, in <module>
    main()
  File "read-concurrently-tables.py", line 44, in main
    ret = np.column_stack( dict(p.map(reader, [str(i) for i in range(100)])).values() )
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
multiprocessing.pool.MaybeEncodingError: Error sending result: '[('14', /readout (Table(100,)) 'Readout example'
  description := {
  "idnumber": Int64Col(shape=(), dflt=0, pos=0),
  "name": StringCol(itemsize=16, shape=(), dflt=b'', pos=1)}
  byteorder := 'little'
  chunkshape := (2730,)), ('15', /readout (Table(100,)) 'Readout example'
  description := {
  "idnumber": Int64Col(shape=(), dflt=0, pos=0),
  "name": StringCol(itemsize=16, shape=(), dflt=b'', pos=1)}
  byteorder := 'little'
  chunkshape := (2730,)), ('16', /readout (Table(100,)) 'Readout example'
  description := {
  "idnumber": Int64Col(shape=(), dflt=0, pos=0),
  "name": StringCol(itemsize=16, shape=(), dflt=b'', pos=1)}
  byteorder := 'little'
  chunkshape := (2730,)), ('17', /readout (Table(100,)) 'Readout example'
  description := {
  "idnumber": Int64Col(shape=(), dflt=0, pos=0),
  "name": StringCol(itemsize=16, shape=(), dflt=b'', pos=1)}
  byteorder := 'little'
  chunkshape := (2730,)), ('18', /readout (Table(100,)) 'Readout example'
  description := {
  "idnumber": Int64Col(shape=(), dflt=0, pos=0),
  "name": StringCol(itemsize=16, shape=(), dflt=b'', pos=1)}
  byteorder := 'little'
  chunkshape := (2730,)), ('19', /readout (Table(100,)) 'Readout example'
  description := {
  "idnumber": Int64Col(shape=(), dflt=0, pos=0),
  "name": StringCol(itemsize=16, shape=(), dflt=b'', pos=1)}
  byteorder := 'little'
  chunkshape := (2730,)), ('20', /readout (Table(100,)) 'Readout example'
  description := {
  "idnumber": Int64Col(shape=(), dflt=0, pos=0),
  "name": StringCol(itemsize=16, shape=(), dflt=b'', pos=1)}
  byteorder := 'little'
  chunkshape := (2730,))]'. Reason: 'TypeError('self.dims,self.wbuf cannot be converted to a Python object for pickling',)'

This is what happened with the current PyTables, so not with your test wheels yet. (The MaybeEncodingError presumably comes from reader returning the open Table node, which cannot be pickled back to the parent process, as the error message itself says.) I'll now set up your version and let you know.


ZanSara commented Nov 7, 2019

I can confirm that the new test version does not fix the bug.
I also re-tested with my hand-built environment, and the concurrent read works fine there.
If there is a way for me to help you debug this further, let me know :)


tomkooij commented Nov 7, 2019

@ZanSara thanks, this helps a lot! I will look into it and get back to you (probably asking for more help).

@tomkooij tomkooij added this to the 3.6.2 milestone Jan 19, 2020
@eriniocentric

Hi, I am still dealing with this exact issue.

I access H5 files for read access using Python's multiprocessing in two separate environments. It works fine in one but fails in the other, and I can't pinpoint why. Any tips for solving this?

@avalentino
Member

@eriniocentric if you are using multiprocessing it is probably not a threading issue, IMHO: --enable-threadsafe only serializes calls from multiple threads within a single process, so it would not help worker processes that share an open file handle.

@eriniocentric

Any idea what these errors are telling me? They don't happen when I run the command alone, but they do when I use multiprocessing's Pool class. I am basically doing a read_where in each worker on the same H5 file table.

Exception ignored in: 'tables.indexesextension.IndexArray._g_read_sorted_slice'
tables.exceptions.HDF5ExtError: Problems reading the array data.
Exception ignored in: 'tables.tableextension.Table._read_chunk'
tables.exceptions.HDF5ExtError: HDF5 error back trace

File "H5Dio.c", line 199, in H5Dread
can't read data
File "H5Dio.c", line 601, in H5D__read
can't read data
File "H5Dchunk.c", line 2201, in H5D__chunk_read
error looking up chunk address
File "H5Dchunk.c", line 2931, in H5D__chunk_lookup
can't query chunk address
File "H5Dbtree.c", line 1049, in H5D__btree_idx_get_addr
can't get chunk info
File "H5B.c", line 357, in H5B_find
can't lookup key in subtree
File "H5B.c", line 357, in H5B_find
can't lookup key in subtree
File "H5B.c", line 335, in H5B_find
unable to load B-tree node
File "H5AC.c", line 1625, in H5AC_protect
H5C_protect() failed
File "H5C.c", line 2362, in H5C_protect
can't load entry
File "H5C.c", line 6726, in H5C_load_entry
Can't deserialize image
File "H5Bcache.c", line 181, in H5B__cache_deserialize
wrong B-tree signature

End of HDF5 error back trace

Problems reading chunk records.

@avalentino
Member

It is hard to say what the problem could be.
Do you open the HDF5 file in the worker processes or in the main one?
The first option is probably safer, but I'm not sure.
Also, I assume that all worker processes use the HDF5 file in read-only mode; is that correct?
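For reference, a minimal sketch of the per-worker-open pattern suggested above, reusing the file and node names from the test script earlier in this thread: each worker opens (and closes) its own read-only handle instead of inheriting one from the main process, and returns plain data rather than an open node, so nothing unpicklable crosses process boundaries.

import tables
from multiprocessing import Pool

def reader(name):
    # Each worker opens its own handle, so no HDF5 state is shared across processes.
    with tables.open_file("storage.h5", mode="r") as store:
        # .read() returns a plain numpy array, which pickles cleanly
        return (name, store.root.readout.read())

if __name__ == "__main__":
    with Pool(4) as p:
        results = p.map(reader, [str(i) for i in range(100)])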

@muraleee

Did --enable-threadsafe get added at all?

@avalentino
Member

HDF5 for the wheels is built using https://github.com/PyTables/PyTables/blob/master/ci/github/get_hdf5.sh
It seems that the --enable-threadsafe option is still not used.
