Poor performance streaming object references #2395
I think what you reported here is a known issue for any HDF5 file created with default libhdf5 settings and then copied into an object store. Can you report here the output of the |
Here's the output from
|
I see two options:
For now, I suggest using libhdf5's ros3 driver to avoid a mismatch with fsspec's default request block size. Below are h5py open statements that should hopefully show improved performance:
For # 1 above: h5py.File(SMALL_HDF5_URL, mode='r', driver='ros3')
For # 2: h5py.File(SMALL_HDF5_URL, mode='r', driver='ros3', page_buf_size=67_108_864)
I assumed that access to the HDF5 file does not require S3 authentication. If it does, the above commands will need additional keywords. |
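For the authenticated case, here is a minimal sketch of what those additional keywords might look like. It assumes the ros3 keywords documented by h5py (aws_region, secret_id, secret_key, all three given together) and encodes them to bytes because older h5py releases require that. The helper names ros3_open_kwargs and open_ros3 are hypothetical, not part of h5py.

```python
def _as_bytes(value):
    """Encode str values; older h5py releases expect bytes for ros3 keywords."""
    return value.encode("utf-8") if isinstance(value, str) else value


def ros3_open_kwargs(aws_region=None, secret_id=None, secret_key=None,
                     page_buf_size=None):
    """Build keyword arguments for h5py.File using the ros3 driver.

    Hypothetical helper: it only assembles a dict, so it can be checked
    without a ros3-enabled h5py build.
    """
    kwargs = {"mode": "r", "driver": "ros3"}
    creds = {
        "aws_region": aws_region,
        "secret_id": secret_id,
        "secret_key": secret_key,
    }
    given = [name for name, value in creds.items() if value is not None]
    if given:
        # Assumption: authentication needs all three values at once.
        if len(given) != len(creds):
            raise ValueError(
                "aws_region, secret_id and secret_key must be given together"
            )
        kwargs.update({name: _as_bytes(value) for name, value in creds.items()})
    if page_buf_size is not None:
        kwargs["page_buf_size"] = page_buf_size
    return kwargs


def open_ros3(url, **options):
    """Open an HDF5 file over S3; requires h5py built against ros3-enabled libhdf5."""
    import h5py  # imported lazily so the helper above works without h5py
    return h5py.File(url, **ros3_open_kwargs(**options))
```

With this, the unauthenticated call from above corresponds to open_ros3(SMALL_HDF5_URL), and the second option to open_ros3(SMALL_HDF5_URL, page_buf_size=67_108_864).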
Thanks a lot for your advice! I'm trying to follow # 1, but struggling to get h5py 3.10.0 working with libhdf5-1.14.3 on Ubuntu (fails on import, but I'm probably doing something wrong):
Will update when I can actually test the suggestions. |
@bjhardcastle Any updates? |
When using fsspec to stream hdf5 files with object references, object de-referencing seems to read more data than is necessary:
output:
This becomes a problem for the large file URL. It would be preferable to get the location that the object points to and use it directly rather than de-reference, but it seems impossible to get the location without reading the entirety of the de-referenced data.
I thought get_name() might help (h5py/h5py/h5r.pyx, lines 132 to 147 in d051d24), but it seems to read the same amount of data.
I'm curious why H5Rget_name() (which apparently returns the length of the name) is used instead of H5Rget_name_string() (https://docs.hdfgroup.org/hdf5/v1_14/group___j_h5_r.html#ga48c4d6cb9e011af084d3c8088b121ac5), but I can't see the source code.
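One way to separate what libhdf5 asks for from what the transport layer fetches is to count the reads h5py issues against the file object. The following is only a sketch: CountingReader and bytes_read_by are hypothetical names, and the count reflects bytes requested by h5py through its Python file-like interface, not bytes transferred over the network (any read-ahead or block caching underneath is not included).

```python
import io


class CountingReader(io.RawIOBase):
    """Wrap a binary, seekable file-like object and count bytes read through it."""

    def __init__(self, raw):
        super().__init__()
        self._raw = raw
        self.bytes_read = 0   # total bytes handed back to the caller
        self.read_calls = 0   # number of read requests seen

    def readable(self):
        return True

    def seekable(self):
        return True

    def seek(self, offset, whence=io.SEEK_SET):
        return self._raw.seek(offset, whence)

    def tell(self):
        return self._raw.tell()

    def readinto(self, buffer):
        # View the target buffer as plain bytes, whatever type the caller passed.
        view = memoryview(buffer).cast("B")
        data = self._raw.read(len(view))
        n = len(data)
        view[:n] = data
        self.bytes_read += n
        self.read_calls += 1
        return n


def bytes_read_by(fileobj, action):
    """Open fileobj with h5py, run action(h5file), and return the bytes requested."""
    import h5py  # imported lazily; only needed for this helper
    counter = CountingReader(fileobj)
    with h5py.File(counter, mode="r") as f:
        action(f)
    return counter.bytes_read
```

Passing the fsspec file object as fileobj and a function that de-references one object (or calls h5py.h5r.get_name(ref, f.id)) as action would give a number to compare against the amount of data fetched from the object store.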