Dataset.compression is None when using hdf5plugin compressors #2161
Comments
And this would not solve the issue for all cases, because …
One simple answer could be that if there's any filter ID we don't recognise, we return 'unknown'. Not very specific, but better than effectively saying 'no compression'. It's also not clear what we'd do if there are two or more compression filters in the pipeline, though I guess that's unlikely. Our 'gzip' name is also somewhat wrong: HDF5 calls it deflate, and what's actually stored is zlib output, i.e. deflate with a different wrapper from gzip. But that's part of the API now, so we can't easily change it.
I was looking into this same problem, so I'm pleased I don't have to create another issue. To answer @takluyver, your suggestion does indeed return information on the compression filter:
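For reference, the filter pipeline can be inspected through h5py's low-level API even when `Dataset.compression` is `None`. A minimal sketch, using the built-in gzip/deflate filter for illustration (the same inspection works for hdf5plugin filters):

```python
import os
import tempfile

import h5py

# Inspect the raw filter pipeline via the dataset creation property list.
# This also works for third-party (hdf5plugin) filters, even though the
# high-level Dataset.compression property reports None for them.
path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(path, "w") as f:
    dset = f.create_dataset("x", data=range(100), compression="gzip")
    plist = dset.id.get_create_plist()
    for i in range(plist.get_nfilters()):
        # get_filter returns (filter_code, flags, cd_values, name)
        code, flags, cd_values, name = plist.get_filter(i)
        print(code, name)  # for gzip this is filter code 1 ('deflate')
```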
I second @ivirshup's call for this to be fixed, because it confused me for a couple of days. I had assumed that there was a bug in …
What can be improved is the list of known (registered) compression filters, so that not just the built-ins are recognised. It will also have to support multiple compression filters, since such cases occur in NASA satellite data.
That small modification would already prevent confusion.
Not every filter is for data compression, so if a filter is unknown, why should it be reported from …
There is another twist to this story, discussed in the mention above. In dealing with a NeXpy issue, a user provided a file that contained compressed data, which NeXpy was able to decompress, presumably because of …
I should explain that …
My thinking on this is to make `compression` return 'unknown' if there's anything h5py doesn't recognise, and add a new property like … (The 'name' for LZ4 is …)
In practice, I think every registered filter not built into HDF5 itself is either doing some kind of compression, or preparing data so that a later compression filter will be more effective - and you can argue that e.g. bitgroom+zlib together is a different compression algorithm to zlib alone. So if there's a filter we don't recognise, chances are good that it's compression. And of course 'unknown' can also mean 'unknown whether compression is in use'. 😉
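A hypothetical sketch of that proposal (the `compression_name` helper and its filter-code mappings are illustrative, not existing h5py API):

```python
import h5py

# Map the built-in filter codes to the names the existing API already uses.
_KNOWN = {
    h5py.h5z.FILTER_DEFLATE: "gzip",
    h5py.h5z.FILTER_SZIP: "szip",
    h5py.h5z.FILTER_LZF: "lzf",
}
# Built-in filters that are not compression and should be skipped.
_NON_COMPRESSION = {
    h5py.h5z.FILTER_SHUFFLE,
    h5py.h5z.FILTER_FLETCHER32,
    h5py.h5z.FILTER_SCALEOFFSET,
}

def compression_name(dset):
    """Return 'gzip'/'szip'/'lzf', 'unknown' for unrecognised filters, or None."""
    plist = dset.id.get_create_plist()
    for i in range(plist.get_nfilters()):
        code = plist.get_filter(i)[0]
        if code in _KNOWN:
            return _KNOWN[code]
        if code not in _NON_COMPRESSION:
            # A registered third-party filter, e.g. from hdf5plugin.
            return "unknown"
    return None
```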
I think this is a separate thing, but can you check …
@takluyver, you are right about no filter being applied.
The previous 'invalid filter number' ValueError is what you get whenever a dataset has no applied filters. Ideally, h5py should issue a less misleading error message after checking …
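An unfiltered dataset simply has an empty filter pipeline, which can be checked before trying to read a filter entry. A small sketch:

```python
import os
import tempfile

import h5py

# A dataset written without any filters has an empty filter pipeline;
# asking for a filter by index in that case is what produces the
# confusing 'invalid filter number' error.
path = os.path.join(tempfile.mkdtemp(), "plain.h5")
with h5py.File(path, "w") as f:
    dset = f.create_dataset("x", data=range(10))  # no compression requested
    print(dset.id.get_create_plist().get_nfilters())  # prints 0
```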
Description
When an HDF5 dataset is written using one of the compression filters from hdf5plugin, the dataset has a `compression` attribute of `None`. This is a feature request to change that.
Example
But `h5ls` is able to see the name of the compression filter, so ideally h5py should too.
Details
This is due to the possible values for `Dataset.compression` being hardcoded here:
h5py/h5py/_hl/dataset.py
Lines 556 to 563 in 1487a54
It looks like there is an API for getting information about a filter: `H5Zget_filter_info`. But I'm not sure how you would be able to tell whether a filter was a "compressor".
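For what it's worth, `H5Zget_filter_info` is exposed in h5py as `h5py.h5z.get_filter_info`. It only reports whether an encoder and decoder are available, which indeed says nothing about whether the filter compresses. A quick sketch:

```python
from h5py import h5z

# H5Zget_filter_info via h5py's low-level bindings: the returned flags say
# whether the filter can encode and/or decode, nothing about compression.
assert h5z.filter_avail(h5z.FILTER_DEFLATE)
info = h5z.get_filter_info(h5z.FILTER_DEFLATE)
print(bool(info & h5z.FILTER_CONFIG_ENCODE_ENABLED))  # True if encoder built in
print(bool(info & h5z.FILTER_CONFIG_DECODE_ENABLED))  # True if decoder built in
```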
The complicated solution would be to let hdf5plugin register its compressors with `h5py`. Hopefully there is a more elegant solution.

Version info
Summary of the h5py configuration
h5py 3.7.0
HDF5 1.12.2
Python 3.9.12 (main, Mar 26 2022, 15:52:10)
[Clang 13.0.0 (clang-1300.0.29.30)]
sys.platform darwin
sys.maxsize 9223372036854775807
numpy 1.22.4
cython (built with) 0.29.30
numpy (built against) 1.19.3
HDF5 (built against) 1.12.2