Better access to compression/filter information #2180
Well, it now looks like I've reimplemented part of HDF5's property list API in Python, to manage filters. That feels kind of silly, but I couldn't see a better way to allow |
Codecov Report
Base: 89.74% // Head: 89.66% // Decreases project coverage by 0.09%
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2180      +/-   ##
==========================================
- Coverage   89.74%   89.66%   -0.09%
==========================================
  Files          17       17
  Lines        2390     2476      +86
==========================================
+ Hits         2145     2220      +75
- Misses        245      256      +11
@takluyver thanks for this PR. Why should Another suggestion: Would something like |
As you've noted, the notion of reporting compression as a single string doesn't really work, because there can be multiple filters used - either by mistake, or filters like shuffle that are part of the compression even though they don't compress data themselves. So the compromise I picked here is to add
It can certainly work technically, but my guess is that for the high-level API, having ways to get just the names or just the IDs makes more sense. So long as the stored names are meaningful, the two should give you the same information, but the names are convenient for human understanding, whereas the IDs are more reliable for writing code (e.g. I think the filter settings can be left to the low-level API. It's not much extra code to access them: dset.id.get_create_plist().get_filter(0) |
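To make the low-level route concrete, here is a minimal sketch that walks the whole filter pipeline rather than only index 0. It assumes `dset` is an h5py `Dataset` and relies only on the creation property list calls `get_nfilters()` and `get_filter(i)`. The helper name `describe_filters` and the dictionary keys are illustrative assumptions, not part of h5py or of this PR.

```python
def describe_filters(dset):
    """List every filter in a dataset's pipeline, in pipeline order.

    `dset` is expected to be an h5py.Dataset. `describe_filters` is a
    hypothetical helper name, not an h5py API.
    """
    # The dataset creation property list holds the filter pipeline.
    plist = dset.id.get_create_plist()
    filters = []
    for i in range(plist.get_nfilters()):
        # get_filter(i) returns (filter ID, flags, client data values, name).
        filter_id, flags, values, name = plist.get_filter(i)
        if isinstance(name, bytes):
            # The stored name may come back as bytes; decode for display.
            name = name.decode("utf-8", errors="replace")
        filters.append(
            {
                "id": filter_id,
                "flags": flags,
                "values": tuple(values),
                "name": name,
            }
        )
    return filters
```

For a dataset created with `compression='gzip'`, the returned list should include an entry with ID 1, which is HDF5's deflate filter. The stored name is free text, so code that branches on the filter should compare IDs rather than names.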
This follows from the discussion in #2161. The changes are:
- dset.compression will return 'unknown' rather than None if a plugin filter is used and none of the three built-in compression filters (gzip, lzf, szip) are included.
- dset.filter_ids and dset.filter_names give you more information on all the filters in use, compression or otherwise. Filter settings remain available through the low-level API: dset.id.get_create_plist().get_filter(i)
- group.create_dataset_like() can now copy a filter pipeline even if it doesn't recognise the filters in it.

Closes #2161
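The rule for dset.compression described above can be sketched in plain Python, working from a list of filter IDs. This is an illustration of the described behaviour under stated assumptions, not the code in this PR: the function name `summarize_compression`, the gzip/lzf/szip precedence when several are present, and the treatment of every ID outside HDF5's predefined set as a plugin are all assumptions. The numeric IDs are HDF5's predefined filter IDs plus 32000, the registered ID for LZF.

```python
# HDF5 predefined filter IDs, plus the registered ID for LZF.
COMPRESSION_FILTERS = {1: "gzip", 32000: "lzf", 4: "szip"}
# Predefined filters that do not count as compression for this summary:
# shuffle, fletcher32, nbit, scaleoffset.
OTHER_BUILTIN_FILTERS = {2, 3, 5, 6}


def summarize_compression(filter_ids):
    """Return 'gzip', 'lzf', 'szip', 'unknown' or None for a pipeline.

    Hypothetical helper: mirrors the behaviour described for
    dset.compression, not h5py's actual implementation.
    """
    ids = list(filter_ids)
    # A recognised compression filter takes priority (assumed order).
    for known_id, name in COMPRESSION_FILTERS.items():
        if known_id in ids:
            return name
    # Any remaining ID outside the predefined set is treated as a plugin.
    if any(i not in OTHER_BUILTIN_FILTERS for i in ids):
        return "unknown"
    # No filters at all, or only non-compression built-ins such as shuffle.
    return None
```

Under these assumptions, a pipeline holding only shuffle (ID 2) still reports None, while a pipeline holding only a third-party filter ID reports 'unknown' instead of looking uncompressed.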