ValueError: Invalid mapping selections (point selections not currently supported with virtual datasets) #2388

Open
LIMWAER opened this issue Feb 28, 2024 · 2 comments


LIMWAER commented Feb 28, 2024

Hello, I'm trying to make a custom VirtualSource that filters data with a boolean mask.

import numpy as np
import h5py

size_full = 10
size_sparse = 5
num_files = 3


class CustomVirtualSource(h5py.VirtualSource):
    def __init__(self, path_or_dataset, name=None,
                 shape=None, dtype=None, maxshape=None, slice=None):
        super().__init__(path_or_dataset, name,
                         shape, dtype, maxshape)
        if slice:
            arr = np.array(slice)
            slice_shape = arr.shape
            self.sel = h5py._hl.selections.select(slice_shape, arr)




for n in range(1, num_files + 1):
    with h5py.File('{}.h5'.format(n), 'w') as f:
        data = np.arange(size_sparse) + 10 ** n
        f['data'] = data

layout = h5py.VirtualLayout(shape=(num_files, size_full), dtype=data.dtype)

for n in range(1, num_files + 1):
    filename = "{}.h5".format(n)
    with h5py.File(filename, 'r') as f:
        vsource = CustomVirtualSource(f['data'], slice=[True,False,True,True])
        layout[n - 1, [1,2,3,4]] = vsource

with h5py.File("VDS.h5", 'w', libver='latest') as f:
    f.create_virtual_dataset('data', layout, fillvalue=0)
    print("Virtual dataset:")
    print(f['data'][:, :])
    print(f['data'].virtual_sources())

I get this error and I understand why:

 Traceback (most recent call last):
  File "/Users/igorzolin/Library/Application Support/JetBrains/PyCharm2023.3/scratches/check_ts.py", line 33, in <module>
    layout[n - 1, [1,2,3,4]] = vsource
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/h5py/_hl/vds.py", line 180, in __setitem__
    self.dcpl.set_virtual(
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5p.pyx", line 902, in h5py.h5p.PropDCID.set_virtual
ValueError: Invalid mapping selections (point selections not currently supported with virtual datasets)

If I instead construct the source like this, vsource = CustomVirtualSource(f['data'], mask=[1,2,3,4]), I get a <h5py._hl.selections.FancySelection object at 0x1148bc310> and everything works fine:

Virtual dataset:
[[   0   11   12   13   14    0    0    0    0    0]
 [   0  101  102  103  104    0    0    0    0    0]
 [   0 1001 1002 1003 1004    0    0    0    0    0]]
[VDSmap(vspace=<h5py.h5s.SpaceID object at 0x1148b5f30>, file_name='1.h5', dset_name='/data', src_space=<h5py.h5s.SpaceID object at 0x1148b60c0>), VDSmap(vspace=<h5py.h5s.SpaceID object at 0x1148b5ee0>, file_name='2.h5', dset_name='/data', src_space=<h5py.h5s.SpaceID object at 0x1148b5bc0>), VDSmap(vspace=<h5py.h5s.SpaceID object at 0x1148b6110>, file_name='3.h5', dset_name='/data', src_space=<h5py.h5s.SpaceID object at 0x1148b6200>)]

When I try something like vsource = CustomVirtualSource(f['data'], mask=[2,3,4]) together with
layout[n - 1, [1,2,3]] = vsource
I get:

Traceback (most recent call last):
  File "/Users/igorzolin/Library/Application Support/JetBrains/PyCharm2023.3/scratches/check_ts.py", line 33, in <module>
    vsource = CustomVirtualSource(f['data'], mask=[2,3,4])
  File "/Users/igorzolin/Library/Application Support/JetBrains/PyCharm2023.3/scratches/check_ts.py", line 17, in __init__
    self.sel = h5py._hl.selections.select(slice_shape, arr)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/h5py/_hl/selections.py", line 82, in select
    return selector.make_selection(args)
  File "h5py/_selector.pyx", line 282, in h5py._selector.Selector.make_selection
  File "h5py/_selector.pyx", line 212, in h5py._selector.Selector.apply_args
IndexError: Fancy indexing out of range for (0-2)

But vsource = CustomVirtualSource(f['data'], mask=[0,2,3]) works:

Virtual dataset:
[[   0   10   12   13    0    0    0    0    0    0]
 [   0  100  102  103    0    0    0    0    0    0]
 [   0 1000 1002 1003    0    0    0    0    0    0]]

The main question is: can I somehow build a mask that selects only the points I need? Also, if fancy indexing is working as expected here, why does it behave this way?

Summary of the h5py configuration
---------------------------------
h5py    3.10.0
HDF5    1.12.2
Python  3.10.7 (v3.10.7:6cc6b13308, Sep  5 2022, 14:02:52) [Clang 13.0.0 (clang-1300.0.29.30)]
sys.platform    darwin
sys.maxsize     9223372036854775807
numpy   1.23.3
cython (built with) 0.29.36
numpy (built against) 1.21.6
HDF5 (built against) 1.12.2

LIMWAER commented Feb 29, 2024

Update:
I found that a plain slice works perfectly, e.g. vsource = h5py.VirtualSource(f['data'])[:2], but I would still like to be able to filter more specifically.
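
Since plain slices are accepted, one possible workaround is to break a boolean mask into its contiguous runs of True values and map each run with an ordinary slice. The helper below is only a sketch of that idea: `mask_to_slices` is a hypothetical name, not part of h5py, and the mask values are made up for illustration.

```python
import numpy as np


def mask_to_slices(mask):
    """Return one slice per contiguous run of True values in a 1-D mask."""
    mask = np.asarray(mask, dtype=bool)
    # Pad with False on both sides so every run has a start and an end edge.
    padded = np.concatenate(([False], mask, [False]))
    # Positions where the value flips; they alternate start, stop, start, ...
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    return [slice(int(a), int(b)) for a, b in zip(edges[::2], edges[1::2])]


# Hypothetical mask over a 5-element source dataset.
runs = mask_to_slices([True, False, True, True, False])
print(runs)
```

Each resulting slice could then be applied to a fresh, unsliced `h5py.VirtualSource(f['data'])` and assigned to a layout slice of the same length. Whether many small mappings perform acceptably is not something this thread establishes.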

@takluyver
Member

Fancy indexing is expressed to HDF5 as selecting multiple hyperslabs in one dataspace. I guess it allows that for virtual dataset mappings, though I don't know how efficient it will be at looking these up if you have a lot of such selections. I think it should be possible to do fancy indexing with VirtualSource objects directly, no need to create your own class.

It looks like not all your examples are with exactly the code you show (parameter mask vs. slice), so it's hard to say exactly what's going on.
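
A minimal sketch of that suggestion, assuming the boolean mask is first converted to integer indices with `np.flatnonzero`, so that h5py builds a fancy (hyperslab-based) selection instead of a point selection. The mask values, the temporary directory and the destination columns are illustrative assumptions; only `h5py.VirtualSource`, `h5py.VirtualLayout` and `create_virtual_dataset` come from the code in the report.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical mask: True marks the source elements to expose in the VDS.
mask = np.array([True, False, True, True, False])
# Integer indices are treated as fancy indexing; a boolean array appears
# to become a point selection, which the VDS mapping rejected above.
src_idx = np.flatnonzero(mask).tolist()
# Assumed destination: consecutive columns starting at column 1.
dst_idx = list(range(1, 1 + len(src_idx)))

workdir = tempfile.mkdtemp()
num_files, size_full = 3, 10

# Write the source files, as in the original report.
paths = []
for n in range(1, num_files + 1):
    path = os.path.join(workdir, f"{n}.h5")
    with h5py.File(path, "w") as f:
        f["data"] = np.arange(mask.size, dtype="i8") + 10 ** n
    paths.append(path)

layout = h5py.VirtualLayout(shape=(num_files, size_full), dtype="i8")
for row, path in enumerate(paths):
    with h5py.File(path, "r") as f:
        # Index the stock VirtualSource directly; no subclass needed.
        layout[row, dst_idx] = h5py.VirtualSource(f["data"])[src_idx]

vds_path = os.path.join(workdir, "VDS.h5")
with h5py.File(vds_path, "w", libver="latest") as f:
    f.create_virtual_dataset("data", layout, fillvalue=0)

with h5py.File(vds_path, "r") as f:
    result = f["data"][:, :]
print(result)
```

The source and destination selections must contain the same number of elements. If HDF5 accepts the mapping as it did in the report, the rows should match the `mask=[0,2,3]` output shown earlier.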
