Possibility to control the link access/creation property lists from the high level interface #2257

Open
cderemble opened this issue May 1, 2023 · 8 comments · May be fixed by #2258

@cderemble
Contributor

Currently, link access/creation property lists are global instances created when loading the library via default_lapl / default_lcpl.
It would be good if individual groups could override these properties. Ideally, the overrides would also be passed on to groups returned by __getitem__.

One possible implementation would be to override the properties _lapl and _lcpl in Group: they would default to the global property lists and use the ones optionally provided in the constructor.
The __init__ function of Group and File would then need to be modified by adding 2 optional arguments: lapl and lcpl. The rest of the library should not be affected. By default, the behavior of the library should be unchanged.
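
To make the proposal concrete, here is a minimal, library-free sketch of the pattern. It does not use h5py: `Group`, `DEFAULT_LAPL` and `DEFAULT_LCPL` below are toy stand-ins invented for illustration, and a nested dict plays the role of the HDF5 object tree. The actual change is the one proposed in #2258.

```python
# Toy stand-ins for the library-wide default property lists (plain objects,
# not real HDF5 property lists).
DEFAULT_LAPL = object()
DEFAULT_LCPL = object()


class Group:
    """Toy group wrapper around a nested dict that stands in for an HDF5 object."""

    def __init__(self, node, lapl=None, lcpl=None):
        self._node = node
        # Fall back to the global defaults unless the caller overrides them.
        self._lapl = DEFAULT_LAPL if lapl is None else lapl
        self._lcpl = DEFAULT_LCPL if lcpl is None else lcpl

    def __getitem__(self, name):
        # A fresh wrapper is built on every access, so the property lists
        # have to be handed down explicitly to reach child groups.
        return Group(self._node[name], lapl=self._lapl, lcpl=self._lcpl)
```

With this shape, a group built without arguments behaves exactly as before, while `Group(tree, lapl=my_lapl)["a"]["b"]` still carries `my_lapl`.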

@cderemble
Contributor Author

possible implementation in #2258

@takluyver
Member

Thanks! Could you remind me why you're interested in this? I have a vague memory that we discussed it already, but I don't remember where.

HDF5's property lists are sort of like a workaround for C not having keyword arguments or optional arguments. So the h5py high-level API has tended to expose them as Python keyword arguments rather than expecting the user to create and pass around property list wrappers directly. (Of course, if you use the low-level API, you will be working with property list objects). I'd be interested to work out if we can stick to that pattern here.

@cderemble
Contributor Author

cderemble commented May 18, 2023

The problem is that the default lapl has the read/write access mode set. So even if you open a file with mode="r", when you access an external link, the file pointed to by this link will be opened in read/write mode (if file permissions allow it).
You will then see the linked file being modified even if you opened the master file in read-only mode, which is quite unexpected if you don't know about the default lapl.

@aragilar
Member

It would seem reasonable to me to have externally referenced files default to whatever mode most closely matches the mode of the main one (with some option to override the mode separately at the high level).

@takluyver
Member

I agree that that makes sense, but I'm also confused - the HDF5 docs suggest that this should already be the default:

The library will normally use the file access flag used to open the parent file as the file access flag for the target file.

And h5py doesn't seem to override that:

h5py/h5py/_hl/base.py

Lines 129 to 135 in 06d9d0d

def default_lapl():
    """ Default link access property list """
    lapl = h5p.create(h5p.LINK_ACCESS)
    fapl = h5p.create(h5p.FILE_ACCESS)
    fapl.set_fclose_degree(h5f.CLOSE_STRONG)
    lapl.set_elink_fapl(fapl)
    return lapl

Am I missing somewhere that we're setting that to something different? Or is this a bug?

(I'm more and more convinced that making references to separate files transparent, as external links, external datasets or virtual datasets, is a mistake. 🤔 )

@takluyver
Member

At least for me, following an external link from a read-only file seems to do the right thing (the second file is also read-only):

In [3]: f1 = h5py.File('foo.h5', 'r')

In [4]: ds = f1['extlink']

In [5]: ds
Out[5]: <HDF5 dataset "a": shape (10, 10), type "<f4">

In [6]: ds.file
Out[6]: <HDF5 file "foo2.h5" (mode r)>

@cderemble
Contributor Author

cderemble commented May 20, 2023

My bad: I got it wrong, let me reformulate.

The problem arises when you want to modify a file that contains external links to other files that you know won't be modified. For example, let's copy a dataset from an external file via a link:

import h5py
f = h5py.File('foo.h5', mode='w')
f['ext_data_set'] = h5py.ExternalLink('foo2.h5', 'data_set')
f.create_dataset('new_data_set', data=f['ext_data_set'])

In that example, the file foo2.h5 will be modified on disk even though its data was only read, not written. This is quite annoying when you want to track the last modification time of your files.
Ideally, you would be able to specify the opening mode of the external links when opening a new file, like this:

f = h5py.File('foo.h5', mode='w', open_ext_links_in_readonly_mode=True)
f['ext_data_set'] = h5py.ExternalLink('foo2.h5', 'data_set')
f.create_dataset('new_data_set', data=f['ext_data_set'])

Currently, one way to achieve this is to make the file foo2.h5 read-only on disk, which might help in some situations but not all.
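
For anyone who wants to check this on their own setup, here is a rough sketch of a check based on modification times. The helper names `was_modified` and `copy_via_external_link` are made up for this example; the h5py calls are the same ones used in the first snippet of this comment. It assumes a directory that already contains foo2.h5 with a dataset named data_set.

```python
import os


def was_modified(path, action):
    """Run action() and report whether path's modification time changed."""
    before = os.stat(path).st_mtime_ns
    action()
    return os.stat(path).st_mtime_ns != before


def copy_via_external_link(workdir):
    """Repeat the first snippet inside workdir, which must already hold foo2.h5."""
    import h5py  # local import: only this function needs h5py

    with h5py.File(os.path.join(workdir, "foo.h5"), mode="w") as f:
        # Relative link target, resolved next to foo.h5.
        f["ext_data_set"] = h5py.ExternalLink("foo2.h5", "data_set")
        f.create_dataset("new_data_set", data=f["ext_data_set"])
```

Calling `was_modified(os.path.join(workdir, "foo2.h5"), lambda: copy_via_external_link(workdir))` should then return True if the linked file is touched as described. Filesystem timestamps can be coarse, so foo2.h5 should be created at least a moment before running the check.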

@takluyver
Member

Thanks, that makes more sense now. 🙂

f = h5py.File('foo.h5', mode='w', open_ext_links_in_readonly_mode=True)

This looks nicer as a high-level API than the rather cryptic lapl, even if it's less powerful. I might shorten the parameter name a bit to something like open_extlinks_readonly.
