Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix some todos #1646

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open

Fix some todos #1646

wants to merge 7 commits into from

Conversation

dota17
Copy link
Contributor

@dota17 dota17 commented Aug 21, 2020

Fix some legacy TODOs.

CC: @takluyver @aragilar

@codecov
Copy link

codecov bot commented Aug 21, 2020

Codecov Report

Merging #1646 into master will increase coverage by 0.01%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1646      +/-   ##
==========================================
+ Coverage   88.51%   88.52%   +0.01%     
==========================================
  Files          17       17              
  Lines        2255     2258       +3     
==========================================
+ Hits         1996     1999       +3     
  Misses        259      259              
Impacted Files Coverage Δ
h5py/_hl/group.py 96.80% <ø> (ø)
h5py/_hl/attrs.py 95.52% <100.00%> (+0.10%) ⬆️
h5py/_hl/dataset.py 93.15% <0.00%> (ø)
h5py/_hl/filters.py 92.78% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fba98fd...0bde4a2. Read the comment docs.

h5py/h5p.pyx Outdated
src_dset_name = bytes(name).decode('utf-8')
len = H5Pget_virtual_dsetname(self.id, index, name, <size_t>size+1)
if len > 0:
src_dset_name = bytes(name).decode('utf-8')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens when len == 0? What's returned?

Copy link
Contributor Author

@dota17 dota17 Aug 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, the docs said: Returns a non-negative value if successful; otherwise returns a negative value. Returns the length of the dataset name if successful; otherwise returns a negative value.,
so, should we do it like this: if len >= 0?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is around src_dset_name possibly not being defined, and should we have a fallback value or an error or do something else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should do something when an error is detected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Negative (error) return values are already checked and converted to Python exceptions in the autogenerated function wrappers (look in defs.pyx). So there's no need to handle them here.

Do we need a special case when there are 0 bytes? It may not make sense to have an empty string, but if that's what HDF5 gives us with no error, we can convert it to an empty Python string without introducing any ambiguity or inaccuracy.

So it's not clear to me what these three TODOs are for. Maybe we can just remove the TODOs and leave the code alone.

@aragilar
Copy link
Member

The order tracking looks good to me (though the require_group bit I'm unsure about, given the issues around require_dataset).

I'm less certain around the cython string checking, given src_dset_name is defined within an if scope (I'm not sure that case is well tested).

@dota17 dota17 requested a review from aragilar August 24, 2020 07:05
Copy link
Member

@takluyver takluyver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODOs in code can be surprisingly tricky - they often mean it's not clear how something should be done, in which case it's not easy to fix them without a fair bit of discussion.

@property
@with_phil
def track_order(self):
""" Whether dataset creation order are tracked (T/F)"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
""" Whether dataset creation order are tracked (T/F)"""
""" Whether attribute creation order is tracked (T/F)"""

@@ -518,6 +518,20 @@ def fletcher32(self):
"""Fletcher32 filter is present (T/F)"""
return 'fletcher32' in self._filters

@property
@with_phil
def track_order(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this method specifically relates to attributes, I think it would make more sense to expose it on the AttributeManager class, so you access something like ds.attrs.order_tracked.

This also conveniently makes the interface the same for datasets, groups, and named datatypes, all of which can have attributes with or without this setting.

def track_times(self):
""" Whether times associated with an object are tracked (T/F)"""
dcpl = self._dcpl
return dcpl.get_obj_track_times()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it makes sense to expose this in the high-level API when there's no high-level way to access the timestamps (the low-level way is with h5o.get_info()).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wasn't the first time exposing get_obj_track_times in the high level API, it comes from here

kwupdate.setdefault('track_times', dcpl.get_obj_track_times())
so I just move it into a method, and let these properties stay at the same place.

else:
order = True
if not order == track_order:
raise TypeError("tack_order do not match (existing %s vs new %s)" % (order, track_order))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me if this kind of check makes sense. require_dataset() checks shape & dtype - the fundamental components of what a dataset is - but all of the extra parameters (including track_order) are only used if it creates a new dataset.

I think it's pragmatic for details like this to be unchecked - you can tell require_* how to create an object if necessary, but still use an existing object even if it doesn't match all your preferences. Nobody is rushing to 'fix' the checks in require_dataset(), and it's nearly a year since I suggested on #897 that we leave it as-is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, I will leave it as-is, just remove this kind of TODO.

h5py/h5p.pyx Outdated
src_dset_name = bytes(name).decode('utf-8')
len = H5Pget_virtual_dsetname(self.id, index, name, <size_t>size+1)
if len > 0:
src_dset_name = bytes(name).decode('utf-8')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Negative (error) return values are already checked and converted to Python exceptions in the autogenerated function wrappers (look in defs.pyx). So there's no need to handle them here.

Do we need a special case when there are 0 bytes? It may not make sense to have an empty string, but if that's what HDF5 gives us with no error, we can convert it to an empty Python string without introducing any ambiguity or inaccuracy.

So it's not clear to me what these three TODOs are for. Maybe we can just remove the TODOs and leave the code alone.

@dota17
Copy link
Contributor Author

dota17 commented Sep 11, 2020

rerun to pass static-check.

@dota17 dota17 closed this Sep 11, 2020
@dota17 dota17 reopened this Sep 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants