
Numpy check #221

Closed
wants to merge 4 commits into from

Conversation

hperrot
Collaborator

@hperrot hperrot commented Dec 27, 2019

This pull request makes setting labels fail loudly if they are lists rather than numpy arrays or memmaps.
It addresses the warning from the dataset_mixin docstring: "Labels must be dicts of numpy arrays and not lists! Otherwise many operations do not work and result in incomprehensible errors."

This is done by casting the labels dict to a LabelsDict in the DatasetMixin labels setter. LabelsDict is a subclass of dict whose __setitem__ asserts that every new leaf value is of type np.ndarray or np.memmap.

Asserting this only in the DatasetMixin labels setter would not be enough, since the labels dict is often updated in place rather than set as a whole. Such in-place updates bypass the setter, so the assertion would never run for them.

One drawback of this approach is that the assertion is also bypassed if the labels dict is nested and contains other mutable containers (dicts or lists) that are mutated directly.
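For illustration, the mechanism described above could look roughly like this; a minimal sketch, assuming only what the description states (the class name and the exact error message are illustrative, not the PR's actual code):

```python
import numpy as np

class LabelsDict(dict):
    """A dict that only accepts numpy arrays or memmaps as values.

    Hypothetical sketch of the LabelsDict described in this PR.
    """

    def __setitem__(self, key, value):
        # Fail loudly at assignment time instead of producing
        # incomprehensible errors in later operations.
        assert isinstance(value, (np.ndarray, np.memmap)), (
            "Labels must be dicts of numpy arrays and not lists! "
            "Got {} for key {!r}.".format(type(value).__name__, key)
        )
        super().__setitem__(key, value)

labels = LabelsDict()
labels["targets"] = np.zeros(10)   # fine: ndarray passes the check
try:
    labels["bad"] = [1, 2, 3]      # a plain list fails loudly
except AssertionError as err:
    print(err)
```

Because every assignment goes through __setitem__, later in-place updates of the dict are checked too, which is exactly what a check only in the setter would miss.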

@hperrot
Collaborator Author

hperrot commented Dec 27, 2019

There is an issue with coveralls, which is incompatible with coverage 5.0+:
TheKevJames/coveralls-python#203
This is why the tests don't complete successfully.

@pesser
Owner

pesser commented Dec 28, 2019

Cool, I really like the idea of giving dynamic feedback if something is used in unexpected ways. But I have reservations about the idea of using (and introducing) a special class for labels. @jhaux also tried something along these lines and ultimately we decided against merging it. A new class always adds a lot of (perceived) complexity and can be off-putting. If we need a special type, we need it, but let's think about alternatives and what @jhaux says, since he knows the requirements for the labels best.

I wonder if we could just add a functional test somewhere to make sure labels look as expected. For example, run check_dataset(dataset) after instantiating the dataset from the config. This would avoid introducing another class and could be more flexible, e.g. handling nested labels, which is not possible with the proposed solution.
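The suggested functional check could be sketched as follows; the function name check_dataset comes from the comment above, while the dataset.labels attribute and the recursion strategy are assumptions for illustration, not the project's actual API:

```python
import numpy as np

def check_dataset(dataset):
    """Walk a possibly nested labels dict once, after the dataset has
    been instantiated, and fail loudly on any leaf that is not a numpy
    array or memmap. Hypothetical sketch of the proposed functional test.
    """
    def _check(node, path):
        if isinstance(node, dict):
            # Recurse into nested dicts, which the __setitem__-based
            # LabelsDict approach cannot cover.
            for key, value in node.items():
                _check(value, path + (key,))
        else:
            assert isinstance(node, (np.ndarray, np.memmap)), (
                "Label at '{}' must be a numpy array or memmap, got {}"
                .format("/".join(map(str, path)), type(node).__name__)
            )

    _check(dataset.labels, ())

# Usage with a dummy dataset; nested dicts are handled by the recursion.
class DummyDataset:
    labels = {"a": np.arange(3), "nested": {"b": np.ones(2)}}

check_dataset(DummyDataset())  # passes silently
```

Running this once after dataset construction trades continuous enforcement for simplicity: no new class is introduced, and nested structures are validated in the same pass.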

@hperrot hperrot closed this Dec 29, 2019
@pesser pesser mentioned this pull request Dec 29, 2019