
Numpy check #221

Closed
wants to merge 4 commits into from

Conversation

hperrot
Collaborator

@hperrot hperrot commented Dec 27, 2019

This pull request makes setting labels fail loudly if they are lists rather than numpy arrays or memmaps.
It addresses the warning from the dataset_mixin docstring: "Labels must be dicts of numpy arrays and not lists! Otherwise many operations do not work and result in incomprehensible errors."

This is done by casting the labels dict to a LabelsDict in the DatasetMixin labels setter. LabelsDict is a subclass of dict whose __setitem__ asserts that every new leaf value is of type np.ndarray or np.memmap.

Asserting this only in the DatasetMixin labels setter would not be enough, since the labels dict is often updated in place rather than set as a whole. Such in-place updates bypass the setter, so the assertion would never run for them.

One drawback of this approach is that the assertion is also bypassed if the labels dict is nested and contains other mutable containers (dicts or lists) that are mutated directly.
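For illustration, the mechanism described above could look roughly like this; a minimal sketch, assuming only what the description states (the class name and the exact error message are illustrative, not the PR's actual code):

```python
import numpy as np

class LabelsDict(dict):
    """A dict that only accepts numpy arrays or memmaps as values.

    Hypothetical sketch of the LabelsDict described in this PR.
    """

    def __setitem__(self, key, value):
        # Fail loudly at assignment time instead of producing
        # incomprehensible errors in later operations.
        assert isinstance(value, (np.ndarray, np.memmap)), (
            "Labels must be dicts of numpy arrays and not lists! "
            "Got {} for key {!r}.".format(type(value).__name__, key)
        )
        super().__setitem__(key, value)

labels = LabelsDict()
labels["targets"] = np.zeros(10)   # fine: ndarray passes the check
try:
    labels["bad"] = [1, 2, 3]      # a plain list fails loudly
except AssertionError as err:
    print(err)
```

Because every assignment goes through __setitem__, later in-place updates of the dict are checked too, which is exactly what a check only in the setter would miss.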

@hperrot
Collaborator Author

hperrot commented Dec 27, 2019

There is an issue with coveralls, which is incompatible with coverage 5.0+:
TheKevJames/coveralls-python#203
This is why the tests don't complete successfully.

@pesser
Owner

pesser commented Dec 28, 2019

Cool, I really like the idea of giving dynamic feedback if something is used in unexpected ways. But I have reservations about the idea of using (and introducing) a special class for labels. @jhaux also tried something along these lines and ultimately we decided against merging it. A new class always adds a lot of (perceived) complexity and can be off-putting. If we need a special type, we need it, but let's think about alternatives and what @jhaux says, since he knows the requirements for the labels best.

I wonder if we could just add a functional test somewhere to make sure labels look as expected. For example, run check_dataset(dataset) after instantiating the dataset from the config. This would avoid introducing another class and could be more flexible, e.g. handling nested labels, which is not possible with the proposed solution.
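The suggested functional check could be sketched as follows; the function name check_dataset comes from the comment above, while the dataset.labels attribute and the recursion strategy are assumptions for illustration, not the project's actual API:

```python
import numpy as np

def check_dataset(dataset):
    """Walk a possibly nested labels dict once, after the dataset has
    been instantiated, and fail loudly on any leaf that is not a numpy
    array or memmap. Hypothetical sketch of the proposed functional test.
    """
    def _check(node, path):
        if isinstance(node, dict):
            # Recurse into nested dicts, which the __setitem__-based
            # LabelsDict approach cannot cover.
            for key, value in node.items():
                _check(value, path + (key,))
        else:
            assert isinstance(node, (np.ndarray, np.memmap)), (
                "Label at '{}' must be a numpy array or memmap, got {}"
                .format("/".join(map(str, path)), type(node).__name__)
            )

    _check(dataset.labels, ())

# Usage with a dummy dataset; nested dicts are handled by the recursion.
class DummyDataset:
    labels = {"a": np.arange(3), "nested": {"b": np.ones(2)}}

check_dataset(DummyDataset())  # passes silently
```

Running this once after dataset construction trades continuous enforcement for simplicity: no new class is introduced, and nested structures are validated in the same pass.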

@hperrot hperrot closed this Dec 29, 2019
@pesser pesser mentioned this pull request Dec 29, 2019