Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added more descriptive error message if indexing with all False bool … #1197

Closed
wants to merge 7 commits into from

Conversation

tammasloughran
Copy link

If one tries to load data using an all-False bool array, it returns an unhelpful error message that doesn't quite describe the problem.

import numpy as np
import netCDF4 as nc
ncfile = nc.Dataset('example.nc', 'r')
get_this = np.zeros(ncfile.variables['pastr'].shape[0]).astype(bool)
data =  ncfile.variables['pastr'][get_this]
Traceback (most recent call last):
  File "testing.py", line 5, in <module>
    data =  ncfile.variables['pastr'][get_this]
  File "netCDF4/_netCDF4.pyx", line 4385, in netCDF4._netCDF4.Variable.__getitem__
  File "/usr/lib/python3/dist-packages/netCDF4/utils.py", line 467, in _out_array_shape
    c = count[..., i].ravel()[0] # All elements should be identical.
IndexError: index 0 is out of bounds for axis 0 with size 0

This pull request catches the case where an invalid all-False bool array is used to load data and adds a more descriptive error message.

Cheers,
Tam

@CLAassistant
Copy link

CLAassistant commented Oct 4, 2022

CLA assistant check
All committers have signed the CLA.

@jswhit
Copy link
Collaborator

jswhit commented Oct 4, 2022

Seems like an empty array should be returned, instead of raising an error.

@tammasloughran
Copy link
Author

My experience has been that 10 times/10, doing this is erroneous, since loading nothing from a file is nonsensical. I would rather it fail fast than silently, since on the rare occasion one does want to load nothing from a file, they can exception handle the error.

@jswhit
Copy link
Collaborator

jswhit commented Oct 6, 2022

My experience has been that 10 times/10, doing this is erroneous, since loading nothing from a file is nonsensical. I would rather it fail fast than silently, since on the rare occasion one does want to load nothing from a file, they can exception handle the error.

That's OK with me, I only say that because that's why numpy does and it might be what many expect.

Copy link
Collaborator

@jswhit jswhit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like elem can still be an iterable here, in that case the individual elements of elem need to be checked to see if they are all False boolean areas.

@jswhit
Copy link
Collaborator

jswhit commented Oct 7, 2022

We are already checking for boolean index arrays (in order to convert them to integer index arrays), so how about this:

diff --git a/src/netCDF4/utils.py b/src/netCDF4/utils.py
index c96cc757..dcfeca85 100644
--- a/src/netCDF4/utils.py
+++ b/src/netCDF4/utils.py
@@ -238,6 +238,10 @@ def _StartCountStride(elem, shape, dimensions=None, grp=None, datashape=None,\
             unlim = False
         # convert boolean index to integer array.
         if np.iterable(ea) and ea.dtype.kind =='b':
+            # check that boolean array is not all False.
+            if not ea.any():
+                msg='Boolean index array is all False, at least one element must be True'
+                raise IndexError(msg)
             # check that boolean array not too long
             if not unlim and shape[i] != len(ea):
                 msg="""

Also, please add a test and a Changelog entry

@jswhit
Copy link
Collaborator

jswhit commented Oct 10, 2022

Tests are failing when an all False boolean index array is used on assignment, which should do nothing (but not raise an exception). The fix is to only do the check when put=False.

@jswhit
Copy link
Collaborator

jswhit commented Oct 10, 2022

this should fix it

diff --git a/src/netCDF4/utils.py b/src/netCDF4/utils.py
index c96cc757..dcfeca85 100644
--- a/src/netCDF4/utils.py
+++ b/src/netCDF4/utils.py
@@ -238,6 +238,10 @@ def _StartCountStride(elem, shape, dimensions=None, grp=None, datashape=None,\
             unlim = False
         # convert boolean index to integer array.
         if np.iterable(ea) and ea.dtype.kind =='b':
+            # check that boolean array is not all False when reading.
+            if not put and not ea.any():
+                msg='Boolean index array is all False, at least one element must be True'
+                raise IndexError(msg)
             # check that boolean array not too long
             if not unlim and shape[i] != len(ea):
                 msg="""

@jswhit
Copy link
Collaborator

jswhit commented Oct 11, 2022

I discovered that slicing a 1d variable with an all False boolean index array does return an empty array (consistent with numpy). An exception is raised on when slicing multi-dimensional arrays. PR #1198 ensures that empty arrays are always returned. Given that this is the current behavior for 1d vars, and that's what numpy does, I think this is the preferred solution.

@tammasloughran
Copy link
Author

There should at least be a warning.

@jswhit
Copy link
Collaborator

jswhit commented Oct 12, 2022

There should at least be a warning.

I am not convinced - curious what others think. Seems to me if you slice with an all False boolean array, you are getting back exactly what you asked for (an empty array).

@jswhit
Copy link
Collaborator

jswhit commented Oct 12, 2022

Plus the fact that this is what numpy does is a pretty big precendent.

@jswhit
Copy link
Collaborator

jswhit commented Oct 16, 2022

Decided to go with PR #1198 instead, based on the discussion at issue #1200

@jswhit jswhit closed this Oct 16, 2022
@tammasloughran
Copy link
Author

Sorry I'm so late returning to this, although it seems I'm too late.

I think we were talking about different problems. I wasn't so much concerned with whether an error should occur or something should be returned. My concern was that enough information was provided to users (both developers and application users) that potential errors aren't obscured and made difficult to diagnose. The boolean array can be determined programmatically, and users may not know exactly what they are asking for or even what's in the .nc file (e.g. when iterating over many files). In this case, a returned empty array would be propagated to far removed parts of user code. A warning would help diagnose indexing errors on that empty array that would eventually occur, and the fundamental behavior would still be consistent with numpy. If users want behavior like this to be erroneous, they can elevate warnings to errors themselves. So it's the best of both worlds.

Lastly, I don't think the warning would be a nuisance to anyone, since until now it had been an error. In fact, returning an empty array silently may break exception handling code for people that relied on this being an error in the past.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants