Cannot convert Variable to NumPy array if it has an invalid missing value #1152
Comments
the right way to convert a netcdf variable to a numpy array is to slice it ( |
It's the same problem because that goes through the same code path:
>>> import netCDF4 as nc
>>> import numpy as np
>>> f = nc.Dataset("drifter_9911352.nc")
>>> f.variables["deploy_date"]
<class 'netCDF4._netCDF4.Variable'>
float32 deploy_date(traj)
long_name: Deployment date and time
units: seconds since 1970-01-01 00:00:00 UTC
missing_value: -1.e+34f
history: From deplog.dat
unlimited dimensions:
current shape = (1,)
filling on, default _FillValue of 9.969209968386869e+36 used
>>> np.asarray(f.variables["deploy_date"])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/jpivarski/miniconda3/lib/python3.9/site-packages/numpy/core/_asarray.py", line 83, in asarray
return array(a, dtype, copy=False, order=order)
File "src/netCDF4/_netCDF4.pyx", line 3947, in netCDF4._netCDF4.Variable.__array__
File "src/netCDF4/_netCDF4.pyx", line 4445, in netCDF4._netCDF4.Variable.__getitem__
File "src/netCDF4/_netCDF4.pyx", line 4514, in netCDF4._netCDF4.Variable._toma
File "src/netCDF4/_netCDF4.pyx", line 4801, in netCDF4._netCDF4.Variable._check_safecast
ValueError: could not convert string to float: '-1.e+34f'
>>> f.variables["deploy_date"][:]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "src/netCDF4/_netCDF4.pyx", line 4445, in netCDF4._netCDF4.Variable.__getitem__
File "src/netCDF4/_netCDF4.pyx", line 4514, in netCDF4._netCDF4.Variable._toma
File "src/netCDF4/_netCDF4.pyx", line 4801, in netCDF4._netCDF4.Variable._check_safecast
ValueError: could not convert string to float: '-1.e+34f' |
ah, I see - as you said in your original post, the missing_value attribute is a string when it should be a float32. It needs to be the same type as the variable, or at least castable to that type. |
In this case, it doesn't get cast because the trailing "f" (valid in C) is not a number format NumPy recognizes. |
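The failure can be reproduced with NumPy alone. The sketch below assumes the library's check amounts to casting the attribute string to the variable's dtype; `cast_attribute` is a hypothetical name used only for illustration, not a function from netCDF4.

```python
import numpy as np

def cast_attribute(text, dtype=np.float32):
    """Hypothetical helper: cast a string attribute to dtype in two steps."""
    as_str = np.array(text)       # zero-dimensional array of unicode characters
    return as_str.astype(dtype)   # raises ValueError if the text is not a number

# The C-style literal with a trailing "f" is rejected.
try:
    cast_attribute("-1.e+34f")
except ValueError as err:
    print("cast failed:", err)

# The same literal without the suffix is accepted.
value = cast_attribute("-1.e+34")
print(value, value.dtype)
```

The only difference between the two calls is the trailing `f`, which matches the diagnosis in this comment.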
right. so I'd say this is not a bug (or if it is, the right place to fix it is in numpy) |
Is there a specification that puts constraints on
>>> -1.e+34f
File "<stdin>", line 1
-1.e+34f
^
SyntaxError: invalid syntax
(Recognizing this format when Python doesn't even do it would be an odd thing to expect of NumPy.) If, on the other hand, the NetCDF4 spec leaves To be able to read these files at all, I had to use |
the netcdf users guide states that |
@jpivarski does setting |
Yes, it does. After setting this parameter to |
Good. I've included a fix in PR #1154 that triggers a warning that the attribute can't be cast to the variable type, instead of failing when trying to create the masked array. |
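The warn-instead-of-fail behavior described here might look roughly like the following. This is a minimal sketch of the general pattern, not the code from PR #1154; `cast_or_warn` and the wording of the warning are assumptions made for illustration.

```python
import warnings
import numpy as np

def cast_or_warn(attname, value, dtype):
    """Hypothetical helper: cast value to dtype, or warn and return None."""
    try:
        return np.array(value).astype(dtype)
    except ValueError:
        # The attribute cannot be interpreted as the variable's type, so it is
        # ignored with a warning rather than aborting the read.
        warnings.warn(
            f"{attname} {value!r} cannot be cast to variable dtype "
            f"{np.dtype(dtype).name}; ignoring it"
        )
        return None

missing = cast_or_warn("missing_value", "-1.e+34f", np.float32)
print(missing)
```

With this pattern, a caller that gets `None` back would simply skip masking on that attribute, so the data can still be read.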
Great, that should do it, thanks! |
The missing_value attribute of a Variable is a string, which means that it could be incorrectly formatted (in NumPy's opinion). For example,
The error happens here:
netcdf4-python/src/netCDF4/_netCDF4.pyx
Lines 4897 to 4904 in 37a4088
The first numpy.array cast is okay: the attname string becomes a NumPy array of unicode characters, but the second fails: -1.e+34f is not a valid np.float32. I can do the same thing here:
Without the trailing f, it would be fine:
The irony is that this is correct formatting for C code, and that must be why it was entered this way. I think a library that accepts NetCDF4 files made by other languages should probably accept those languages' number formats, at least. Admittedly, that would complicate the logic of this function.
I'm using NetCDF4 version 1.5.7 from conda-forge.
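One way a reader could accept C-style literals is to strip an optional type suffix before converting. The sketch below is only an illustration of that idea; `parse_c_float` and its regular expression are hypothetical and are not part of netCDF4 or NumPy.

```python
import re
import numpy as np

# Decimal floating-point literal with an optional trailing f/F/l/L suffix.
# Deliberately permissive: it also accepts integers followed by a suffix.
_C_FLOAT = re.compile(
    r"^\s*([+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)[fFlL]?\s*$"
)

def parse_c_float(text, dtype=np.float32):
    """Hypothetical helper: parse a C-style float literal into a NumPy scalar."""
    match = _C_FLOAT.match(text)
    if match is None:
        raise ValueError(f"not a C-style floating-point literal: {text!r}")
    # Convert the numeric part with Python's float, then narrow to dtype.
    return np.dtype(dtype).type(float(match.group(1)))

print(parse_c_float("-1.e+34f"))
```

This keeps the extra logic in a single function, but it does add a parsing step that the plain NumPy cast avoids.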