support for HDF5 dimension scales with null dataspace #1226

Open
itcarroll opened this issue Dec 12, 2022 · 3 comments

itcarroll commented Dec 12, 2022

I would like to use netCDF4-python (as a backend to Xarray) to read some HDF5 files, but attempting to read them crashes Python. I've traced the problem to a dimension scale with a null dataspace in the HDF5 files. I understand that not all HDF5 files are netCDF4 files, but I don't think reading one should crash Python.

In this particular case, the HDF5 file seems perfectly interpretable. As an enhancement to netCDF4-python, a dimension scale with a null dataspace could be interpreted as its netCDF4 equivalent, which is "a netCDF dimension but not a netCDF variable."

Here is a reproducible example of code that crashes Python. I'm not totally sure the problem isn't just a mismatch between the HDF5 libraries used, since both netCDF4-python and h5py package their own libraries. My installs built nothing from source.

% cat danger.py
from h5py import File
from netCDF4 import Dataset

with File('danger.h5', 'w') as group:
    dataset = group.create_dataset('y', shape=(3,), dtype=float)
    dimension = group.create_dataset('x', shape=None, dtype=int)   # will crash python when read below
    # dimension = group.create_dataset('x', shape=(3,), dtype=int) # creates misleading dataset
    dimension.make_scale('x')
    dataset.dims[0].attach_scale(dimension)

with Dataset('danger.h5') as group:
    print(group)
% python danger.py
Assertion failed: (ndims), function get_scale_info, file hdf5open.c, line 1396.
zsh: abort      python danger.py
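
Because the failure is a C-level assertion that aborts the whole process, it cannot be caught with `try`/`except` in the same interpreter. As a stopgap, and only as a sketch, the open can be attempted in a child process first. The helper names `run_isolated` and `netcdf4_can_open` are made up for this example and are not part of netCDF4-python.

```python
import subprocess
import sys


def run_isolated(code, timeout=60):
    """Run `code` in a fresh Python interpreter and return its exit status."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # A non-zero status means the child did not finish cleanly, whether it
    # raised an exception or was killed by an abort.
    return result.returncode


def netcdf4_can_open(path):
    """Return True if a child process can open and close `path` with netCDF4."""
    code = f"from netCDF4 import Dataset\nDataset({path!r}).close()"
    return run_isolated(code) == 0


if __name__ == "__main__":
    print(netcdf4_can_open("danger.h5"))
```

Note that `netcdf4_can_open` also returns `False` when the file is missing or netCDF4 is not installed, so it only says whether opening is safe, not why it failed.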

Here is the complete h5dump of danger.h5 as created by h5py. While it is not a netCDF4 file, I can't think of any reason netCDF4-python shouldn't interpret it correctly, as it does when the script above uses the commented-out line instead. The scale is a dimension that has no coordinates, which is valid in the netCDF4 model.

HDF5 "danger.h5" {
GROUP "/" {
   DATASET "x" {
      DATATYPE  H5T_STD_I64LE
      DATASPACE  NULL
      DATA {
      }
      ATTRIBUTE "CLASS" {
         DATATYPE  H5T_STRING {
            STRSIZE 16;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         }
         DATASPACE  SCALAR
         DATA {
         (0): "DIMENSION_SCALE"
         }
      }
      ATTRIBUTE "NAME" {
         DATATYPE  H5T_STRING {
            STRSIZE 2;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         }
         DATASPACE  SCALAR
         DATA {
         (0): "x"
         }
      }
      ATTRIBUTE "REFERENCE_LIST" {
         DATATYPE  H5T_COMPOUND {
            H5T_REFERENCE { H5T_STD_REF_OBJECT } "dataset";
            H5T_STD_U32LE "dimension";
         }
         DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
         DATA {
         (0): {
               DATASET 0 "/y",
               0
            }
         }
      }
   }
   DATASET "y" {
      DATATYPE  H5T_IEEE_F64LE
      DATASPACE  SIMPLE { ( 3 ) / ( 3 ) }
      DATA {
      (0): 0, 0, 0
      }
      ATTRIBUTE "DIMENSION_LIST" {
         DATATYPE  H5T_VLEN { H5T_REFERENCE { H5T_STD_REF_OBJECT }}
         DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
         DATA {
         (0): (DATASET 0 "/x")
         }
      }
   }
}
}
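
The `DATASPACE  NULL` entry on dataset `x` is the distinguishing feature of this file. Until the library handles it, a file could be screened with h5py before it is handed to netCDF4-python. This is a sketch under assumptions: the function names are hypothetical, and it relies on h5py reporting `shape` as `None` for a dataset with a null dataspace and on the `is_scale` property of h5py datasets.

```python
def is_null_dimension_scale(obj):
    """True if `obj` looks like a dimension scale with a null dataspace."""
    # h5py reports shape None for a dataset with a null (empty) dataspace.
    # Groups have no `shape` attribute, so fall back to a non-None sentinel.
    has_null_space = getattr(obj, "shape", ()) is None
    return has_null_space and bool(getattr(obj, "is_scale", False))


def find_null_dimension_scales(path):
    """Return the names of all null-dataspace dimension scales in an HDF5 file."""
    import h5py  # imported lazily so the predicate above has no hard dependency

    found = []

    def visit(name, obj):
        # Returning None tells visititems to keep walking the file.
        if is_null_dimension_scale(obj):
            found.append(name)

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return found


if __name__ == "__main__":
    # Lists the offending scales, if any, in the file from the script above.
    print(find_null_dimension_scales("danger.h5"))
```

An empty result would suggest the file does not contain this particular pattern; it says nothing about other ways an HDF5 file can differ from a netCDF4 file.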

Thank you for considering! Here are my versions ...

% pip list
Package    Version
---------- -------
cftime     1.6.2
h5py       3.7.0
netCDF4    1.6.2
numpy      1.23.5
pip        22.1.2
setuptools 62.3.3
wheel      0.37.1

[notice] A new release of pip available: 22.1.2 -> 22.3.1
[notice] To update, run: pip install --upgrade pip
% python --version
Python 3.10.8
% sw_vers
ProductName:	macOS
ProductVersion:	12.6.1
BuildVersion:	21G217
@itcarroll itcarroll changed the title support for HDF5 dimension scales with an empty/null dataspace support for HDF5 dimension scales with null dataspace Dec 12, 2022
@jswhit
Collaborator

jswhit commented Dec 12, 2022

If there is a workaround for this, it has to happen in the netcdf-c library. Can you file this as an issue at https://github.com/Unidata/netcdf-c?

@itcarroll
Author

Thanks, @jswhit. Filed as above. Or do I need to repeat/update the description? I hesitate to without knowing C.

@itcarroll
Author

@jswhit Any idea why there has been no comment from the Unidata team on Unidata/netcdf-c#2571?
