Variable created with ENUM type can't be read after reopen #1308

Open
kmuehlbauer opened this issue Jan 14, 2024 · 7 comments

Comments

@kmuehlbauer

kmuehlbauer commented Jan 14, 2024

To report a non-security related issue, please provide:

  • the version of the software with which you are encountering an issue
    netcdf4-python 1.6.5, netcdf-c '4.9.2 of Dec 10 2023 17:23:27 $'

  • environmental information (i.e. Operating System, compiler info, java version, python version, etc.)
    Linux Mint, python 3.12

  • a description of the issue with the steps needed to reproduce it
    When creating variables with enum types like the following:

    import netCDF4
    import numpy as np

    tmp_local_netcdf = "test.nc"
    enum_dict = dict(one=1, two=2, three=3, missing=255)
    enum_dict2 = dict(one=1, two=2, three=3, missing=254)

    with netCDF4.Dataset(tmp_local_netcdf, "w") as ds:
        ds.createDimension("enum_dim", 4)
        g1 = ds.createGroup("test1")
        g2 = ds.createGroup("test2")
        enum_type = g1.createEnumType(np.uint8, "enum_t", enum_dict)
        enum_type2 = g2.createEnumType(np.uint8, "enum_t2", enum_dict2)

        g1.createDimension("enum_dim", 4)
        # enum_var1 uses the enum type defined in its own group (test1)
        v = g1.createVariable(
            "enum_var1", enum_type, ("enum_dim",), fill_value=enum_dict["missing"]
        )
        v[:] = [1, 2, 255, 3]
        print(v[:])

        # enum_var2 is created in test1 but uses the enum type defined in test2
        v = g1.createVariable(
            "enum_var2", enum_type2, ("enum_dim",), fill_value=enum_dict2["missing"]
        )
        v[:] = [1, 2, 254, 3]
        print(v[:])
    [1 2 -- 3]
    [1 2 -- 3]

    It seems that enum_var2 is lost when reopening the file:

    with netCDF4.Dataset(tmp_local_netcdf, "r") as ds:
        g1 = ds["test1"]
        g2 = ds["test2"]

        print(g1.enumtypes)
        print(g2.enumtypes)

        print(g1.variables["enum_var1"])
        print(g1.variables["enum_var2"])
    {'enum_t': <class 'netCDF4._netCDF4.EnumType'>: name = 'enum_t', numpy dtype = uint8, fields/values ={'one': 1, 'two': 2, 'three': 3, 'missing': 255}}
    {'enum_t2': <class 'netCDF4._netCDF4.EnumType'>: name = 'enum_t2', numpy dtype = uint8, fields/values ={'one': 1, 'two': 2, 'three': 3, 'missing': 254}}
    <class 'netCDF4._netCDF4.Variable'>
    enum enum_var1(enum_dim)
        _FillValue: 255
    enum data type: uint8
    path = /test1
    unlimited dimensions: 
    current shape = (4,)
    ---------------------------------------------------------------------------
    KeyError                                  Traceback (most recent call last)
    Cell In[26], line 34
         31 print(g2.enumtypes)
         33 print(g1.variables["enum_var1"])
    ---> 34 print(g1.variables["enum_var2"])
    
    KeyError: 'enum_var2'

    Checking with ncdump, it seems that netcdf-c does not recognize the variable at all:

    ncdump test.nc
    netcdf test {
    dimensions:
        enum_dim = 4 ;
    
    group: test1 {
      types:
        ubyte enum enum_t {one = 1, two = 2, three = 3, missing = 255} ;
      dimensions:
    	  enum_dim = 4 ;
      variables:
    	  enum_t enum_var1(enum_dim) ;
    		  enum_t enum_var1:_FillValue = missing ;
      data:
    
       enum_var1 = one, two, _, three ;
      } // group test1
    
    group: test2 {
      types:
        ubyte enum enum_t2 {one = 1, two = 2, three = 3, missing = 254} ;
      } // group test2
    }

    The variable was actually written to the file, as the h5dump output shows:

    h5dump test.nc
    HDF5 "test.nc" {
    GROUP "/" {
       ATTRIBUTE "_NCProperties" {
          DATATYPE  H5T_STRING {
             STRSIZE 34;
             STRPAD H5T_STR_NULLTERM;
             CSET H5T_CSET_ASCII;
             CTYPE H5T_C_S1;
          }
          DATASPACE  SCALAR
          DATA {
          (0): "version=2,netcdf=4.9.2,hdf5=1.14.3"
          }
       }
       DATASET "enum_dim" {
          DATATYPE  H5T_IEEE_F32BE
          DATASPACE  SIMPLE { ( 4 ) / ( 4 ) }
          DATA {
          (0): 0, 0, 0, 0
          }
          ATTRIBUTE "CLASS" {
             DATATYPE  H5T_STRING {
                STRSIZE 16;
                STRPAD H5T_STR_NULLTERM;
                CSET H5T_CSET_ASCII;
                CTYPE H5T_C_S1;
             }
             DATASPACE  SCALAR
             DATA {
             (0): "DIMENSION_SCALE"
             }
          }
          ATTRIBUTE "NAME" {
             DATATYPE  H5T_STRING {
                STRSIZE 64;
                STRPAD H5T_STR_NULLTERM;
                CSET H5T_CSET_ASCII;
                CTYPE H5T_C_S1;
             }
             DATASPACE  SCALAR
             DATA {
             (0): "This is a netCDF dimension but not a netCDF variable.         4"
             }
          }
          ATTRIBUTE "_Netcdf4Dimid" {
             DATATYPE  H5T_STD_I32LE
             DATASPACE  SCALAR
             DATA {
             (0): 0
             }
          }
       }
       GROUP "test1" {
          DATASET "enum_dim" {
             DATATYPE  H5T_IEEE_F32BE
             DATASPACE  SIMPLE { ( 4 ) / ( 4 ) }
             DATA {
             (0): 0, 0, 0, 0
             }
             ATTRIBUTE "CLASS" {
                DATATYPE  H5T_STRING {
                   STRSIZE 16;
                   STRPAD H5T_STR_NULLTERM;
                   CSET H5T_CSET_ASCII;
                   CTYPE H5T_C_S1;
                }
                DATASPACE  SCALAR
                DATA {
                (0): "DIMENSION_SCALE"
                }
             }
             ATTRIBUTE "NAME" {
                DATATYPE  H5T_STRING {
                   STRSIZE 64;
                   STRPAD H5T_STR_NULLTERM;
                   CSET H5T_CSET_ASCII;
                   CTYPE H5T_C_S1;
                }
                DATASPACE  SCALAR
                DATA {
                (0): "This is a netCDF dimension but not a netCDF variable.         4"
                }
             }
             ATTRIBUTE "REFERENCE_LIST" {
                DATATYPE  H5T_COMPOUND {
                   H5T_REFERENCE { H5T_STD_REF_OBJECT } "dataset";
                   H5T_STD_U32LE "dimension";
                }
                DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
                DATA {
                (0): {
                      DATASET 94175680913824 "/test1/enum_var1",
                      0
                   },
                (1): {
                      DATASET 94175680921616 "/test1/enum_var2",
                      0
                   }
                }
             }
             ATTRIBUTE "_Netcdf4Dimid" {
                DATATYPE  H5T_STD_I32LE
                DATASPACE  SCALAR
                DATA {
                (0): 1
                }
             }
          }
          DATATYPE "enum_t" H5T_ENUM {
             H5T_STD_U8LE;
             "one"              1;
             "two"              2;
             "three"            3;
             "missing"          255;
          };
          DATASET "enum_var1" {
             DATATYPE  H5T_ENUM {
                H5T_STD_U8LE;
                "one"              1;
                "two"              2;
                "three"            3;
                "missing"          255;
             }
             DATASPACE  SIMPLE { ( 4 ) / ( 4 ) }
             DATA {
             (0): one, two, missing, three
             }
             ATTRIBUTE "DIMENSION_LIST" {
                DATATYPE  H5T_VLEN { H5T_REFERENCE { H5T_STD_REF_OBJECT } }
                DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
                DATA {
                (0): (DATASET 94175680906768 "/test1/enum_dim")
                }
             }
             ATTRIBUTE "_FillValue" {
                DATATYPE  H5T_ENUM {
                   H5T_STD_U8LE;
                   "one"              1;
                   "two"              2;
                   "three"            3;
                   "missing"          255;
                }
                DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
                DATA {
                (0): missing
                }
             }
             ATTRIBUTE "_Netcdf4Coordinates" {
                DATATYPE  H5T_STD_I32LE
                DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
                DATA {
                (0): 1
                }
             }
          }
          DATASET "enum_var2" {
             DATATYPE  H5T_ENUM {
                H5T_STD_U8LE;
                "one"              1;
                "two"              2;
                "three"            3;
                "missing"          254;
             }
             DATASPACE  SIMPLE { ( 4 ) / ( 4 ) }
             DATA {
             (0): one, two, missing, three
             }
             ATTRIBUTE "DIMENSION_LIST" {
                DATATYPE  H5T_VLEN { H5T_REFERENCE { H5T_STD_REF_OBJECT } }
                DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
                DATA {
                (0): (DATASET 94175680926768 "/test1/enum_dim")
                }
             }
             ATTRIBUTE "_FillValue" {
                DATATYPE  H5T_ENUM {
                   H5T_STD_U8LE;
                   "one"              1;
                   "two"              2;
                   "three"            3;
                   "missing"          254;
                }
                DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
                DATA {
                (0): missing
                }
             }
             ATTRIBUTE "_Netcdf4Coordinates" {
                DATATYPE  H5T_STD_I32LE
                DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
                DATA {
                (0): 1
                }
             }
          }
       }
       GROUP "test2" {
          DATATYPE "enum_t2" H5T_ENUM {
             H5T_STD_U8LE;
             "one"              1;
             "two"              2;
             "three"            3;
             "missing"          254;
          };
       }
    }
    }
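The comparison between the ncdump and h5dump output above can be automated. The following is a minimal sketch, assuming `h5py` and `netCDF4` are both installed: it lists the HDF5 datasets in each group and reports those that netCDF4 exposes neither as a variable nor as a dimension. The helper names (`hdf5_datasets`, `netcdf_names`, `find_dropped`) are hypothetical. Subtracting dimension names is a simplification: it hides dimension-only datasets such as `enum_dim`, but it would also hide a dropped coordinate variable that shares its name with a dimension.

```python
def find_dropped(h5_datasets, nc_names):
    """Return {group_path: names} for HDF5 datasets that netCDF4 does not expose.

    h5_datasets: {group_path: set of HDF5 dataset names in that group}
    nc_names:    {group_path: set of netCDF variable and dimension names}
    """
    dropped = {}
    for path, names in h5_datasets.items():
        # Dimension-only datasets (e.g. "enum_dim") are reported as netCDF
        # dimensions, so whatever is left over is not visible through netCDF4.
        missing = names - nc_names.get(path, set())
        if missing:
            dropped[path] = missing
    return dropped


def hdf5_datasets(filename):
    """Collect dataset names per group path with h5py (imported lazily)."""
    import h5py

    found = {}

    def visit(name, obj):
        # visititems passes paths relative to the root, e.g. "test1/enum_var2";
        # committed datatypes such as "enum_t" are not Datasets and are skipped.
        if isinstance(obj, h5py.Dataset):
            parent, _, leaf = name.rpartition("/")
            found.setdefault("/" + parent, set()).add(leaf)

    with h5py.File(filename, "r") as f:
        f.visititems(visit)
    return found


def netcdf_names(filename):
    """Collect variable and dimension names per group path with netCDF4."""
    import netCDF4

    found = {}

    def walk(grp):
        # grp.path is "/" for the root Dataset and e.g. "/test1" for subgroups
        found[grp.path] = set(grp.variables) | set(grp.dimensions)
        for child in grp.groups.values():
            walk(child)

    with netCDF4.Dataset(filename, "r") as ds:
        walk(ds)
    return found
```

For the file written above, `find_dropped(hdf5_datasets("test.nc"), netcdf_names("test.nc"))` would be expected to report `enum_var2` under `/test1`, matching the difference between the ncdump and h5dump listings.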

I assume that the enum type is not discovered on read because it is defined in the other group (test2), and that the variable is silently dropped as a result.

It would be more appropriate to raise an error when a variable is created with a type that is not accessible to the software, instead of silently writing a variable that cannot be read back.
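Until such an error exists, a similar check could be done on the user side with a wrapper around `createVariable`. This is a minimal sketch under stated assumptions: `enum_type_visible` and `create_enum_variable` are hypothetical helpers, not netCDF4-python API; the rule that a type is usable only from its own group or a descendant group is an assumption, not confirmed netcdf-c behaviour; and the check relies on `createEnumType` registering the returned object in the group's `enumtypes` dict and on `parent` being `None` for the root `Dataset`.

```python
def enum_type_visible(group, enum_type):
    """Return True if enum_type is registered in group or one of its ancestors.

    ASSUMPTION: a type is treated as usable only from its own group or a
    descendant group; a type from a sibling group (as in this report) is rejected.
    """
    grp = group
    while grp is not None:
        # Compare by identity against the EnumType objects stored in grp.enumtypes
        if any(t is enum_type for t in grp.enumtypes.values()):
            return True
        grp = grp.parent  # None once we pass the root Dataset
    return False


def create_enum_variable(group, name, enum_type, dimensions, **kwargs):
    """Hypothetical wrapper: refuse to create a variable with an out-of-scope enum type."""
    if not enum_type_visible(group, enum_type):
        raise ValueError(
            f"enum type {enum_type.name!r} is not defined in {group.path!r} "
            "or any of its parent groups"
        )
    # Forward everything else (e.g. fill_value) unchanged
    return group.createVariable(name, enum_type, dimensions, **kwargs)
```

With the reproducer above, `create_enum_variable(g1, "enum_var2", enum_type2, ("enum_dim",), fill_value=enum_dict2["missing"])` would raise `ValueError` instead of writing a variable that disappears on reopen; under the same assumption, defining `enum_t2` in `test1` (or in the root group) would let the call through.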

@jswhit
Collaborator

jswhit commented Jan 20, 2024

As you noted, the problem is occurring in netcdf-c. I suggest opening a ticket at https://github.com/Unidata/netcdf-c

@kmuehlbauer
Author

Thanks @jswhit, would it be possible to transfer the issue to the netcdf-c repo?

@jswhit
Collaborator

jswhit commented Jan 22, 2024

@kmuehlbauer other than via copy and paste, I don't know of any way to do that.

@kmuehlbauer
Author

@jswhit Thanks! Within the same organization, issues can be transferred to another repo. See: https://docs.github.com/en/issues/tracking-your-work-with-issues/transferring-an-issue-to-another-repository. If that doesn't work for some reason, I'll copy and paste.

@jswhit
Collaborator

jswhit commented Jan 23, 2024

@kmuehlbauer sorry but the option to transfer an issue to Unidata/netcdf-c is not available for me.

@kmuehlbauer
Author

@jswhit Thanks for trying and no worries. I'll create a new issue over at netcdf-c.

@kmuehlbauer
Author

I've opened a discussion over at netcdf-c: Unidata/netcdf-c#2846.
