making my own netcdf4 wheel results in a broken package #1241

Open
bluppfisk opened this issue Mar 29, 2023 · 14 comments

Comments

@bluppfisk
Contributor

bluppfisk commented Mar 29, 2023

There is a problem with the PyPI package (mentioned elsewhere; basically, it spams megabytes' worth of warnings). This can be circumvented by downloading the source and running pip install from inside the source folder.

However, to make things easier in my organisation, I want to build a wheel for our target system(s) that can be installed from our internal repository without jumping through the hoops described above. Unfortunately, the resulting package seems broken, as the netCDF4 module does not have the Dataset attribute. This is evidenced by the error:

ImportError: cannot import name 'Dataset' from 'netCDF4' (unknown location).

System:

  • netCDF4==1.6.2 and netCDF4==1.6.3
  • Debian Linux Bullseye
  • steps to reproduce: (NOTE: on some Debian systems, version 1.6.2 will fail to compile from source because there are comments in the header files. This was patched in https://github.com/Unidata/netcdf4-python/pull/1219/files and is available in 1.6.3)
wget https://github.com/Unidata/netcdf4-python/archive/refs/tags/v1.6.2.tar.gz
tar -xvzf v1.6.2.tar.gz
cd netcdf4-python-1.6.2
python3 -m venv ./env && source env/bin/activate
pip install wheel
python3 setup.py bdist_wheel   # this calls build, install as well
python3 -m venv ./env && source env/bin/activate
pip3 install dist/netCDF4-1.6.2-cp310-cp310-linux_x86_64.whl
python3 -c "from netCDF4 import Dataset"  # I get an error

deactivate && rm -rf ./env
python3 -m venv ./env && source env/bin/activate
pip install .
python3 -c "from netCDF4 import Dataset"  #no error

What does pip install do differently from python3 setup.py bdist_wheel (followed by a pip install of the wheel)?

Edit: came here after SO didn't yield any immediate answers (https://stackoverflow.com/questions/75874745/compiling-and-installing-with-pip-install-vs-python3-m-build)
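
One way to narrow down an ImportError that reports "(unknown location)" is to ask Python where it resolves the package from. The sketch below is only a diagnostic aid, not part of netCDF4; the helper name describe_package is hypothetical. It relies on the standard-library importlib.util.find_spec. An origin of None together with a non-empty list of search locations usually means Python found a bare directory without an __init__.py (a namespace package), which is one common cause of this message.

```python
import importlib.util


def describe_package(name):
    """Return (origin, search_locations) for a package, or None if not found.

    origin is the file the package would be loaded from; it is None for a
    namespace package, i.e. a directory that has no __init__.py.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    locations = list(spec.submodule_search_locations or [])
    return spec.origin, locations


if __name__ == "__main__":
    # Run this from the same working directory and environment in which
    # the failing import happens.
    print(describe_package("netCDF4"))
```

If the reported location is a directory inside the current working directory rather than inside site-packages, running the import from a different directory is a cheap way to confirm or rule out shadowing.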

@ocefpaf
Contributor

ocefpaf commented Mar 29, 2023

What does pip install do differently from python3 setup.py bdist_wheel (followed by a pip install of the wheel)?

Modern pip follows PEP 517/518 and creates the build environment specified in pyproject.toml. One should not be calling setup.py directly like that anymore. However, if you want to build your own wheel, I recommend using the build package:

python3 -m pip install build
python3 -m build --wheel . --outdir dist

Then install that wheel.

@bluppfisk
Contributor Author

Thank you for answering!

This indeed works to some extent. It builds a wheel that does not cause the above problem. However, some things that are included in the PyPI package still seem to be missing, notably libnetcdf.so.18, as evidenced when trying to use the package:

>   from ._netCDF4 import *
E   ImportError: libnetcdf.so.18: cannot open shared object file: No such file or directory

.tox/py39/lib/python3.9/site-packages/netCDF4/__init__.py:3: ImportError

If I use the package on a system that has the packages libnetcdf-dev and libhdf5-dev installed, it works fine. But the pypi package does not require a system with those packages installed.

Note that the system that builds the custom netcdf wheel does have those packages installed as they are required for building - but they should not be required for running.

Both systems run the same Debian version (Docker images).
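
To see which shared libraries the compiled extension expects but cannot find on the target system, the output of ldd can be scanned for "not found" entries. The following is a minimal sketch under stated assumptions: the helper names missing_libraries and run_ldd are hypothetical, ldd is assumed to be available (as it is on typical Debian systems), and the path passed to run_ldd would be the _netCDF4 extension module inside site-packages.

```python
import subprocess


def missing_libraries(ldd_output):
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Typical line: "\tlibnetcdf.so.18 => not found"
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip() == "not found":
            missing.append(name)
    return missing


def run_ldd(path):
    """Run ldd on a shared object and return its textual output."""
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling missing_libraries(run_ldd(path_to_extension)) on the machine where the import fails should list libnetcdf.so.18 if the situation matches the traceback above.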

@ocefpaf
Contributor

ocefpaf commented Mar 29, 2023

The PyPI ones have those built and bundled for you. That is why we recommend using the built wheels!

If you want to build your own wheel, you will need the C libraries installed to compile netCDF4. See https://github.com/Unidata/netcdf4-python#development-installation

@jswhit
Collaborator

jswhit commented Mar 29, 2023

Can you be more specific about the problem you mentioned ("spams megabytes worth of warnings")? We can hopefully fix that in the pre-built wheels.

@bluppfisk
Contributor Author

I would like to use the PyPI packages, but they spam so many warnings that they're technically unusable. I dug out a sample for @jswhit (see below).

By chance I found out that installing the source (from an unzipped release) did not exhibit this behaviour. In order to bring this improved experience to our team (but without having them install packages and compile code), I'd like to build the wheel and include the libs as well. I see now that the PyPI wheels have the libraries included. Would you be able to point me in the right direction as to how to do this for my own wheels?

Sample HDF5 errors when using the PyPI 1.6.2/3 packages (basically, this is repeated thousands of times) even for a very simple read action on an HDF file. This does not impact functionality, it just bloats the logs.

HDF5-DIAG: Error detected in HDF5 (1.12.2) thread 1:
  #000: H5A.c line 528 in H5Aopen_by_name(): can't open attribute
    major: Attribute
    minor: Can't open object
  #001: H5VLcallback.c line 1091 in H5VL_attr_open(): attribute open failed
    major: Virtual Object Layer
    minor: Can't open object
  #002: H5VLcallback.c line 1058 in H5VL__attr_open(): attribute open failed
    major: Virtual Object Layer
    minor: Can't open object
  #003: H5VLnative_attr.c line 130 in H5VL__native_attr_open(): can't open attribute
    major: Attribute
    minor: Can't open object
  #004: H5Aint.c line 545 in H5A__open_by_name(): unable to load attribute info from object header
    major: Attribute
    minor: Unable to initialize object
  #005: H5Oattribute.c line 494 in H5O__attr_open_by_name(): can't locate attribute: '_QuantizeBitRoundNumberOfSignificantBits'
    major: Attribute
    minor: Object not found
HDF5-DIAG: Error detected in HDF5 (1.12.2) thread 1:
  #000: H5A.c line 528 in H5Aopen_by_name(): can't open attribute
    major: Attribute
    minor: Can't open object
  #001: H5VLcallback.c line 1091 in H5VL_attr_open(): attribute open failed
    major: Virtual Object Layer
    minor: Can't open object
  #002: H5VLcallback.c line 1058 in H5VL__attr_open(): attribute open failed
    major: Virtual Object Layer
    minor: Can't open object
  #003: H5VLnative_attr.c line 130 in H5VL__native_attr_open(): can't open attribute
    major: Attribute
    minor: Can't open object
  #004: H5Aint.c line 545 in H5A__open_by_name(): unable to load attribute info from object header
    major: Attribute
    minor: Unable to initialize object
  #005: H5Oattribute.c line 494 in H5O__attr_open_by_name(): can't locate attribute: '_QuantizeBitGroomNumberOfSignificantDigits'
    major: Attribute
    minor: Object not found

@jswhit
Collaborator

jswhit commented Mar 30, 2023

@bluppfisk what modules were imported by the script that produced these warnings? I suspect a binary incompatibility between HDF5 versions included in netCDF4 and some other package (perhaps xarray or h5py?).

@bluppfisk
Contributor Author

@jswhit there's definitely xarray which uses netcdf4 as a backend. I also believe this problem occurs mostly in a multi-processing environment but I am not 100% sure.

@matthew-brett
Contributor

Only to ask whether you discovered https://github.com/pypa/auditwheel for your wheels. This copies the linked libraries into the wheel, so you can install it on other compatible systems, along with the libraries.

@jswhit
Collaborator

jswhit commented Mar 30, 2023

the multiprocessing bit may be the culprit - the netCDF4 and HDF5 libs are not thread-safe.

@bluppfisk
Contributor Author

Only to ask whether you discovered https://github.com/pypa/auditwheel for your wheels. This copies the linked libraries into the wheel, so you can install it on other compatible systems, along with the libraries.

Thanks, this is indeed something I've found while searching for how to include libraries. But I'm not sure whether/how it will automatically decide which libraries it needs to include or whether/how to compile them.

I'm digging through the Travis files in this repository but fail to understand them. I'm too inexperienced with this, so I'd appreciate any pointers.

@jswhit : it may very well be, but the beauty is that the versions installed from the source (i.e. not from pypi) do not exhibit this behaviour, so I was trying to recreate this experience (but distributed).
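
On the question of how auditwheel decides what to include: it does not compile anything. Its repair command inspects the shared-library dependencies of the extension modules in an already built wheel and copies the external ones from the build system into a new wheel. A quick way to check whether a given wheel carries bundled libraries is to list its contents, since a wheel is a zip archive. The sketch below assumes the bundled libraries live in a directory whose name ends in ".libs" (the layout auditwheel commonly produces); the helper name bundled_shared_libraries is hypothetical.

```python
import zipfile


def bundled_shared_libraries(wheel_path):
    """List shared libraries stored in a '.libs' directory inside a wheel.

    Assumption: bundled libraries sit under a directory ending in '.libs',
    for example 'netCDF4.libs/'. Other layouts would need a different filter.
    """
    with zipfile.ZipFile(wheel_path) as wheel:
        return sorted(
            name
            for name in wheel.namelist()
            if ".libs/" in name and ".so" in name.rsplit("/", 1)[-1]
        )
```

Running this on the wheel downloaded from PyPI and on a locally built one should make the difference visible: an empty list for the local wheel would be consistent with the libnetcdf.so.18 error reported earlier in the thread.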

@jswhit
Collaborator

jswhit commented Mar 30, 2023

agreed - it's not likely a threading issue then. How was the xarray package installed? via conda, pip, or a locally built wheel? I'm guessing it's using a different version of the HDF5 library. Simply changing the order of the imports may make it go away if this is the issue. If you can post a simple script that reproduces the warnings on your system, along with information about what versions of xarray and netcdf4-python were used and how they were installed, that would be a big help. (also what platform - windows, macosx or linux?)
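
A possible way to test the "different HDF5 version" hypothesis on Linux is to look at which HDF5 and netCDF shared objects are actually mapped into the running process after the imports have happened. This is a rough sketch, assuming a Linux system where /proc/self/maps is readable; the helper name loaded_libraries and the keyword filter are illustrative choices, not part of any library.

```python
def loaded_libraries(maps_text, keywords=("hdf5", "netcdf")):
    """Return sorted paths of mapped files whose name contains a keyword.

    maps_text is the content of /proc/<pid>/maps. Each line has five
    whitespace-separated fields followed by an optional file path.
    """
    paths = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping without a backing file
        path = fields[5].strip()
        basename = path.rsplit("/", 1)[-1].lower()
        if any(keyword in basename for keyword in keywords):
            paths.add(path)
    return sorted(paths)


if __name__ == "__main__":
    # Import the packages under suspicion first, then inspect the process.
    with open("/proc/self/maps") as maps:
        for path in loaded_libraries(maps.read()):
            print(path)
```

If two different copies of libhdf5 show up (for example one inside a wheel's bundled directory and one from the system), that would support the version-conflict theory; a single copy would point elsewhere.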

@bluppfisk
Contributor Author

I'll do so in the coming days (takes some time to extract something useful).

Answers to your questions:

  • xarray and netCDF4 are installed via pip (if I use locally built wheels, the errors disappear; same goes for the conda version)
  • versions:
      xarray==2022.12.0
      netCDF4==1.6.2 (but .3 exhibits the same behaviour)
  • OS: Linux (Debian bullseye)
  • netCDF4 isn't directly imported anywhere in my program; xarray does, somewhere down the line.

In the meantime, could you provide any further pointers on how to compile + include those libraries in a wheel? I tried running the ./configure script but it complains about having nothing to compile. I don't know how that github workspace is set up.

@ocefpaf
Contributor

ocefpaf commented Mar 31, 2023

@bluppfisk are xarray and netCDF4 the only packages in your env? xarray is a pure python package, no c-lib is pulled unless you request some specific backend, like hdf5 via h5py/h5netcdf, or netcdf via netcdf4. That means the only c-libs you should have are the ones bundled in the netCDF wheel. Those are OK and work fine, there should be no conflicts. Here is an example of an env with them:

pip install xarray netcdf4 pooch
Collecting xarray
  Downloading xarray-2023.3.0-py3-none-any.whl (981 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 981.2/981.2 kB 6.2 MB/s eta 0:00:00
Collecting netcdf4
  Downloading netCDF4-1.6.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.2/5.2 MB 3.7 MB/s eta 0:00:00
Collecting numpy>=1.21
  Downloading numpy-1.24.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17.3/17.3 MB 13.8 MB/s eta 0:00:00
Collecting pandas<2,>=1.4
  Downloading pandas-1.5.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.0/12.0 MB 16.3 MB/s eta 0:00:00
Collecting packaging>=21.3
  Using cached packaging-23.0-py3-none-any.whl (42 kB)
Collecting cftime
  Downloading cftime-1.6.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 20.2 MB/s eta 0:00:00
Collecting python-dateutil>=2.8.1
  Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 247.7/247.7 kB 22.0 MB/s eta 0:00:00
Collecting pytz>=2020.1
  Downloading pytz-2023.3-py2.py3-none-any.whl (502 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 502.3/502.3 kB 16.3 MB/s eta 0:00:00
Collecting six>=1.5
  Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Installing collected packages: pytz, six, packaging, numpy, python-dateutil, cftime, pandas, netcdf4, xarray
Successfully installed cftime-1.6.2 netcdf4-1.6.3 numpy-1.24.2 packaging-23.0 pandas-1.5.3 python-dateutil-2.8.2 pytz-2023.3 six-1.16.0 xarray-2023.3.0

and then, inside the Python interpreter:

>>> import xarray
>>> ds = xarray.tutorial.open_dataset("air_temperature", engine="netcdf4")
>>> ds
<xarray.Dataset>
Dimensions:  (lat: 25, time: 2920, lon: 53)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (time, lat, lon) float32 ...
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly

As you can see I don't get a huge log output. This is also on Linux with an env that is isolated from any system libraries and using only the wheels provided on PyPI.

If you can provide a means for us to reproduce your env, maybe we can try to debug this further, but at this point I suspect you may have more packages installed and/or some conflicts in there.

PS: pooch is also pure Python and only required to fetch the tutorial datasets. You don't need it when testing with your local files and it does not interfere in the wrapped c-libs.

@ocefpaf
Contributor

ocefpaf commented Mar 31, 2023

See #1242 for a way to reproduce this issue. I suggest the discussion be moved to that issue to avoid mixing the problem of building your own wheel with the apparently new verbosity of HDF5.
