
Unexpectedly high memory usage opening netCDF4 file with many variables #1235

dougiesquire opened this issue Feb 22, 2023 · 2 comments

@dougiesquire
Version: netCDF4-python 1.6.0
OS: Linux
Python version: 3.9.15

I have a set of netCDF4 files that use substantially more memory to open than expected. I’ve included a reduced-size version of one of these files in a public repo here: https://github.com/dougiesquire/um_output_memory/blob/main/cj877a.pm000101_mon.1x1.nc4

That file is 1.5 MB on disk, but uses something like 20 MB of memory to open a single variable:

[Screenshot: code block opening a single variable from the file, with profiling output showing ~20 MB of memory used]

Because of this issue, I am unable to open and concatenate many such files.

I’d really appreciate any help understanding, debugging, or fixing this issue. In the repo linked above, there's also a notebook showing examples of the high memory usage when opening the reduced-size example file using netCDF4-python.
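For reference, a minimal sketch of how the memory cost can be checked outside a notebook profiler, by watching resident set size around the open. The file name and variable name in the usage comment are placeholders, not names confirmed from the repo:

```python
import resource

def rss_mb() -> float:
    """Peak resident set size in MB (ru_maxrss is reported in KB on Linux)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0

def open_cost_mb(path: str, varname: str) -> float:
    """Approximate RSS growth (MB) from opening a file and reading one variable."""
    import netCDF4  # lazy import so the helper is loadable without netCDF4 installed
    before = rss_mb()
    with netCDF4.Dataset(path) as ds:
        ds.variables[varname][:]
    return rss_mb() - before

# Usage (file and variable names are placeholders):
# print(open_cost_mb("cj877a.pm000101_mon.1x1.nc4", "some_variable"))
```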

Some things to note:

  • Converting these files to NETCDF3 seems to fix the issue - run against a NETCDF3 version of the same file, the code above uses ~1 MB of memory.
  • Interestingly, the memory footprint is essentially the same for the reduced-size files included in the above repo as for the original full-size files. The reduced-size files include only one spatial grid point, whereas the full size files include 27,648. It's almost like it's the metadata that is responsible for the large memory footprint…?
  • These files contain 250 variables. I've never worked with netCDF files containing this many variables - perhaps the problem is related to this?
  • These files have filling off. Out of desperation, I’ve tried recreating the data with filling on but that didn’t help.
  • Opening these files with h5netcdf uses less memory, but takes a prohibitively long time.
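For anyone wanting to reproduce the NETCDF3 comparison above, a minimal conversion sketch (roughly equivalent to `nccopy -k 1 in.nc4 out.nc3`). File names are placeholders, and NETCDF3's restrictions on dtypes and attribute types are not handled:

```python
def to_netcdf3(src_path: str, dst_path: str) -> None:
    """Copy a NETCDF4 file to NETCDF3_CLASSIC (dtype restrictions not handled)."""
    import netCDF4  # lazy import so the helper is loadable without netCDF4 installed
    with netCDF4.Dataset(src_path) as src, \
         netCDF4.Dataset(dst_path, "w", format="NETCDF3_CLASSIC") as dst:
        # Copy global attributes, dimensions, then each variable with its data.
        dst.setncatts({k: src.getncattr(k) for k in src.ncattrs()})
        for name, dim in src.dimensions.items():
            dst.createDimension(name, None if dim.isunlimited() else len(dim))
        for name, var in src.variables.items():
            fill = getattr(var, "_FillValue", None)  # must be set at creation time
            out = dst.createVariable(name, var.dtype, var.dimensions, fill_value=fill)
            out.setncatts({k: var.getncattr(k)
                           for k in var.ncattrs() if k != "_FillValue"})
            out[...] = var[...]

# Usage (placeholder names):
# to_netcdf3("cj877a.pm000101_mon.1x1.nc4", "cj877a.pm000101_mon.1x1.nc3")
```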
@jswhit (Collaborator) commented Feb 23, 2023

netcdf4-python wraps the netcdf-c library, which in turn uses the HDF5 C library. I don't believe the large memory usage (which I was able to reproduce) is related to the Python interface. Since you noted that NETCDF3 fixes it, it's probably related to HDF5. I'm sorry, but I don't have any suggestions for addressing this - perhaps you could get help on the netcdf-c issue tracker.
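One HDF5-related knob that may be worth testing (an assumption on my part, not a confirmed cause or fix): netcdf-c keeps a per-variable HDF5 chunk cache, so with ~250 variables the default cache sizes could add up. netCDF4-python exposes this via `netCDF4.set_chunk_cache`, which must be called before the file is opened:

```python
def open_with_small_chunk_cache(path: str):
    """Open a dataset after shrinking the per-variable HDF5 chunk cache."""
    import netCDF4  # lazy import so the helper is loadable without netCDF4 installed
    # Must be called before Dataset(): 1 MB cache, 521 hash slots, 0.75 preemption.
    netCDF4.set_chunk_cache(1024 * 1024, 521, 0.75)
    return netCDF4.Dataset(path)

# Usage (placeholder path):
# ds = open_with_small_chunk_cache("cj877a.pm000101_mon.1x1.nc4")
```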

@dougiesquire (Author)

Thanks @jswhit, and thanks too for confirming you can reproduce the issue. I'll try opening an issue on the netcdf-c tracker as you suggest.
