Description

I am excited to use the new NetCDFDataSet class with xarray. My (ultrasound) data isn't an exact fit for NetCDF because it can be complex (I/Q data), but there are workarounds. In xarray, I use:
```python
ds.to_netcdf(
    "data/intermediate_data_iq.h5",
    # Needed when saving complex values, which are not supported in the netCDF4 subset of HDF5
    invalid_netcdf=True,
    engine="h5netcdf",
)
```
This works pretty well for me for saving HDF5 that is close-but-not-exactly NetCDF. However, the current Kedro implementation of saving to a bytes buffer doesn't work with the h5netcdf or netcdf4 engines.
Context
Ultrasound data has a lot of associated coordinates/metadata (e.g. image physical location) that I find helpful to organize with xarray. This change would enable me to fully use Kedro for a data processing pipeline.
This would also benefit other users who want to use netCDF version 4, because the current bytes buffer approach only supports the scipy engine and therefore only the NETCDF3_64BIT format. From the xarray to_netcdf documentation:

File-like objects are only supported by the scipy engine. If no path is provided, this function returns the resulting netCDF file as bytes; in this case, we need to use scipy, which does not support netCDF version 4 (the default format becomes NETCDF3_64BIT).

Possible Implementation

@astrojuanlu suggested:

The way we usually compensate for this is by copying from the fsspec location to a temporary file.
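The temporary-file approach could look roughly like this (a minimal sketch, not the actual Kedro implementation; the function name and signature are illustrative):

```python
import os
import shutil
import tempfile

import fsspec
import xarray as xr


def load_netcdf(remote_path: str, engine: str = "h5netcdf") -> xr.Dataset:
    """Copy an fsspec-addressable netCDF file to a local temp file, then load it."""
    with tempfile.NamedTemporaryFile(suffix=".nc", delete=False) as tmp:
        # Copy from the fsspec location (s3://, gcs://, file://, ...) to a
        # local temporary file, so any to_netcdf/open_dataset engine can be used.
        with fsspec.open(remote_path, "rb") as remote:
            shutil.copyfileobj(remote, tmp)
        local_path = tmp.name
    try:
        # load_dataset reads eagerly and closes the file, so the temporary
        # file can be removed immediately afterwards.
        return xr.load_dataset(local_path, engine=engine)
    finally:
        os.remove(local_path)
```

Saving would work the same way in reverse: write to a temporary file with the requested engine, then upload it to the fsspec location.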
Possible Alternatives
- Represent complex numbers in NetCDF as an extra real/imaginary dimension. This works, but it would be nicer to work with complex data natively.
- Use the HDF5Dataset instead. However, the Dataset/coordinate/metadata management of xarray is nice, and it would be great to use that in our pipelines.
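The first alternative can be sketched in xarray by stacking real and imaginary parts along a new dimension (the variable and dimension names here are illustrative):

```python
import numpy as np
import xarray as xr

# Hypothetical I/Q data: a small complex-valued array.
iq = np.array([[1 + 2j, 3 - 1j], [0.5j, 2 + 0j]])

# Workaround: split into a real/imaginary "part" dimension so the
# dataset contains only real values and is valid netCDF.
ds = xr.Dataset(
    {"iq": (("part", "y", "x"), np.stack([iq.real, iq.imag]))},
    coords={"part": ["real", "imag"]},
)

# Reconstruct the complex array after loading.
restored = ds["iq"].sel(part="real") + 1j * ds["iq"].sel(part="imag")
assert np.array_equal(restored.values, iq)
```

The round trip is lossless, but every consumer of the dataset has to know about the extra dimension, which is why native complex support via invalid_netcdf=True is more convenient.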
Thanks for the help!
Addressed in #631:

* Change NetCDFDataset to use a temporary file for remote filesystems, to allow other to_netcdf engines
* Update unit tests to include a save engine for NetCDFDataset
* Fix a unit-test error where a folder was accessed before being created

Signed-off-by: Charles Guan <3221512+charlesincharge@users.noreply.github.com>
Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>