Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add McIDAS AREA file reader #3489

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from
Draft

Add McIDAS AREA file reader #3489

wants to merge 6 commits into from

Conversation

nawendt
Copy link
Contributor

@nawendt nawendt commented Apr 26, 2024

Adds a McIDAS AREA file reader

Description Of Changes

This is a reasonably functional AREA file reader. It has been able to read any of the files I have wanted to display. That being said, there are some navigation formats and/or data sources that may not work as currently written. If there are other files that can be tested, then those capabilities could eventually be added. Another thing of note is that the reader uses rasterio to handle properly projecting the image data. Whether you want that as a dependency for MetPy is something to discuss.

As there are things to discuss and potentially features you would like add before merging, I will leave this as a draft PR for now. I can worry about writing some tests for it once we decide how best to proceed. I just saw that this was a very old feature request and I had something that I've been using with success for some time. Time to get it on the board.

Checklist

src/metpy/io/mcidas.py Fixed Show fixed Hide fixed
src/metpy/io/mcidas.py Fixed Show fixed Hide fixed
src/metpy/io/mcidas.py Fixed Show fixed Hide fixed
src/metpy/io/mcidas.py Fixed Show fixed Hide fixed
@dopplershift
Copy link
Member

I'm curious what you mean about "projecting the data"? In my mind, we should just be reading in the raster directly and presenting it as such as an xarray dataset with the correct metadata? Then clients can decide whether manual projection is necessary.

@dopplershift dopplershift added Type: Feature New functionality Area: IO Pertains to reading data labels Apr 26, 2024
@nawendt
Copy link
Contributor Author

nawendt commented Apr 26, 2024

Yeah, that makes more sense. I just always had to project things so I had that piece built in. That should be able to be taken out without much issue.

@dopplershift
Copy link
Member

dopplershift commented Apr 26, 2024

If you consider it core functionality, we can certainly try to figure out how to include it in a way that fits with the rest of MetPy's design and philosophy. Though I will say, bringing in rasterio as an optional dependency (as awesome as it is) means depending on GDAL, which sounds like a version of hell for our CI system.

@nawendt
Copy link
Contributor Author

nawendt commented Apr 26, 2024

I understand. I don't think it's core functionality. Users will have options to handle the projection of the image.

val_code = self._buffer.read_int(4, self.endian, False)
if val_code != self.directory_block.validity_code:
continue
_docs = self._buffer.read_ascii(

Check notice

Code scanning / CodeQL

Unused local variable Note

Variable _docs is not used.
dtype = np.dtype('int32')
if self.prefmt:
dtype = dtype.newbyteorder(self.prefmt)
_cal = self._buffer.read_array(

Check notice

Code scanning / CodeQL

Unused local variable Note

Variable _cal is not used.
self.directory_block.prefix_calibration_length,
dtype
)
_bands = self._buffer.read_array(

Check notice

Code scanning / CodeQL

Unused local variable Note

Variable _bands is not used.
src/metpy/io/mcidas.py Fixed Show fixed Hide fixed
@nawendt
Copy link
Contributor Author

nawendt commented May 2, 2024

Things are cleaned up and will output a proper xarray.DatraArray along with lat/lon and, if applicable, projected coordinates. This will read GOES-16/17 data as well as several other data sources. I did find that it could not read the test file attached to #171 as I do not have GVAR georeferencing or calibration set up. I took a look at it and it is not straightforward to me what needs to be done. pygvar and pyarea may offer some insight, however.

@dopplershift
Copy link
Member

This is looking good. Interesting that GVAR, which is what I think I was stumbling on originally, remains a problem.

Some higher level design questions I have:

  • Would we want this to be accessible directly using xarray? i.e. mc = xr.open_dataset('my_mcidas_file.mc') ?
  • With the above, we're only able to present information from the file in the xarray/netcdf data model, so only variables, dims, and attributes--is there anything this would prohibit? Would there be any value in having a "low level" interface?
  • I'm wondering if there would be value in making this a bit more "lazy", in that opening the value only sets up the pointers necessary to get the raster. That might make it easier to open a bunch of files as a virtual dataset using xr.open_mfdataset(), but I'm not sure if that's a use case on your end?

@nawendt
Copy link
Contributor Author

nawendt commented May 10, 2024

  • Would we want this to be accessible directly using xarray? i.e. mc = xr.open_dataset('my_mcidas_file.mc') ?

I think this should be fine to do. I think the GINI example you have will probably get me most of the way there.

  • With the above, we're only able to present information from the file in the xarray/netcdf data model, so only variables, dims, and attributes--is there anything this would prohibit? Would there be any value in having a "low level" interface?

I think this could go either way. The xarray backend will use this lower-level code which, with proper organization, should also be able to be used by the user directly. That being said, GOES Level 1b and Level 2 netCDF files seem to just put lots of the metadata into attributes so we could seemingly bundle it all into one interface via xarray.

  • I'm wondering if there would be value in making this a bit more "lazy", in that opening the value only sets up the pointers necessary to get the raster. That might make it easier to open a bunch of files as a virtual dataset using xr.open_mfdataset(), but I'm not sure if that's a use case on your end?

It is not a use case for what I do. I see the xarray documentation on how to do this, but I'm not familiar with it enough to know how to go about implementing it in this case.

@dopplershift
Copy link
Member

I think this could go either way. The xarray backend will use this lower-level code which, with proper organization, should also be able to be used by the user directly. That being said, GOES Level 1b and Level 2 netCDF files seem to just put lots of the metadata into attributes so we could seemingly bundle it all into one interface via xarray.

I think I'd lean keeping xarray as the public interface for this then, just because it keeps the API surface area small and lessens thought about what we do/do not expose.

On the xarray front, I'm happy to have us pick up where you get stuck.

Certainly anything regarding lazy loading could be done later, I'm just thinking through all of the aspects of xarray-integration.

@nawendt
Copy link
Contributor Author

nawendt commented May 15, 2024

Implemented an xarray backend and things seem to work as expected. I had some question as to how to handle the CRS information. Based on what I saw with how metpy_crs was being used elsewhere, it looked like putting it as a Dataset coordinate variable made sense. Let me know if this implementation makes sense to you.

@dopplershift
Copy link
Member

dopplershift commented May 15, 2024

Thanks, this is looking pretty good. For CRS, I think it might be better for general purpose to just put the CF metadata directly in the xarray view, then we can rely on the standard metpy workflow for other netCDF datasets. Basically, this could be done by adding a grid_mapping attribute to all raster variables, with a value of e.g. "projection". Then add a projection variable (generally a scalar integer) that has all the attribute metadata that PROJ generates with to_cf(). The other benefit would allow using this backend with any CF-aware workflows even outside MetPy. We could consider making sure that the Dataset that results has effectively had ds.metpy.parse_cf() called.

One other thought I've had is to refactor _set_georeference() to avoid the massive if...else`:

def _set_georeference(self):
    yres = self.directory_block.line_resolution
    xres = self.directory_block.element_resolution
    origin_line = self.directory_block.upper_left_line_coordinate
    origin_elem = self.directory_block.upper_left_image_element

    ny = self.directory_block.image_lines
    nx = self.directory_block.data_per_line

    try:
        get_attr(self, f'_nav_{self.navigation_type.lower()}')(nx, ny, xres, yres, origin_elem, origin_line)
    except AttributeError:
        raise NotImplementedError(f'{self.navigation_type} navigation not currently supported.')

def _nav_merc(self, nx, ny, xres, yres, origin_elem, origin_line):
    lat_ts = self.navigation_block.standard_lat
    lat_ts_res = self.navigation_block.standard_lat_resolution
    lon_0 = self.navigation_block.central_lon

    # Account for area resolution and map to area coordinates
    diff_y = (self.navigation_block.equator_line - origin_line) / yres
    diff_x = (self.navigation_block.central_lon_element - origin_elem) / xres

    # Account for area resolution in dx, dy too
    dx = lat_ts_res * xres
    dy = lat_ts_res * yres
    ...

In my mind that nicely breaks up a pretty long function. Thoughts?


_ecc = self.navigation_block.sphere_eccentricity / 1e6
_semimajor_r = self.navigation_block.sphere_radius
_semiminor_r = (1 - _ecc**2) * _semimajor_r**2

Check notice

Code scanning / CodeQL

Unused local variable Note

Variable _semiminor_r is not used.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: IO Pertains to reading data Type: Feature New functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add McIDAS Area File reader
2 participants