Projecting numpy masked arrays returns plain numpy arrays #1102

Closed
paultcochrane opened this issue Jul 9, 2022 · 6 comments · Fixed by #1103

@paultcochrane
Contributor

While upgrading an older system from pyproj 2.6.1 to the 3.x series, I found that projecting arrays of coordinates given as numpy masked arrays behaves differently: pyproj 2.6.1 returned masked arrays of the projected coordinates, but as of the 3.x series the returned arrays are no longer masked (i.e. they are "plain" numpy arrays). I'm not sure if this is an intentional change or a regression of some kind. I couldn't find a GH issue relating to this problem, so I thought I'd let you know about it. I bisected the issue and found that the commit where the problem first appears is 4ab3ff7.

If you require any more information (beyond that mentioned below), please just let me know and I'll be more than happy to provide it.

Code Sample

Here's a sample piece of code which illustrates the problem (referred to as test_projected_masked_arrays.py further below):

# -*- coding: utf-8 -*-

import numpy as np
from pyproj import Proj
from unittest import TestCase


class TestProjectMaskedArrays(TestCase):
    def test_projected_masked_array_is_masked(self):
        lat = np.ma.array(data=[30, 35, 40, 45], mask=[0, 0, 1, 0])
        lon = np.ma.array(data=[0, 5, 10, 15], mask=[0, 0, 1, 0])

        proj = Proj('+ellps=WGS84 +proj=stere +lat_0=75.0 '
                    '+lon_0=-14.0 +x_0=0.0 +y_0=0.0 +no_defs')

        x, y = proj(lon, lat)

        self.assertTrue(hasattr(x, "mask"))
        self.assertTrue(hasattr(y, "mask"))

# vim: expandtab shiftwidth=4 softtabstop=4

Setting up a virtual environment and installing base packages (assuming a Debian-bullseye system with Python 3.9):

$ virtualenv --python=/usr/bin/python3 venv
$ source venv/bin/activate
$ pip install numpy==1.21.6 pytest==7.1.2
$ pip install pyproj==2.6.1 && pytest test_projected_masked_arrays.py  # passes
$ pip install pyproj==3.0.0 && pytest test_projected_masked_arrays.py  # fails
# last good commit
$ pip install --force-reinstall git+https://github.com/pyproj4/pyproj.git@8eb145e13ba8133ba624fa2cc1cbc0a0733d69c0 && pytest test_projected_masked_arrays.py  # passes
# first bad commit
$ pip install --force-reinstall git+https://github.com/pyproj4/pyproj.git@4ab3ff7cf2e3ff089509b921f103dd2fe57ddfda && pytest test_projected_masked_arrays.py  # fails

Problem description

My expectation is that the masked array behaviour from the 2.x pyproj series would continue (this could be an incorrect expectation; I'm not sure!). Specifically, I expect that projecting masked arrays returns masked arrays.

Expected Output

I would expect that after executing

x, y = proj(lon, lat)

where lon and lat are numpy masked arrays, that x and y would have the mask attribute. I.e. that both hasattr(x, "mask") and hasattr(y, "mask") return True.
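
Until the behaviour is restored, one possible workaround is to reapply the input masks to the projected output by hand. The following is only a minimal sketch under stated assumptions: `reapply_mask` is a hypothetical helper (not part of pyproj), and `fake_projection` is a stand-in that drops the mask the way `proj(lon, lat)` does in 3.x, so that the snippet runs with numpy alone.

```python
import numpy as np


def reapply_mask(projected, *inputs):
    """Hypothetical helper: mask `projected` wherever any input was masked."""
    combined = np.zeros(np.shape(projected), dtype=bool)
    for arr in inputs:
        # getmaskarray returns an all-False array for unmasked input
        combined |= np.ma.getmaskarray(arr)
    return np.ma.array(projected, mask=combined)


def fake_projection(lon, lat):
    # Stand-in for proj(lon, lat), NOT a real projection: it only mimics
    # the loss of the mask by returning plain ndarrays.
    return np.asarray(lon, dtype=float) * 2.0, np.asarray(lat, dtype=float) * 2.0


lat = np.ma.array(data=[30, 35, 40, 45], mask=[0, 0, 1, 0])
lon = np.ma.array(data=[0, 5, 10, 15], mask=[0, 0, 1, 0])

x, y = fake_projection(lon, lat)  # plain ndarrays, mask lost
x = reapply_mask(x, lon, lat)     # masked arrays again
y = reapply_mask(y, lon, lat)
```

With real code, `fake_projection(lon, lat)` would be replaced by the actual `proj(lon, lat)` call; the helper itself does not depend on pyproj.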

Environment Information

  • Output from: python -m pyproj -v
pyproj info:
    pyproj: 3.0.dev0
      PROJ: 7.2.1
  data dir: /usr/share/proj

System:
    python: 3.9.2 (default, Feb 28 2021, 17:03:44)  [GCC 10.2.1 20210110]
executable: /venv/bin/python
   machine: Linux-5.10.0-0.bpo.8-amd64-x86_64-with-glibc2.31

Python deps:
       pip: 20.3.4
setuptools: 44.1.1
    Cython: None
  • PROJ version (python -c "import pyproj; print(pyproj.proj_version_str)")
7.2.1
  • PROJ data directory (python -c "import pyproj; print(pyproj.datadir.get_data_dir())")
/usr/share/proj
  • Python version (python -c "import sys; print(sys.version.replace('\n', ' '))")
3.9.2 (default, Feb 28 2021, 17:03:44)  [GCC 10.2.1 20210110]
  • Operating System Information (python -c "import platform; print(platform.platform())")
Linux-5.10.0-0.bpo.8-amd64-x86_64-with-glibc2.31

Installation method

Via pip in a virtual environment.

@snowman2
Member

snowman2 commented Jul 9, 2022

The fix is likely to check for the mask attribute here:

if hasattr(xxx, "__array__") and callable(xxx.__array__):
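
To illustrate the idea of a mask-aware branch ahead of the generic `__array__` check, here is a rough sketch. It is not the pyproj source and not necessarily the fix that was merged: `to_float_array` is a hypothetical name, and using `isinstance()` rather than an attribute check is an assumption of this sketch.

```python
import numpy as np


def to_float_array(obj):
    """Hypothetical helper: convert an array-like to float64, keeping any mask."""
    if isinstance(obj, np.ma.MaskedArray):
        # astype on a masked array returns a masked array with the same mask
        return obj.astype("d")
    if hasattr(obj, "__array__") and callable(obj.__array__):
        # generic array-likes become plain float64 ndarrays
        return np.asarray(obj, dtype="d")
    raise TypeError("input must be an array-like")


masked_out = to_float_array(np.ma.array([1, 2, 3], mask=[0, 1, 0]))
plain_out = to_float_array(np.array([1, 2]))
```

Handling the masked case first matters because a masked array also provides `__array__`, so the generic branch alone would strip the mask.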

@snowman2 snowman2 added this to To do in 3.4.0 Release via automation Jul 9, 2022
@snowman2
Member

snowman2 commented Jul 9, 2022

A PR with the fix is welcome.

@paultcochrane
Contributor Author

I'm not that familiar with the codebase, but I'll give it a go!

@snowman2
Member

snowman2 commented Jul 9, 2022

Thanks 👍. I recommend adding a test with a masked array in this file: https://github.com/pyproj4/pyproj/blob/main/test/test_utils.py

@paultcochrane
Contributor Author

After a bit more hunting I found that checking for the mask attribute might not be specific enough: using isinstance() to check whether the output array from _copytobuffer() is a numpy.ma.MaskedArray seems to be a clearer description of the expectation.

For instance, with pyproj 2.6.1 it's possible to show that _copytobuffer() returns a masked array when given a masked array as its argument:

from pyproj.utils import _copytobuffer
import numpy

in_arr = numpy.ma.array([1])
out_arr = _copytobuffer(in_arr)

isinstance(out_arr[0], numpy.ma.MaskedArray)  # => True

In pyproj 3.x the isinstance() call returns False.

To cut a long story short: I'll add a test along these lines :-)

@paultcochrane
Contributor Author

Interesting extra piece of information: a pandas.Series also has a mask attribute. It seems that hardmask or sharedmask is an attribute specific to numpy masked arrays.
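
A small sketch of why the attribute check is ambiguous. To keep it runnable without pandas, `SeriesLike` is a hypothetical stand-in for any object that, like pandas.Series, exposes something called `mask` without being a masked array.

```python
import numpy as np


class SeriesLike:
    """Hypothetical stand-in for an object with a `mask` method."""

    def mask(self, cond):
        return self


masked = np.ma.array([1, 2, 3], mask=[0, 1, 0])
other = SeriesLike()

# Both objects pass the attribute check...
attr_check = (hasattr(masked, "mask"), hasattr(other, "mask"))

# ...but only the real masked array passes the type check.
type_check = (
    isinstance(masked, np.ma.MaskedArray),
    isinstance(other, np.ma.MaskedArray),
)

# hardmask and sharedmask are present on the masked array only
ma_only = [
    name
    for name in ("hardmask", "sharedmask")
    if hasattr(masked, name) and not hasattr(other, name)
]
```

The `isinstance()` check states the intent directly, whereas relying on `hardmask` or `sharedmask` would still be duck typing on attribute names.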

paultcochrane added a commit to paultcochrane/pyproj that referenced this issue Jul 10, 2022
As noted in pyproj4#1102, projecting numpy masked arrays returned numpy masked
arrays in the 2.x series of pyproj.  This behaviour changed in commit
4ab3ff7, where a "plain" numpy ndarray was returned.  The change
implemented here ensures that projecting numpy masked arrays returns
masked arrays as was previously the case.
paultcochrane added a commit to paultcochrane/pyproj that referenced this issue Jul 10, 2022
As noted in pyproj4#1102, projecting numpy masked arrays returned numpy masked
arrays in the 2.x series of pyproj.  This behaviour changed in commit
4ab3ff7 as part of the 3.x series, where a "plain" numpy ndarray was
returned.  The change implemented here ensures that projecting numpy
masked arrays returns masked arrays as was previously the case.
paultcochrane added a commit to paultcochrane/pyproj that referenced this issue Jul 11, 2022
As noted in pyproj4#1102, projecting numpy masked arrays returned numpy masked
arrays in the 2.x series of pyproj.  This behaviour changed in commit
4ab3ff7 as part of the 3.x series, where a "plain" numpy ndarray was
returned.  The change implemented here ensures that projecting numpy
masked arrays returns masked arrays as was previously the case.
3.4.0 Release automation moved this from To do to Done Jul 11, 2022