Structured MaskedColumn do not display the mask correctly #16387

Open
mhvk opened this issue May 5, 2024 · 8 comments

@mhvk
Contributor

mhvk commented May 5, 2024

Description

As noted by @taldcroft in #16380, a structured MaskedColumn does not look quite right (see example below). It likely means adjusting the default format for columns.

Expected behavior

The mask on individual parts of the structure should be shown, with unmasked elements properly displayed.

How to Reproduce

import numpy as np
from astropy.table import MaskedColumn
a = np.array([(1, 2), (3, 4)], dtype="i,i")
mc = MaskedColumn(a, mask=[(True, False), (False, False)])
mc
# <MaskedColumn dtype='(int32, int32)' length=2>
#     --
# (3, 4)
mc.data
# masked_array(data=[(--, 2), (3, 4)],
#              mask=[( True, False), (False, False)],
#        fill_value=(999999, 999999),
#             dtype=[('f0', '<i4'), ('f1', '<i4')])
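For reference, the per-field display described under Expected behavior can be sketched in plain numpy. This is only an illustration, not astropy's formatting code: `format_row` is a hypothetical helper, and the sketch assumes a flat structured dtype (no nested fields).

```python
import numpy as np

a = np.array([(1, 2), (3, 4)], dtype="i,i")
ma = np.ma.masked_array(a, mask=[(True, False), (False, False)])


def format_row(data_row, mask_row):
    """Hypothetical helper: render one structured row, field by field."""
    # .item() turns a structured scalar (np.void) into a plain tuple.
    parts = [
        "--" if masked else str(value)
        for value, masked in zip(data_row.item(), mask_row.item())
    ]
    return "(" + ", ".join(parts) + ")"


# ma.data is the plain ndarray, ma.mask the matching structured bool array.
rows = [format_row(d, m) for d, m in zip(ma.data, ma.mask)]
print(rows)  # ['(--, 2)', '(3, 4)']
```

Only the first field of the first row is replaced by `--`; every unmasked field keeps its value.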

Versions

import platform; print(platform.platform())
import sys; print("Python", sys.version)
import astropy; print("astropy", astropy.__version__)
import numpy; print("Numpy", numpy.__version__)
import erfa; print("pyerfa", erfa.__version__)
try:
    import scipy
    print("Scipy", scipy.__version__)
except ImportError:
    print("Scipy not installed")
try:
    import matplotlib
    print("Matplotlib", matplotlib.__version__)
except ImportError:
    print("Matplotlib not installed")
# Copy the result here
@neutrinoceros
Contributor

I'm not sure the issue is with the column repr alone. Digging a little in that direction, I find that there's already a problem with how individual column elements are represented, and the problem with the column display seems to be merely a symptom of it:

>>> mc[0].data
array(0.)
>>> mc[1].data
array((3, 4), dtype=[('f0', '<i4'), ('f1', '<i4')])

I'll take the rest of the day off, but I'm game to take a closer look tomorrow.

@neutrinoceros
Contributor

neutrinoceros commented May 6, 2024

It might actually go even deeper. Internally, MaskedColumn.__getitem__ retrieves the element as

>>> np.ma.MaskedArray.__getitem__(mc, 0).data
array(0.)

see

value = MaskedArray.__getitem__(self, item)

I'm starting to suspect this might be a numpy bug.

@neutrinoceros
Contributor

> I'm starting to suspect this might be a numpy bug.

but then again, maybe not:

>>> import numpy as np
>>> a = np.array([(1, 2), (3, 4)], dtype="i,i")
>>> ma = np.ma.masked_array(a, mask=[(True, False), (False, False)])
>>> ma[0]
(--, 2)

@neutrinoceros
Contributor

The following patch to numpy seems to resolve the problem:

diff --git a/numpy/ma/core.py b/numpy/ma/core.py
index 89b4f0703..7f1039267 100644
--- a/numpy/ma/core.py
+++ b/numpy/ma/core.py
@@ -3299,7 +3299,7 @@ def _scalar_heuristic(arr, elem):
         # Did we extract a single item?
         if scalar_expected:
             # A record
-            if isinstance(dout, np.void):
+            if isinstance(dout, (np.void, mvoid)):
                 # We should always re-cast to mvoid, otherwise users can
                 # change masks on rows that already have masked values, but not
                 # on rows that have no masked values, which is inconsistent.

However, I'm a little hesitant to upstream this without a deeper understanding. In particular, I don't yet know how to illustrate the issue without astropy, so I cannot write a pure-numpy test for it.
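
A small numpy-only sketch of the type relationship the patch relies on. It is not a reproducer for this issue; it only shows that `np.ma.mvoid` is not a subclass of `np.void`, so an element that has already been converted to `mvoid` would not match the original check. That `dout` is an `mvoid` in the astropy case is an assumption here, suggested by `mc.data` printing as a `masked_array` in the reproducer rather than confirmed in this thread.

```python
import numpy as np

a = np.array([(1, 2), (3, 4)], dtype="i,i")
ma = np.ma.masked_array(a, mask=[(True, False), (False, False)])

raw = ma.data[0]  # element of the underlying plain ndarray
row = ma[0]       # element of the masked array

print(type(raw).__name__)  # void
print(type(row).__name__)  # mvoid

# The original check only recognizes np.void; the patched one also accepts mvoid.
matches_original = isinstance(row, np.void)
matches_patched = isinstance(row, (np.void, np.ma.mvoid))
print(matches_original, matches_patched)  # False True
```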

@neutrinoceros
Contributor

@mhvk @taldcroft what do you think? Should I look for a pure-numpy reproducer and report upstream, or do you still think it's a problem on our side?

@neutrinoceros
Contributor

neutrinoceros commented May 8, 2024

It looks like np.ma.mvoid is pretty inconsistent in how masked and unmasked data are displayed:

>>> import numpy as np
>>> mask = [(True, False), (False, False)]
>>> mv = np.ma.mvoid([(1, 2), (3, 4)], dtype="i,i")
>>> ma = np.ma.masked_array(mv, mask=mask, fill_value=-1)
>>> print(ma)
[(--, 2) (3, 4)]
>>> print(ma[0])
0.0
>>> print(ma[0].data)
--
>>> print(ma[1])
(3, 4)
>>> print(ma[1].data)
<memory at 0x105446980>

Unfortunately, this is not a reproducer for this issue (and it is not solved by my patch), because it goes through np.ma.mvoid.__getitem__, whereas the reproducer we have for astropy goes through np.ma.MaskedArray.__getitem__. It does seem connected, though.
I'm still looking for a way to re-create the original problem without astropy.
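
For context, the two indexing paths mentioned above can be told apart in plain numpy. A minimal sketch, assuming an ordinary structured masked array (not the `mvoid`-backed array built above): indexing the array goes through `np.ma.MaskedArray.__getitem__` and yields an `mvoid` row, while indexing that row goes through `np.ma.mvoid.__getitem__` and yields individual fields.

```python
import numpy as np

a = np.array([(1, 2), (3, 4)], dtype="i,i")
ma = np.ma.masked_array(a, mask=[(True, False), (False, False)])

# mvoid overrides __getitem__, so the two lookups are different functions.
distinct = np.ma.mvoid.__getitem__ is not np.ma.MaskedArray.__getitem__

row = ma[0]      # array-level lookup: returns an mvoid row
first = row[0]   # row-level lookup on a masked field
second = row[1]  # row-level lookup on an unmasked field

print(distinct, first is np.ma.masked, second)
```

A masked field comes back as the `np.ma.masked` singleton, while an unmasked field comes back as its plain value.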

@taldcroft
Member

@neutrinoceros - what numpy version is this? Is anything magically better in numpy 2.0?

@neutrinoceros
Contributor

It's not. I tried numpy 1.26 and 2.0.0rc1, and they behave exactly the same here.
