Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: raise an exception when adding a multi-dimensional column as an index #16360

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

neutrinoceros
Copy link
Contributor

Description

Fixes #13292

running the test without the fix gives out a warning

================================ test session starts =================================
platform darwin -- Python 3.12.2, pytest-8.1.1, pluggy-1.5.0
Matplotlib: 3.8.4
Freetype: 2.6.1

Running tests with Astropy version 7.0.0.dev41+gdcf60561fc.
Running tests in astropy/table.

Date: 2024-04-30T17:02:19

Platform: macOS-14.4.1-arm64-arm-64bit

Executable: /Users/clm/.pyenv/versions/astropy.dev/bin/python

Full Python Version: 
3.12.2 (main, Feb 22 2024, 09:53:39) [Clang 15.0.0 (clang-1500.1.0.2.5)]

encodings: sys: utf-8, locale: UTF-8, filesystem: utf-8
byteorder: little
float info: dig: 15, mant_dig: 15

Package versions: 
Numpy: 1.26.4
Scipy: 1.12.0
Matplotlib: 3.8.4
h5py: 3.11.0
Pandas: not available
PyERFA: 2.0.1.4
Cython: not available
Scikit-image: not available
asdf-astropy: not available
pyarrow: not available

Using Astropy options: remote_data: none.

CI: undefined
ARCH_ON_CI: undefined
IS_CRON: undefined

rootdir: /Users/clm/dev/astropy-project/coordinated/astropy
configfile: pyproject.toml
plugins: astropy-0.11.0, mpi-0.6, cov-5.0.0, hypothesis-6.100.1, remotedata-0.4.1, mpl-0.17.0, filter-subpackage-0.2.0, doctestplus-1.2.1, astropy-header-0.2.2, arraydiff-0.6.1, xdist-3.5.0, mock-3.14.0
collected 1198 items / 1197 deselected / 1 selected                                  
run-last-failure: rerun previous 1 failure (skipped 58 files)

astropy/table/tests/test_index.py F                                            [100%]

====================================== FAILURES ======================================
________________________________ test_masked_nd_index ________________________________

    def test_masked_nd_index():
        # see https://github.com/astropy/astropy/issues/13292
        t = Table()
        data = np.arange(0, 6)
        data_ma = np.ma.masked_inside(data, 2, 4)
        t.add_column(data_ma.reshape(3, -1), name="arr")
>       t.add_index("arr")

astropy/table/tests/test_index.py:615: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
astropy/table/table.py:1093: in add_index
    index = Index(columns, engine=engine, unique=unique)
astropy/table/index.py:95: in __init__
    row_index = Column(col.argsort(kind="stable"))
/Users/clm/.pyenv/versions/astropy.dev/lib/python3.12/site-packages/numpy/ma/core.py:5576: in argsort
    axis = _deprecate_argsort_axis(self)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

arr = <MaskedColumn name='arr' dtype='int64' shape=(2,) length=3>
  0 .. 1
-- .. --
 -- .. 5

    def _deprecate_argsort_axis(arr):
        """
        Adjust the axis passed to argsort, warning if necessary
    
        Parameters
        ----------
        arr
            The array which argsort was called on
    
        np.ma.argsort has a long-term bug where the default of the axis argument
        is wrong (gh-8701), which now must be kept for backwards compatibility.
        Thankfully, this only makes a difference when arrays are 2- or more-
        dimensional, so we only need a warning then.
        """
        if arr.ndim <= 1:
            # no warning needed - but switch to -1 anyway, to avoid surprising
            # subclasses, which are more likely to implement scalar axes.
            return -1
        else:
            # 2017-04-11, Numpy 1.13.0, gh-8701: warn on axis default
>           warnings.warn(
                "In the future the default for argsort will be axis=-1, not the "
                "current None, to match its documentation and np.argsort. "
                "Explicitly pass -1 or None to silence this warning.",
                MaskedArrayFutureWarning, stacklevel=3)
E           numpy.ma.core.MaskedArrayFutureWarning: In the future the default for argsort will be axis=-1, not the current None, to match its documentation and np.argsort. Explicitly pass -1 or None to silence this warning.

/Users/clm/.pyenv/versions/astropy.dev/lib/python3.12/site-packages/numpy/ma/core.py:107: MaskedArrayFutureWarning
============================== short test summary info ===============================
FAILED astropy/table/tests/test_index.py::test_masked_nd_index - numpy.ma.core.MaskedArrayFutureWarning: In the future the default for argsort wil...
========================= 1 failed, 1197 deselected in 0.32s =========================

Note that decorating the test with @pytest.mark.filterwarnings("ignore") allows to reproduce
the exception that was originally reported.

To my surprise, simply following the advice given by the warning resolved the bug, and did not appear to break anything else !

  • By checking this box, the PR author has requested that maintainers do NOT use the "Squash and Merge" button. Maintainers should respect this when possible; however, the final decision is at the discretion of the maintainer that merges the PR.

@github-actions github-actions bot added the table label Apr 30, 2024
Copy link

Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.

  • Do the proposed changes actually accomplish desired goals?
  • Do the proposed changes follow the Astropy coding guidelines?
  • Are tests added/updated as required? If so, do they follow the Astropy testing guidelines?
  • Are docs added/updated as required? If so, do they follow the Astropy documentation guidelines?
  • Is rebase and/or squash necessary? If so, please provide the author with appropriate instructions. Also see instructions for rebase and squash.
  • Did the CI pass? If no, are the failures related? If you need to run daily and weekly cron jobs as part of the PR, please apply the "Extra CI" label. Codestyle issues can be fixed by the bot.
  • Is a change log needed? If yes, did the change log check pass? If no, add the "no-changelog-entry-needed" label. If this is a manual backport, use the "skip-changelog-checks" label unless special changelog handling is necessary.
  • Is this a big PR that makes a "What's new?" entry worthwhile and if so, is (1) a "what's new" entry included in this PR and (2) the "whatsnew-needed" label applied?
  • At the time of adding the milestone, if the milestone set requires a backport to release branch(es), apply the appropriate "backport-X.Y.x" label(s) before merge.

Copy link

👋 Thank you for your draft pull request! Do you know that you can use [ci skip] or [skip ci] in your commit messages to skip running continuous integration tests until you are ready?

@neutrinoceros neutrinoceros force-pushed the table/bug/13292/nd_masked_index branch from 9d6c260 to 63d771b Compare April 30, 2024 15:10
@pllim pllim added this to the v7.0.0 milestone Apr 30, 2024
@pllim pllim added the Bug label Apr 30, 2024
@neutrinoceros neutrinoceros marked this pull request as ready for review April 30, 2024 15:39
@taldcroft
Copy link
Member

This stops the crash but I'm not sure it is creating a functional index. This snippet sets up the data like in the test (but as np.ma.array not MaskedColumn) and then runs some of the index code near the patch. The outputs don't entirely make sense to me.

In [20]: data = np.arange(0, 6)
    ...: data_ma = np.ma.masked_inside(data, 2, 4)
    ...: data_ma3 = data_ma.reshape(3, -1)

# I'm not quite sure what this is telling us, but if I understand this should
# be a 1-d array of indices from 0..2.
In [21]: data_ma3.argsort(kind="stable", axis=-1)
Out[21]: 
array([[0, 1],
       [0, 1],
       [1, 0]])

In [22]: row_index = data_ma3.argsort(kind="stable", axis=-1)
In [24]: print(data_ma3[row_index])  # ??
[[[0 1]
  [-- --]]

 [[0 1]
  [-- --]]

 [[-- --]
  [0 1]]]

@taldcroft
Copy link
Member

taldcroft commented Apr 30, 2024

Even for the unmasked case I don't think a multi-dim column can be used for the index. When I tried, the index gets created but it seems nonsensical and I could not use it via .loc. I think the fix here may be raising a useful exception for the user in that case.

@neutrinoceros
Copy link
Contributor Author

Indeed, my patch makes the behavior consistent between masked and unmasked multi-dim indexes, so if the result is nonsensical in one case, it must be the case for the other one too ! I'll admit I didn't pay attention to anything other than consistency since the report was specifically about the masked case and my brain quickly overheats from multi-dim thinking 😅 .

I think the fix here may be raising a useful exception for the user in that case.

Agreed, let's go with that !

@neutrinoceros neutrinoceros force-pushed the table/bug/13292/nd_masked_index branch from 63d771b to fee2b46 Compare May 1, 2024 07:49
@neutrinoceros neutrinoceros changed the title BUG: fix a crash when indexing a Table with a multi-dimensional masked array API: raise an exception when adding a multi-dimensional column as an index May 1, 2024
@neutrinoceros
Copy link
Contributor Author

@taldcroft do you like it now ? 😸

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

add_index on multi-dimensional masked column failing with index error
3 participants