BUG/COMPAT: fix assert_* functions for nested arrays with latest numpy #50396

@jorisvandenbossche jorisvandenbossche commented Dec 22, 2022

This adds tests for assert_almost_equal (used by assert_series_equal et al. for object-dtype arrays). All of them pass on pandas main with released numpy, except for two cases with a dict of arrays. It also fixes the implementation of array_equivalent (used by assert_almost_equal) to return False instead of raising an error when numpy cannot compare the arrays (numpy 1.25.dev starts raising in this case instead of returning False).

I also added a set of equivalent tests for array_equivalent itself, but not all of those pass, so I marked the failing ones as xfail. Fixing those as well might need more extensive changes, and I would prefer to do that separately from this PR (to keep this one possible to backport).

See #50360 (comment) for an illustration of the behaviour that changed in numpy 1.25.dev that causes this.
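The change described above (returning False instead of raising when numpy cannot compare the arrays) can be sketched roughly as follows. This is only an illustration and not the code from this PR: the helper name elements_equal_or_false is hypothetical, and it assumes that recent numpy raises a ValueError for an elementwise == on arrays whose shapes cannot be broadcast, while older releases return a scalar False with a DeprecationWarning.

```python
import warnings

import numpy as np


def elements_equal_or_false(left, right) -> bool:
    """Hypothetical helper: True only if every element compares equal.

    If numpy cannot compare the two arrays at all (for example because the
    shapes cannot be broadcast), report "not equal" instead of raising.
    """
    try:
        with warnings.catch_warnings():
            # Older numpy emits a DeprecationWarning and returns False here.
            warnings.simplefilter("ignore", DeprecationWarning)
            result = left == right
    except ValueError:
        # Assumption: newer numpy raises ValueError for this comparison.
        return False
    return bool(np.all(result))


same = elements_equal_or_false(np.array([1, 2, 3]), np.array([1, 2, 3]))
mismatched = elements_equal_or_false(np.array([1, 2, 3]), np.array([1, 2]))
print(same, mismatched)
```

With this shape of helper, callers get the same boolean answer on both older and newer numpy, which is the behaviour the assert_* functions rely on.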

@jorisvandenbossche jorisvandenbossche added Testing pandas testing functions or related to the test suite Compat pandas objects compatability with Numpy or Python functions labels Dec 22, 2022
@jorisvandenbossche jorisvandenbossche added this to the 1.5.3 milestone Dec 22, 2022
# reached in groupby aggregations, make sure we use np.any when checking
# if the comparison is truthy
left = np.array([np.array([50, 70, 90]), np.array([20, 30, 40])], dtype=object)
Member

Should we have both same-sized and mismatched-size cases? I expect these will be found non-equivalent through different code paths.

Member Author

I intentionally changed that here, because I assume the intent of the test was to test a numpy array of arrays. But if the arrays have the same length, the np.array(..) constructor actually converts them into a single 2D array.

In another test below, I added a same-length case (constructing the array with a workaround: first creating an empty object array and then filling it); see # same-length lists.
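For reference, a small sketch of the constructor behaviour and the workaround mentioned above. The variable names are only illustrative; the values are taken from the test line under review.

```python
import numpy as np

# Same-length subarrays: np.array stacks them into one 2D object array.
stacked = np.array([np.array([50, 70, 90]), np.array([20, 30, 40])], dtype=object)
print(stacked.shape)  # (2, 3)

# Different-length subarrays: the result is a 1D array holding two arrays.
ragged = np.array([np.array([50, 70, 90]), np.array([20, 30])], dtype=object)
print(ragged.shape)  # (2,)

# Workaround for same-length subarrays: allocate an empty 1D object array
# first, then assign each subarray to its slot.
nested = np.empty(2, dtype=object)
nested[0] = np.array([50, 70, 90])
nested[1] = np.array([20, 30, 40])
print(nested.shape)  # (2,)
```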

Member Author

Added a same-length subarrays case in this test as well.

@jorisvandenbossche
Member Author

Any other comments here?

assert not array_equivalent(left, right[::-1], strict_nan=strict_nan)


@pytest.mark.xfail(reason="failing")
Member

Just to clarify my understanding: these xfails are not dependent on a future numpy version, correct?

Member Author

I think not, but I'm not 100% sure offhand. In any case, a number of cases already failed with current numpy when calling array_equivalent directly while passing with assert_almost_equal, because those two take slightly different code paths.

@mroeschke mroeschke merged commit 2c994c7 into pandas-dev:main Jan 13, 2023
@mroeschke
Member

Thanks @jbrockmendel

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Jan 13, 2023
lithomas1 pushed a commit that referenced this pull request Jan 14, 2023
…s for nested arrays with latest numpy) (#50739)

Backport PR #50396: BUG/COMPAT: fix assert_* functions for nested arrays with latest numpy

Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
@jorisvandenbossche jorisvandenbossche deleted the gh-50360-assert-nested-data-numpy-125 branch January 14, 2023 16:50