Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Fix integer overflow in in1d for mixed integer dtypes #22877 #22878

Merged
merged 9 commits into from Dec 25, 2022

Conversation

MilesCranmer
Copy link
Contributor

@MilesCranmer MilesCranmer commented Dec 23, 2022

This fixes #22877 raised by @TonyXiang8787. The bug, introduced by #12065, results in integer overflows occurring in the following line:

outgoing_array[basic_mask] = isin_helper_ar[ar1[basic_mask] -
ar2_min]
when mixed dtype input was passed to in1d.

The fix is to simply test for these in advance of the kind='table' method being used:

        #  2. Check overflows for (ar2 - ar2_min); dtype=ar2.dtype
        range_safe_from_overflow = ar2_range <= np.iinfo(ar2.dtype).max
        #  3. Check overflows for (ar1 - ar2_min); dtype=ar1.dtype
        range_safe_from_overflow &= int(ar1_max) - int(ar2_min) <= np.iinfo(ar1.dtype).max
        range_safe_from_overflow &= int(ar1_min) - int(ar2_min) >= np.iinfo(ar1.dtype).min

I also added some unittests to evaluate this behavior.

cc @seberg

@MilesCranmer MilesCranmer changed the title [WIP] Fix integer overflow in in1d for mixed integer dtypes #22877 BUG: [WIP] Fix integer overflow in in1d for mixed integer dtypes #22877 Dec 23, 2022
@MilesCranmer
Copy link
Contributor Author

Local tests pass. Ready for review @seberg

@MilesCranmer MilesCranmer changed the title BUG: [WIP] Fix integer overflow in in1d for mixed integer dtypes #22877 BUG: Fix integer overflow in in1d for mixed integer dtypes #22877 Dec 23, 2022
below_memory_constraint = ar2_range <= 6 * (ar1.size + ar2.size)
# 2. Check overflows for (ar2 - ar2_min); dtype=ar2.dtype
range_safe_from_overflow = ar2_range <= np.iinfo(ar2.dtype).max
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR also corrects the bounds of the overflow check. It should have really been <= in #12065, rather than <. I noticed this after adding some new tests.

@charris charris added 00 - Bug 09 - Backport-Candidate PRs tagged should be backported labels Dec 23, 2022
@charris charris added this to the 1.24.1 release milestone Dec 23, 2022
@seberg
Copy link
Member

seberg commented Dec 23, 2022

This is that last thing that would be great to have some fix for in 1.24.1 considering that it returns bad results. I expect this is good, but I need fresher eyes. But if it looks good to others: Maybe we should get it in, and I look at it again later and just follow up if I feel a different approach is better.

@MilesCranmer
Copy link
Contributor Author

MilesCranmer commented Dec 23, 2022

I'm thinking about this proposed change now and while it does fix the problem, it is more conservative than necessary. Recall we are looking for overflows in this calculation:

basic_mask = (ar1 <= ar2_max) & (ar1 >= ar2_min)
outgoing_array[basic_mask] = isin_helper_ar[ar1[basic_mask] - ar2_min]

basic_mask will trim ar1 to only the elements within the range of ar2. Thus, we only technically need to consider min(ar1_max, ar2_max), and max(ar1_min, ar2_min), in these calculations. i.e., the following:

# After masking, the range of ar1 is guaranteed to be
# within the range of ar2:
ar1_upper = min(int(ar1_max), int(ar2_max))
ar1_lower = max(int(ar1_min), int(ar2_min))

range_safe_from_overflow &= all((
    ar1_upper - int(ar2_min) <= np.iinfo(ar1.dtype).max,
    ar1_lower - int(ar2_min) >= np.iinfo(ar1.dtype).min
))

does that make sense?


Edit: pushed this change.

@charris
Copy link
Member

charris commented Dec 24, 2022

I wonder why integers larger the uint16 are not tested, are they too big?

@charris charris merged commit 235dbe1 into numpy:main Dec 25, 2022
@charris
Copy link
Member

charris commented Dec 25, 2022

Thanks @MilesCranmer. @seberg If you see anything that bothers you we can make another PR.

1 similar comment
@charris
Copy link
Member

charris commented Dec 25, 2022

Thanks @MilesCranmer. @seberg If you see anything that bothers you we can make another PR.

charris pushed a commit to charris/numpy that referenced this pull request Dec 25, 2022
numpy#22878)

* TST: Mixed integer types for in1d

* BUG: Fix mixed dtype overflows for in1d (numpy#22877)

* BUG: Type conversion for integer overflow check

* MAINT: Fix linting issues in in1d

* MAINT: ar1 overflow check only for non-empty array

* MAINT: Expand bounds of overflow check

* TST: Fix integer overflow in mixed boolean test

* TST: Include test for overflow on mixed dtypes

* MAINT: Less conservative overflow checks
@charris charris removed the 09 - Backport-Candidate PRs tagged should be backported label Dec 25, 2022
@charris charris removed this from the 1.24.1 release milestone Dec 25, 2022
@MilesCranmer MilesCranmer deleted the isin-fix-dtype branch December 25, 2022 19:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: numpy.isin does not function correctly with two arrays with different integer type
3 participants