BUG: Fix integer overflow in in1d for mixed integer dtypes #22877 #22878

MilesCranmer · 2022-12-23T16:29:32Z

This fixes #22877 raised by @TonyXiang8787. The bug, introduced by #12065, results in integer overflows occurring in the following line:

numpy/numpy/lib/arraysetops.py

Lines 683 to 684 in 2b9851b

    
    outgoing_array[basic_mask] = isin_helper_ar[ar1[basic_mask] -
                                                ar2_min]

when mixed dtype input was passed to in1d.
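
To make the failure mode concrete, here is a minimal pure-Python sketch of the wraparound, assuming the subtraction is carried out at the width of ar1's dtype. The helper `wrap_to_dtype` and the sample values are hypothetical illustrations, not NumPy code.

```python
def wrap_to_dtype(value, bits, signed=True):
    """Emulate fixed-width two's-complement wraparound (hypothetical helper)."""
    mask = (1 << bits) - 1
    value &= mask
    if signed and value >= 1 << (bits - 1):
        value -= 1 << bits
    return value

# Assumed example: ar1 is int8, and ar2_min comes from a wider dtype.
ar1_value = 100
ar2_min = -100

true_index = ar1_value - ar2_min            # 200, the index we want
int8_index = wrap_to_dtype(true_index, 8)   # what an 8-bit signed subtraction yields

print(true_index, int8_index)  # 200 -56
```

A negative index such as -56 does not raise an error; it reads from the end of the lookup table, so the result is silently wrong rather than obviously broken.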

The fix is to test for these overflows before the kind='table' method is used:

        #  2. Check overflows for (ar2 - ar2_min); dtype=ar2.dtype
        range_safe_from_overflow = ar2_range <= np.iinfo(ar2.dtype).max
        #  3. Check overflows for (ar1 - ar2_min); dtype=ar1.dtype
        range_safe_from_overflow &= int(ar1_max) - int(ar2_min) <= np.iinfo(ar1.dtype).max
        range_safe_from_overflow &= int(ar1_min) - int(ar2_min) >= np.iinfo(ar1.dtype).min

I also added some unit tests to evaluate this behavior.

cc @seberg

MilesCranmer · 2022-12-23T17:02:50Z

Local tests pass. Ready for review @seberg

MilesCranmer · 2022-12-23T17:08:38Z

numpy/lib/arraysetops.py

        below_memory_constraint = ar2_range <= 6 * (ar1.size + ar2.size)
+        #  2. Check overflows for (ar2 - ar2_min); dtype=ar2.dtype
+        range_safe_from_overflow = ar2_range <= np.iinfo(ar2.dtype).max


This PR also corrects the bounds of the overflow check. It should have really been <= in #12065, rather than <. I noticed this after adding some new tests.
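
The boundary case can be checked by hand. In this sketch the int8 limit is written out as a plain Python integer, and the sample minimum and maximum are assumed values chosen so that the range lands exactly on the dtype maximum.

```python
INT8_MAX = 127  # same value as np.iinfo(np.int8).max

# Assumed example: ar2 is int8 and spans exactly 0..127.
ar2_min, ar2_max = 0, 127
ar2_range = ar2_max - ar2_min   # 127

# The largest value of (ar2 - ar2_min) equals ar2_range, which still fits.
largest_offset = ar2_max - ar2_min
fits_in_int8 = largest_offset <= INT8_MAX

strict_check = ar2_range < INT8_MAX      # rejects a safe input
inclusive_check = ar2_range <= INT8_MAX  # accepts it

print(fits_in_int8, strict_check, inclusive_check)  # True False True
```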

seberg · 2022-12-23T18:11:24Z

This is the last thing that would be great to have a fix for in 1.24.1, considering that it returns bad results. I expect this is good, but I need fresher eyes. If it looks good to others, maybe we should get it in; I can look at it again later and follow up if I feel a different approach is better.

MilesCranmer · 2022-12-23T19:40:01Z

I'm thinking about this proposed change now and while it does fix the problem, it is more conservative than necessary. Recall we are looking for overflows in this calculation:

basic_mask = (ar1 <= ar2_max) & (ar1 >= ar2_min)
outgoing_array[basic_mask] = isin_helper_ar[ar1[basic_mask] - ar2_min]

basic_mask trims ar1 to only the elements within the range of ar2. Thus, we technically only need to consider min(ar1_max, ar2_max) and max(ar1_min, ar2_min) in these calculations, i.e., the following:

# After masking, the range of ar1 is guaranteed to be
# within the range of ar2:
ar1_upper = min(int(ar1_max), int(ar2_max))
ar1_lower = max(int(ar1_min), int(ar2_min))

range_safe_from_overflow &= all((
    ar1_upper - int(ar2_min) <= np.iinfo(ar1.dtype).max,
    ar1_lower - int(ar2_min) >= np.iinfo(ar1.dtype).min
))

Does that make sense?

Edit: pushed this change.
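
The difference between the two versions of the check can be sketched in plain Python. The function names, the int8 limits, and the sample ranges below are assumptions for illustration; only the min/max logic is taken from the snippet above.

```python
# Limits of a hypothetical ar1 dtype (int8), written as plain integers.
AR1_DTYPE_MIN, AR1_DTYPE_MAX = -128, 127

def conservative_check(ar1_min, ar1_max, ar2_min):
    """First version: uses the full range of ar1."""
    return (ar1_max - ar2_min <= AR1_DTYPE_MAX
            and ar1_min - ar2_min >= AR1_DTYPE_MIN)

def masked_check(ar1_min, ar1_max, ar2_min, ar2_max):
    """Tightened version: only values that survive basic_mask matter."""
    ar1_upper = min(ar1_max, ar2_max)
    ar1_lower = max(ar1_min, ar2_min)
    return (ar1_upper - ar2_min <= AR1_DTYPE_MAX
            and ar1_lower - ar2_min >= AR1_DTYPE_MIN)

# Assumed example: ar1 spans -128..100, ar2 spans 50..60.
print(conservative_check(-128, 100, 50))   # False: -128 - 50 = -178 underflows
print(masked_check(-128, 100, 50, 60))     # True: masked ar1 lies in 50..60
```

In this sketch the lower-bound test in `masked_check` is always satisfied, because `ar1_lower` is never below `ar2_min`; it is kept only to mirror the snippet above.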

charris · 2022-12-24T22:28:26Z

I wonder why integers larger than uint16 are not tested. Are they too big?

charris · 2022-12-25T18:44:30Z

Thanks @MilesCranmer. @seberg If you see anything that bothers you we can make another PR.

numpy#22878)

* TST: Mixed integer types for in1d
* BUG: Fix mixed dtype overflows for in1d (numpy#22877)
* BUG: Type conversion for integer overflow check
* MAINT: Fix linting issues in in1d
* MAINT: ar1 overflow check only for non-empty array
* MAINT: Expand bounds of overflow check
* TST: Fix integer overflow in mixed boolean test
* TST: Include test for overflow on mixed dtypes
* MAINT: Less conservative overflow checks

MilesCranmer added 2 commits December 23, 2022 11:21

TST: Mixed integer types for in1d

54aa5bc

BUG: Fix mixed dtype overflows for in1d (numpy#22877)

dbfdcbd

MilesCranmer mentioned this pull request Dec 23, 2022

BUG: numpy.isin does not function correctly with two arrays with different integer type #22877

Closed

MilesCranmer changed the title ~~[WIP] Fix integer overflow in in1d for mixed integer dtypes #22877~~ BUG: [WIP] Fix integer overflow in in1d for mixed integer dtypes #22877 Dec 23, 2022

BUG: Type conversion for integer overflow check

f7a1439

MilesCranmer force-pushed the isin-fix-dtype branch from 83d5b2b to f7a1439 Compare December 23, 2022 16:42

MilesCranmer added 4 commits December 23, 2022 11:47

MAINT: Fix linting issues in in1d

c688977

MAINT: ar1 overflow check only for non-empty array

1d09d09

MAINT: Expand bounds of overflow check

b7f1701

TST: Fix integer overflow in mixed boolean test

96cd847

MilesCranmer changed the title ~~BUG: [WIP] Fix integer overflow in in1d for mixed integer dtypes #22877~~ BUG: Fix integer overflow in in1d for mixed integer dtypes #22877 Dec 23, 2022

charris added the 00 - Bug and 09 - Backport-Candidate labels Dec 23, 2022

charris added this to the 1.24.1 release milestone Dec 23, 2022

TST: Include test for overflow on mixed dtypes

c8299cb

MAINT: Less conservative overflow checks

c8499c6

charris approved these changes Dec 24, 2022

charris merged commit 235dbe1 into numpy:main Dec 25, 2022

charris mentioned this pull request Dec 25, 2022

BUG: Fix integer overflow in in1d for mixed integer dtypes #22877 #22884

Merged

charris removed the 09 - Backport-Candidate label Dec 25, 2022

charris removed this from the 1.24.1 release milestone Dec 25, 2022

MilesCranmer deleted the isin-fix-dtype branch December 25, 2022 19:34
