BUG: `numpy.isin` does not function correctly with two arrays with different integer type #22877

TonyXiang8787 · 2022-12-23T10:49:52Z

Describe the issue:

numpy.isin sometimes returns the wrong answer when the two arrays being compared have different integer dtypes.

The example below passes an int8 array containing a single zero and an int64 array containing two values, one of which is zero. It should return one True. However, it returns False.

Reproduce the code example:

>>> import numpy as np
>>> np.isin(np.zeros(1, dtype=np.int8), np.array([-128, 0], dtype=np.int64), kind='table')
array([False])

Error message:

No response

Runtime information:

Numpy version

>>> print(numpy.__version__)
1.24.0

Sys version

>>> print(sys.version)
3.11.1 (main, Dec  7 2022, 01:11:34) [GCC 11.3.0]

Numpy runtime

[{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
                      'found': ['SSSE3',
                                'SSE41',
                                'POPCNT',
                                'SSE42',
                                'AVX',
                                'F16C',
                                'FMA3',
                                'AVX2'],
                      'not_found': ['AVX512F',
                                    'AVX512CD',
                                    'AVX512_KNL',
                                    'AVX512_KNM',
                                    'AVX512_SKX',
                                    'AVX512_CLX',
                                    'AVX512_CNL',
                                    'AVX512_ICL']}},
 {'architecture': 'Haswell',
  'filepath': 'PATH_DELETED_DUE_TO_PRIVACY/.venv/lib/python3.11/site-packages/numpy.libs/libopenblas64_p-r0-15028c96.3.21.so',
  'internal_api': 'openblas',
  'num_threads': 20,
  'prefix': 'libopenblas',
  'threading_layer': 'pthreads',
  'user_api': 'blas',
  'version': '0.3.21'}]

Context for the issue:

The problem seems to only occur in version 1.24.

nkit-chahal · 2022-12-23T11:07:02Z

With NumPy version 1.20.3 it works fine.

TonyXiang8787 · 2022-12-23T11:11:24Z

@nkit-chahal It's a new problem in version 1.24.

seberg · 2022-12-23T11:29:47Z

@MilesCranmer do you have time to take a quick look? This looks like it must be related to gh-12065 misfiring.
It would be nice to fix this for 1.24.1, but since that release needs to go out soon, I am not sure that will happen.

MilesCranmer · 2022-12-23T13:46:24Z

Sure, I have time to look at this today.

TonyXiang8787 · 2022-12-23T13:55:02Z

@MilesCranmer After some research I think the problem might be in this line:

numpy/numpy/lib/arraysetops.py

Line 683 in 2b9851b

outgoing_array[basic_mask] = isin_helper_ar[ar1[basic_mask] -

If ar1 has a smaller data type than ar2, the subtraction on this line may overflow.

I cannot think of an easy solution here. We could convert ar1 to the same type as ar2, but that can increase memory usage dramatically (int8 to int64 is 8 times bigger). Alternatively, we could check whether ar1 can overflow and, if so, fall back to the sort method, but this sacrifices performance.
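The wraparound described in this comment can be reproduced in isolation. The sketch below is not the code from arraysetops.py. It assumes the offset is computed in ar1's int8 dtype (both operands are made int8 explicitly, so the result does not depend on the promotion rules of the installed NumPy version) and compares it with the same subtraction done in int64. It also shows the memory cost of the upcast.

```python
import numpy as np

ar1 = np.zeros(1, dtype=np.int8)
# Assumption: ar2's minimum (-128) ends up in ar1's dtype, as happens when
# the result of `ar1 - ar2_min` stays int8.
ar2_min = np.array([-128], dtype=np.int8)

# 0 - (-128) = 128 does not fit in int8, so the array operation wraps silently.
narrow_offset = ar1 - ar2_min
# Doing the same subtraction in int64 gives the intended table index.
wide_offset = ar1.astype(np.int64) - np.int64(-128)

print(narrow_offset, wide_offset)

# Memory cost of upcasting: one byte per element versus eight.
big = np.zeros(1_000_000, dtype=np.int8)
print(big.nbytes, big.astype(np.int64).nbytes)
```

With the wrapped offset, the lookup lands on the wrong table entry, which is consistent with the False result in the report.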

MilesCranmer · 2022-12-23T13:59:39Z

Quick question for @seberg: is this behaviour actually incorrect? In other words, do we want int8(0) == int64(0) in the context of isin, or to have it be !=? I could see either thing being technically correct, I suppose it depends on NumPy conventions. (I guess I could see isin having different behaviour than == if someone has a mixed type array)

TonyXiang8787 · 2022-12-23T14:00:59Z

> Quick question for @seberg: is this behaviour actually incorrect? In other words, do we want int8(0) == int64(0) in the context of isin, or to have it be !=? I could see either thing being technically correct, I suppose it depends on NumPy conventions. (I guess I could see isin having different behaviour than == if someone has a mixed type array)

The behavior before version 1.24 is int8(0) == int64(0) in the context of isin.

MilesCranmer · 2022-12-23T14:02:21Z

@TonyXiang8787 I think the solution should be implemented by correcting the overflow detection here:

numpy/numpy/lib/arraysetops.py

Line 652 in 2b9851b

range_safe_from_overflow = ar2_range < np.iinfo(ar2.dtype).max

The check should also include information about the type of ar1.

MilesCranmer · 2022-12-23T14:13:03Z

Yep this looks like the solution:

-  range_safe_from_overflow = ar2_range < np.iinfo(ar2.dtype).max 
+  range_safe_from_overflow = ar2_range < np.iinfo(ar2.dtype).max and (ar1_max - ar2_min) < np.iinfo(ar1.dtype).max and (ar1_min - ar2_max) > np.iinfo(ar1.dtype).min

I can make a PR when I get into the office, double check this fixes the bug, and also include some new unit tests.
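The proposed condition can be sketched as a standalone function. Here table_lookup_is_safe is a hypothetical helper, not part of NumPy, and it does the arithmetic with Python integers so that the check itself cannot overflow. It assumes non-empty integer arrays, and the check that was eventually merged may differ from this one.

```python
import numpy as np

def table_lookup_is_safe(ar1, ar2):
    """Hypothetical helper mirroring the condition proposed above.

    Assumes ar1 and ar2 are non-empty integer arrays.
    """
    # Convert to Python ints so the differences below cannot wrap around.
    ar1_min, ar1_max = int(ar1.min()), int(ar1.max())
    ar2_min, ar2_max = int(ar2.min()), int(ar2.max())
    info1 = np.iinfo(ar1.dtype)
    info2 = np.iinfo(ar2.dtype)
    return (
        (ar2_max - ar2_min) < info2.max
        and (ar1_max - ar2_min) < info1.max
        and (ar1_min - ar2_max) > info1.min
    )

# The arrays from the report: 0 - (-128) = 128 exceeds the int8 maximum.
reported = table_lookup_is_safe(
    np.zeros(1, dtype=np.int8), np.array([-128, 0], dtype=np.int64)
)
# A case where every offset fits in int8.
small = table_lookup_is_safe(
    np.zeros(1, dtype=np.int8), np.array([0, 5], dtype=np.int64)
)
print(reported, small)
```

Under this check the reported input is flagged as unsafe for the table method, while the second input is accepted.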

MilesCranmer · 2022-12-23T16:29:45Z

PR created on #22878. Thanks for raising this issue @TonyXiang8787.

…2878)
* TST: Mixed integer types for in1d
* BUG: Fix mixed dtype overflows for in1d (#22877)
* BUG: Type conversion for integer overflow check
* MAINT: Fix linting issues in in1d
* MAINT: ar1 overflow check only for non-empty array
* MAINT: Expand bounds of overflow check
* TST: Fix integer overflow in mixed boolean test
* TST: Include test for overflow on mixed dtypes
* MAINT: Less conservative overflow checks

BUG: Fix integer overflow in in1d for mixed integer dtypes #22877

TonyXiang8787 added the 00 - Bug label Dec 23, 2022

seberg added this to the 1.24.1 release milestone Dec 23, 2022

MilesCranmer added a commit to MilesCranmer/numpy that referenced this issue Dec 23, 2022

BUG: Fix mixed dtype overflows for in1d (numpy#22877)

dbfdcbd

MilesCranmer mentioned this issue Dec 23, 2022

BUG: Fix integer overflow in in1d for mixed integer dtypes #22877 #22878

Merged

charris closed this as completed in #22878 Dec 25, 2022

charris mentioned this issue Dec 25, 2022

BUG: Fix integer overflow in in1d for mixed integer dtypes #22877 #22884

Merged

charris added a commit that referenced this issue Dec 25, 2022

Merge pull request #22884 from charris/backport-22878

002c60d

BUG: Fix integer overflow in in1d for mixed integer dtypes #22877
