Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: random.hypergeometic assumes npy_long is npy_int64, hangs ppc64 #14458

Merged
merged 1 commit into from Sep 9, 2019

Conversation

ahaldane
Copy link
Member

@ahaldane ahaldane commented Sep 9, 2019

Fixes #14457

This caused a incorrect int64* pointer-dereference to an int32 array in 32-bit mode where npy_long is not npy_int64.

@ahaldane ahaldane added 00 - Bug 09 - Backport-Candidate PRs tagged should be backported labels Sep 9, 2019
@ahaldane ahaldane added this to the 1.17.3 release milestone Sep 9, 2019
@charris charris modified the milestones: 1.17.3 release, 1.16.6 Sep 9, 2019
@eric-wieser eric-wieser merged commit 42ac121 into numpy:master Sep 9, 2019
@eric-wieser
Copy link
Member

Looks obviously correct, thanks. Might be worth looking for other similar cases.

@WarrenWeckesser
Copy link
Member

It looks like the actual source of the bug was down about 20 lines in the code:

    # Convert to int64, if necessary, to use int64 infrastructure
    ongood = ongood.astype(np.int64)
    onbad = onbad.astype(np.int64)
    onbad = onbad.astype(np.int64)
    out = discrete_broadcast_iii(&legacy_random_hypergeometric,&self._bitgen, size, self.lock,
                                 ongood, 'ngood', CONS_NON_NEGATIVE,
                                 onbad, 'nbad', CONS_NON_NEGATIVE,
                                 onsample, 'nsample', CONS_GTE_1)

Note that instead of converting onsample to np.int64, we have onbad = onbad.astype(np.int64) twice. That means onsample was passed to discrete_broadcast_iii as an array of long instead of int64, and that probably triggered the problem reported in #14457.

I think the change in this PR is still fine, but now we can do a little more clean up. I'll submit a PR.

@WarrenWeckesser
Copy link
Member

It should be noted that this PR introduces a change in behavior for systems where a C long is 32 bits, but only in cases where the inputs exceed the limits of a 32 bit integer. In the original version of the code, the initial conversion to long would result in these large values being set to 0 in the calls to PyArray_FROM_OTF. In the revised code, these large values are passed to the legacy_random_hypergeometric. Then any large values in out will be set to 0 in the conversion of the result to long that occurs in the call int64_to_long(out). Either way, the results will not be correct, because somewhere along the line, values have been silently set to 0. Should we worry about this? We haven't really worried about this extreme edge case in the past. And the algorithm is not reliable for such large values anyway (see #11443). The version of the code in the new Generator requires that the values in ngood and nbad are less than 1000000000. So this comment is mostly FYI.

@seberg
Copy link
Member

seberg commented Sep 9, 2019

I think I would vote for considering it a bug fix and not caring. But, we should maybe give it a release note, since it is in the legacy (stable) random API. But others may disagree.

@WarrenWeckesser
Copy link
Member

Follow-up PR: #14465

@charris
Copy link
Member

charris commented Sep 9, 2019

Is this safe? Note that discnmN_array functions expect long.

@charris
Copy link
Member

charris commented Sep 9, 2019

The int64 arrays are iterated OK in discnmN_array (I think), but the pointers returned by the iterator are cast to (long *) and then indexed. How is that correct?

@WarrenWeckesser
Copy link
Member

@charris wrote

The int64 arrays are iterated OK in discnmN_array (I think), but the pointers returned by the iterator are cast to (long *) and then indexed. How is that correct?

What code are you looking at?

@charris
Copy link
Member

charris commented Sep 9, 2019

@WarrenWeckesser NVM, I was looking at the 1.16 version of the code, everything is different in 1.17.

WarrenWeckesser added a commit to WarrenWeckesser/numpy that referenced this pull request Sep 12, 2019
charris added a commit that referenced this pull request Sep 12, 2019
@charris charris modified the milestones: 1.16.6, 1.17.3 release Sep 13, 2019
charris pushed a commit to charris/numpy that referenced this pull request Sep 13, 2019
@charris charris removed the 09 - Backport-Candidate PRs tagged should be backported label Sep 13, 2019
@charris charris removed this from the 1.17.3 release milestone Sep 13, 2019
charris added a commit that referenced this pull request Sep 14, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

near-infinite loop on ppc64 in np.random.hypergeometric
5 participants