New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Histogramdd breaks on big arrays in Windows #22561
Conversation
Resolved the issue with line change from int to np.intp in numpy/numpy/lib/histograms.py
numpy/lib/histograms.py
Outdated
@@ -970,7 +970,8 @@ def histogramdd(sample, bins=10, range=None, density=None, weights=None): | |||
sample = np.atleast_2d(sample).T | |||
N, D = sample.shape | |||
|
|||
nbin = np.empty(D, int) | |||
#nbin = np.empty(D, int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remove this line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and committed.
Resolved the issue with line change from int to np.intp in numpy/numpy/lib/histograms.py
@@ -397,6 +397,14 @@ def test_histogram_bin_edges(self): | |||
edges = histogram_bin_edges(arr, bins='auto', range=(0, 1)) | |||
assert_array_equal(edges, e) | |||
|
|||
def test_big_arrays(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. We need to mark this for large memory use, have a look for @requires_memory
. I suspect it should also be marked with @pytest.mark.slow
to skip on most runs.
There are two binary files that got accidentally commit, please delete them. |
@navpreetnp7 would be great to finish this up, since branching will be in a week or two probably. Could you add that CI/tests are currently failing because the test requires a huge amount of memory to run successfully. |
Yeah I just finished it up. Sorry for the delay |
numpy/lib/tests/test_histograms.py
Outdated
@@ -397,6 +398,16 @@ def test_histogram_bin_edges(self): | |||
edges = histogram_bin_edges(arr, bins='auto', range=(0, 1)) | |||
assert_array_equal(edges, e) | |||
|
|||
@requires_memory(free_bytes=100000000 * 3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Lets refine this a little bit. If you check sample.nbytes
, you will see that it is 8 times larger. Also the result array (with the current choices) has a significant size. You could just reduce that size by a lot, even then it probably makes sense to round up a little bit.
(This is a sanity check that we should be good after all. The decorator also catches MemoryError
so we should be fine, but I would just round up generously.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. That makes sense. I made it 100 times larger
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, if I do the math, this should be about double of what is needed, and 10 GiB are not forbidding these days, so this should be fine.
Thanks @navpreetnp7. |
BUG: Histogramdd breaks on big arrays in Windows (numpy#22561)
* BUG: Histogramdd breaks on big arrays in Windows Resolved the issue with line change from int to np.intp in numpy/numpy/lib/histograms.py * BUG: Histogramdd breaks on big arrays in Windows Resolved the issue with line change from int to np.intp in numpy/numpy/lib/histograms.py * Removed the binary files * Update test_histograms.py * Update test_histograms.py * Update test_histograms.py
Resolved the issue with line change from int to np.intp in numpy/numpy/lib/histograms.py
New test function was added called test_big_arrays on numpy/lib/tests/test_histograms.py
Resolved #22288
Pydata Sprint NYC 2022