Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Histogramdd breaks on big arrays in Windows #22561

Merged
merged 6 commits into from Nov 18, 2022

Conversation

navpreetnp7
Copy link
Contributor

Resolved the issue with line change from int to np.intp in numpy/numpy/lib/histograms.py
New test function was added called test_big_arrays on numpy/lib/tests/test_histograms.py

Resolved #22288
Pydata Sprint NYC 2022

Resolved the issue with line change from int to np.intp in numpy/numpy/lib/histograms.py
@@ -970,7 +970,8 @@ def histogramdd(sample, bins=10, range=None, density=None, weights=None):
sample = np.atleast_2d(sample).T
N, D = sample.shape

nbin = np.empty(D, int)
#nbin = np.empty(D, int)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove this line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and committed.

Resolved the issue with line change from int to np.intp in numpy/numpy/lib/histograms.py
@@ -397,6 +397,14 @@ def test_histogram_bin_edges(self):
edges = histogram_bin_edges(arr, bins='auto', range=(0, 1))
assert_array_equal(edges, e)

def test_big_arrays(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. We need to mark this for large memory use, have a look for @requires_memory. I suspect it should also be marked with @pytest.mark.slow to skip on most runs.

@seberg
Copy link
Member

seberg commented Nov 9, 2022

There are two binary files that got accidentally commit, please delete them.

@seberg
Copy link
Member

seberg commented Nov 16, 2022

@navpreetnp7 would be great to finish this up, since branching will be in a week or two probably. Could you add that @requires_memory and @pytest.mark.slow mark to the test?

CI/tests are currently failing because the test requires a huge amount of memory to run successfully.

@navpreetnp7
Copy link
Contributor Author

Yeah I just finished it up. Sorry for the delay

@@ -397,6 +398,16 @@ def test_histogram_bin_edges(self):
edges = histogram_bin_edges(arr, bins='auto', range=(0, 1))
assert_array_equal(edges, e)

@requires_memory(free_bytes=100000000 * 3)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Lets refine this a little bit. If you check sample.nbytes, you will see that it is 8 times larger. Also the result array (with the current choices) has a significant size. You could just reduce that size by a lot, even then it probably makes sense to round up a little bit.
(This is a sanity check that we should be good after all. The decorator also catches MemoryError so we should be fine, but I would just round up generously.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. That makes sense. I made it 100 times larger

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, if I do the math, this should be about double of what is needed, and 10 GiB are not forbidding these days, so this should be fine.

@seberg
Copy link
Member

seberg commented Nov 18, 2022

Thanks @navpreetnp7.

@seberg seberg merged commit 72af24d into numpy:main Nov 18, 2022
@seberg seberg added the 09 - Backport-Candidate PRs tagged should be backported label Nov 18, 2022
kernel-loophole added a commit to kernel-loophole/numpy that referenced this pull request Nov 18, 2022
BUG: Histogramdd breaks on big arrays in Windows (numpy#22561)
charris pushed a commit to charris/numpy that referenced this pull request Nov 19, 2022
* BUG: Histogramdd breaks on big arrays in Windows

Resolved the issue with line change from int to np.intp in numpy/numpy/lib/histograms.py

* BUG: Histogramdd breaks on big arrays in Windows

Resolved the issue with line change from int to np.intp in numpy/numpy/lib/histograms.py

* Removed the binary files

* Update test_histograms.py

* Update test_histograms.py

* Update test_histograms.py
@charris charris removed the 09 - Backport-Candidate PRs tagged should be backported label Nov 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Development

Successfully merging this pull request may close these issues.

BUG: Histogramdd breaks on big arrays in Windows
4 participants