
Add np.allclose and np.isclose support ref. issue #4074 #6286

Open
wants to merge 12 commits into main

Conversation

jeertmans
Contributor

Hi everyone,

As proposed in issue #4074, I added support for the np.allclose function.

I also wrote unit tests. One of them could be enhanced (the data-type check), but support for numpy.issubdtype(a, b) would need to be added first. Anyway, I think the function works as expected.

If I made any mistakes or forgot anything, don't hesitate to tell me :)

If you think the implementation is correct, I can easily implement the numpy.isclose function as well.

I timed the function and it seems to work pretty well:

a = np.random.rand(100, 100)
np.allclose(a, a)
>>> True
allclose(a, a)
>>> True
%timeit allclose(a, a)
>>> 10.9 µs ± 31.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit np.allclose(a, a)
>>> 59.4 µs ± 221 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

@esc
Member

esc commented Sep 28, 2020

@jeertmans thanks for submitting this! I have scheduled it for review.

@esc
Member

esc commented Sep 29, 2020

@jeertmans so, we discussed this in the core-dev meeting yesterday and I think the implementation may be overly complex. I don't think that much of the error checking will be needed, as much of the functionality may already be present. The core of the algorithm can probably be done using the vectorized form:

return False if any(abs(a - b) > (atol + rtol * abs(b))) else True

Where both a and b are arrays.

With perhaps a special case for NaN? Or did I miss something?
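
For reference, a runnable form of that sketch could look like the following (an illustration only, not this PR's implementation; NaN and inf handling is deliberately left out, and a and b are assumed to broadcast against each other):

import numpy as np
from numba import njit

# Sketch of the vectorized check suggested above: a single array
# expression plus a reduction, with no special-casing of NaN/inf.
@njit
def allclose_sketch(a, b, rtol=1e-05, atol=1e-08):
    return not np.any(np.abs(a - b) > (atol + rtol * np.abs(b)))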

@esc
Member

esc commented Sep 29, 2020

I'm just looking into this now so I'll leave messages here as I come across things.

For example, I am not convinced we need an extra _broadcastable_, because subtracting the arrays already yields a broadcast error when the shapes are incompatible:

In [13]: from numba import njit

In [14]: @njit
    ...: def sub(a, b):
    ...:     return (a-b)
    ...:

In [15]: sub(a, b)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-15-0a247c27eacb> in <module>
----> 1 sub(a, b)

ValueError: unable to broadcast argument 1 to output array
File "/Users/vhaenel/git/numba/numba/np/npyimpl.py", line 228,

Granted, the error message isn't the same as NumPy's:

In [16]: a - b
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-16-09bd029d0285> in <module>
----> 1 a - b

ValueError: operands could not be broadcast together with shapes (4,3) (3,5)

But that is probably something that should be fixed elsewhere in Numba.

@jeertmans
Contributor Author

Thank you @esc!
I will take a deeper look at it and keep you up to date.
About vectorizing, I did not know the any() function was implemented, my bad :)

For the special case of NaN, I still have to think about how to handle it.

@esc
Member

esc commented Sep 29, 2020

Another thought is to look at the relevant NumPy functions:

As far as I can tell, allclose is implemented using isclose, and isclose seems a little more involved. 😁
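
For context, NumPy's own allclose is essentially a thin wrapper around isclose; simplified, it behaves roughly like this (plain NumPy, shown only to illustrate the relationship):

import numpy as np

# Simplified view of NumPy's allclose (see numpy/core/numeric.py): the
# NaN/inf and scalar-vs-array handling lives in isclose, and allclose
# just reduces the boolean result to a single bool.
def allclose_like(a, b, rtol=1e-05, atol=1e-08, equal_nan=False):
    return bool(np.all(np.isclose(a, b, rtol=rtol, atol=atol, equal_nan=equal_nan)))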

@jeertmans
Contributor Author

So @esc, I looked a bit into it, and it seems that a vectorized implementation is just slower than the current one.
Even without taking the special case of NaNs into account, using np.all(abs(a - b) > (atol + rtol * abs(b))) takes much more time. The same happens if I swap np.all for np.any.

The timing differences are shown below:

a = np.random.rand(1000, 1000)
b = a  # All values are the same
...
(numbaenv) jerome@jerome-N501VW:~/Documents/programmation/numba$ python3 ../allclose_perf.py
>>> 0.23322463035583496 (using vectorised op.)
>>> 0.15993094444274902 (my implementation)
>>> 0.9217619895935059 (np.allclose)
b = -a  # All values are different
...
(numbaenv) jerome@jerome-N501VW:~/Documents/programmation/numba$ python3 ../allclose_perf.py
>>> 0.2327868938446045 (using vectorised op.)
>>> 3.337860107421875e-06 (my implementation)
>>> 0.9791421890258789 (np.allclose)

From my point of view, the problem is that the np.any (resp. np.all) function is not implemented "smartly".
The np.allclose implementation uses the np.isclose function, but we don't actually need the O(n) memory that this function allocates. It is actually worse than O(n) because, in order to handle NaN, it allocates several arrays for that purpose.

So, if you want to purely replicate NumPy's behavior, I can easily do that, but I think it will come with a slowdown in performance.

About the _broadcastable_ function, I agree that it is maybe not the best solution, but it is the only one I found that avoids the useless memory allocation that NumPy does.
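
To make the memory argument concrete, here is the kind of short-circuiting element-wise loop that avoids those temporaries (a hypothetical sketch, not necessarily this PR's code; it assumes equal shapes and ignores NaN/inf), which is why the b = -a case above can return in microseconds:

import numpy as np
from numba import njit

# Hypothetical short-circuiting variant: no temporary arrays are
# allocated, and the loop exits on the first mismatching pair.
# Assumes a and b have the same shape.
@njit
def allclose_loop(a, b, rtol=1e-05, atol=1e-08):
    af = a.ravel()
    bf = b.ravel()
    for i in range(af.size):
        if np.abs(af[i] - bf[i]) > atol + rtol * np.abs(bf[i]):
            return False
    return True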

@esc
Member

esc commented Sep 29, 2020

@jeertmans excellent, thank you for looking into this with such detail. I'll give it some thought and get back to you!

@esc
Member

esc commented Sep 30, 2020

@jeertmans I gave this some thought and discussed it with some of the other developers. We believe that correctness is more important than performance, initially. That is to say, performance does of course matter greatly, as Numba is all about performance, but the implementation needs to be 100% correct (i.e. matching NumPy behaviour) first. If you would like to add code that diverges from the NumPy approach, that is of course encouraged, especially if it improves performance; however, the tests must be very rigorous and thorough in such cases and demonstrate that correctness was not sacrificed. As a start, the _broadcastable_ function will need to be extensively tested and covered by test cases. Also, it is probably worth looking into why np.all(abs(a - b) > (atol + rtol * abs(b))) is slower and by how much. Having a vectorized form is significantly beneficial in many cases, as it helps to avoid temporaries and lends itself to being parallelized with Numba's parallel accelerator (aka parfors).

However, my recommendation in this case would actually be to take a step back, look at the isclose operation of NumPy, and then use that as a basis for allclose. It seems to have a significant portion of code dedicated to dealing with specific edge cases, such as the handling of masked arrays, NaN and inf values, which I don't see in this PR at present. In any case, thank you very much for looking into this, we do appreciate your efforts! This is a long-standing issue and it would be awesome to see it resolved!
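
As an illustration of the parfors point, the element-wise check can also be written as a reduction that Numba's parallel accelerator can split across threads (a sketch under the same simplifying assumptions as before, i.e. equal shapes and no NaN/inf handling, and not this PR's code):

import numpy as np
from numba import njit, prange

# Count the elements that violate the tolerance using a parallel sum
# reduction; with parallel=True, parfors distributes the prange loop.
@njit(parallel=True)
def allclose_parallel(a, b, rtol=1e-05, atol=1e-08):
    af = a.ravel()
    bf = b.ravel()
    n_bad = 0
    for i in prange(af.size):
        if np.abs(af[i] - bf[i]) > atol + rtol * np.abs(bf[i]):
            n_bad += 1
    return n_bad == 0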

@jeertmans
Contributor Author

@esc Thank you for your complete answer! I will give it some time and get back to you soon =)

@jeertmans
Contributor Author

So @esc, I have implemented the function according to NumPy's source code.

It works, but I have some small problems:

  1. numpy.asanyarray and other functions are not implemented in Numba, so the implementation is not a perfect mirror of NumPy's. I don't see when this should cause problems, and implementing those functions would probably be a good idea, but that is maybe too much for a single PR.
  2. NumPy's way of handling 0-d arrays does not match Numba's: most NumPy functions accept scalars and return scalars, but Numba does not seem to like working with both scalars and arrays:
>>> Rejected as the implementation raised a specific error:
>>>     TypingError: Failed in nopython mode pipeline (step: nopython frontend)
>>> Can't unify return type from the following types: array(bool, 1d, C), bool

For the moment, I decided to use the numpy.atleast_1d function to overcome this problem. The downside is that the output will not have the same format as the numpy.isclose function.
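
To illustrate the workaround (a hypothetical sketch, not the PR's actual code), promoting the inputs with np.atleast_1d keeps the return type an array on every code path, at the cost of returning a 1-d array where np.isclose would return a scalar for scalar inputs:

import numpy as np
from numba import njit

# np.atleast_1d turns 0-d arrays into 1-d arrays, so this always returns
# a boolean array and avoids the scalar/array unification error quoted
# above. NaN/inf handling is omitted.
@njit
def isclose_1d(a, b, rtol=1e-05, atol=1e-08):
    a_ = np.atleast_1d(a)
    b_ = np.atleast_1d(b)
    return np.abs(a_ - b_) <= atol + rtol * np.abs(b_)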

Thus, I also wrote similar tests for isclose (and updated the PR title).

Now, performance:

a = np.random.rand(1000, 1000)
%timeit allclose(a,a)
>>> 7.49 ms ± 54.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit np.allclose(a,a)
>>> 9.27 ms ± 90.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
b = - a
%timeit allclose(a,b)
>>> 7.23 ms ± 469 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit np.allclose(a,b)
>>> 10.2 ms ± 572 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

I am of course open to any criticism or suggestions :)

@jeertmans changed the title from "Add np.allclose support ref. issue #4074" to "Add np.allclose and np.isclose support ref. issue #4074" on Oct 1, 2020
@esc added the "4 - Waiting on reviewer" label (waiting for reviewer to respond to author) and removed the "3 - Ready for Review" label on Oct 1, 2020
@esc
Member

esc commented Oct 1, 2020

@jeertmans great! Excited to see this coming along. I have marked it as "waiting on reviewer" for now. Until we get around to taking a closer look, perhaps you could fix the remaining flake8 errors in the meantime:

numba/np/arraymath.py:4495:81: E501 line too long (83 > 80 characters)
numba/np/arraymath.py:4516:81: E501 line too long (82 > 80 characters)

@jeertmans
Contributor Author

@esc Thanks! I have fixed it (I think; now waiting for the flake8 results). I also made a slight modification, but nothing drastic :)

@jeertmans
Contributor Author

jeertmans commented Oct 1, 2020

Well, there seems to be another error in the checks, but I really don't think my code is what's failing them; see the error logs:

# Install latest llvmlite build
$CONDA_INSTALL -c numba llvmlite
\dirname "${CONDA_EXE}"
\dirname "${SYSP}"
Collecting package metadata (current_repodata.json): ...working... failed

CondaHTTPError: HTTP 504 GATEWAY TIME-OUT for url <https://conda.anaconda.org/numba/linux-64/current_repodata.json>
Elapsed: 01:00.130693
CF-RAY: 5db791664bde9fe8-IAD

A remote server error occurred when trying to retrieve this URL.

A 500-type error (e.g. 500, 501, 502, 503, etc.) indicates the server failed to
fulfill a valid request.  The problem may be spurious, and will resolve itself if you
try your request again.  If the problem persists, consider notifying the maintainer
of the remote server.

(https://dev.azure.com/numba/numba/_build/results?buildId=6740&view=logs&j=f2076665-76cb-5d7c-2ee9-55e9ea449f59&t=3c2cd5ad-7adf-505c-38f1-d4511566d1f6&l=3865)

If I'm wrong, can you help me find what the problem is? :)

@esc
Member

esc commented Oct 2, 2020

/azp run

@azure-pipelines
Contributor

Azure Pipelines successfully started running 1 pipeline(s).

@esc
Member

esc commented Oct 2, 2020

@jeertmans that is an error/glitch from anaconda.org -- I've re-queued the tests now 🤞

@jeertmans
Contributor Author

Thanks @esc !

@stuartarchibald added this to the PR Backlog milestone on Oct 5, 2020
@jeertmans
Contributor Author

Hello @stuartarchibald, you added this PR to the "PR Backlog" milestone, but I was wondering what it is and I could not find any information about it on the internet :/
Can you help me? :)

@stuartarchibald
Contributor

> Hello @stuartarchibald, you added this PR to the "PR Backlog" milestone, but I was wondering what it is and I could not find any information about it on the internet :/
> Can you help me? :)

Hi @jeertmans, I can try and explain. "PR Backlog" is a Numba-specific milestone we use to help manage the project. By default, all pull requests that are in progress but not yet targeted at a specific release are added to this milestone. If a PR is critical for a particular release, it gets added to that release's milestone; all other PRs are added to a release milestone when they get near completion, and which release milestone they are added to depends on where in the release cycle they happen to land, how risky the code change is, whether there are any dependencies, etc. Hope this explanation helps?

@jeertmans
Contributor Author

@stuartarchibald Yes, it helped! Thank you :)

@esc
Member

esc commented Oct 9, 2020

@jeertmans just as a heads-up, we are now in the burndown phase for the next release candidate, 0.52.0RC1, which essentially means we will have to put reviewing this on hold until after the release. Hopefully we can resume reviewing it soon.

@jeertmans
Contributor Author

@esc thank you for keeping me up to date! Good luck with the new release ;)
This PR can wait, for sure.

@jeertmans mentioned this pull request on May 28, 2021
@gmarkall
Member

gmarkall commented Jun 1, 2021

@esc What shall we do about re-reviewing this? I guess it's a bit close to 0.54 to look at it now, but maybe it could go in the 0.55 milestone to remind us to take a look before then?
