Add np.allclose and np.isclose support ref. issue #4074 #6286
base: main
@jeertmans thanks for submitting this! I have scheduled it for review.
@jeertmans so, we discussed this in the core-dev meeting yesterday, and I think the implementation may be overly complex. I don't think that much of the error checking will be needed, as much of the functionality may already be present? The core of the algorithm can probably be done using the vectorized: …
where both … Perhaps with a special case for NaN? Or did I miss something?
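For reference, NumPy's documentation defines the elementwise test as `absolute(a - b) <= (atol + rtol * absolute(b))`. A minimal vectorized sketch built on that formula is shown below in plain NumPy, assuming float inputs; `isclose_sketch` and `allclose_sketch` are hypothetical names, not the code discussed in the meeting. It also shows why special cases are needed: NaN never compares as close unless `equal_nan` is set, and infinities have to be compared for exact equality because `inf - inf` is `nan` (and `inf <= inf` would wrongly accept `inf` against `-inf`).

```python
import numpy as np


def isclose_sketch(a, b, rtol=1e-05, atol=1e-08, equal_nan=False):
    """Elementwise closeness test (hypothetical helper, float inputs assumed)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    finite = np.isfinite(a) & np.isfinite(b)
    # Silence warnings from inf - inf (invalid) and huge differences (overflow);
    # the affected lanes are replaced by np.where below or correctly end up False.
    with np.errstate(invalid="ignore", over="ignore"):
        # Vectorized core, as in NumPy's documentation.
        close = np.abs(a - b) <= atol + rtol * np.abs(b)
    # Non-finite values are only close if exactly equal (inf == inf, -inf == -inf).
    # NaN == NaN is False, so NaNs are rejected here.
    result = np.where(finite, close, a == b)
    if equal_nan:
        # The NaN special case: two NaNs in the same position count as close.
        result = result | (np.isnan(a) & np.isnan(b))
    return result


def allclose_sketch(a, b, rtol=1e-05, atol=1e-08, equal_nan=False):
    """True if every element pair is close, by reducing isclose_sketch with np.all."""
    return bool(np.all(isclose_sketch(a, b, rtol, atol, equal_nan)))
```

Because `np.all` only runs after the whole boolean array exists, this version always does the full amount of work, whatever the inputs.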
I'm just looking into this now, so I'll leave messages here as I come across things. For example, I am not convinced we need an extra …
Granted, the error message isn't the same as NumPy's:
But that is probably something that should be fixed elsewhere in Numba.
Thank you @esc! For the special case of NaN, I still have to think about how to handle it.
Another thought is to look at the relevant NumPy functions:
As far as I can tell …
So @esc, I looked into it a bit, and it seems that using a vectorized function will simply be slower than the current implementation. The timing differences are shown below:
a = np.random.rand(1000, 1000)
b = a # All values are the same
...
(numbaenv) jerome@jerome-N501VW:~/Documents/programmation/numba$ python3 ../allclose_perf.py
>>> 0.23322463035583496 (using vectorised op.)
>>> 0.15993094444274902 (my implementation)
>>> 0.9217619895935059 (np.allclose)
b = -a # All values are different
...
(numbaenv) jerome@jerome-N501VW:~/Documents/programmation/numba$ python3 ../allclose_perf.py
>>> 0.2327868938446045 (using vectorised op.)
>>> 3.337860107421875e-06 (my implementation)
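>>> 0.9791421890258789 (np.allclose)
From my point of view, the problem is that np.any (resp. np.all) is not implemented "smartly": the vectorized version first builds the full boolean array and only then reduces it, so it takes about the same time whether or not the values match, while my implementation returns as soon as it finds a mismatching pair (hence the ~3 µs when all values differ). So, if you want to replicate NumPy's behavior exactly, I can easily do that, but I think it will come with a performance slowdown. About the …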
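The ~3 µs result for `b = -a` suggests that the loop stops at the first mismatching pair, whereas a vectorized expression always materializes the whole boolean array before reducing it. A minimal sketch of such an early-exit loop is below; `allclose_early_exit` is a hypothetical name, float inputs are assumed, and this is not the code from the PR. In plain interpreted Python, as here, the loop is slow and is only meant to show the control flow; the argument in the thread is that a compiled loop can do the same thing cheaply.

```python
import math

import numpy as np


def allclose_early_exit(a, b, rtol=1e-05, atol=1e-08, equal_nan=False):
    """Loop-based allclose sketch that stops at the first mismatching pair."""
    a, b = np.broadcast_arrays(np.asarray(a, dtype=np.float64),
                               np.asarray(b, dtype=np.float64))
    for x, y in zip(a.flat, b.flat):
        x, y = float(x), float(y)  # Python floats: overflow gives inf, no warning
        if x == y:
            continue  # exact match, including inf == inf
        if x != x and y != y:  # both NaN
            if equal_nan:
                continue
            return False
        if not (math.isfinite(x) and math.isfinite(y)):
            return False  # NaN vs number, or inf vs a different value
        if abs(x - y) > atol + rtol * abs(y):
            return False  # early exit: the remaining elements are never read
    return True
```

With `b = -a`, the very first pair already fails the tolerance test, so the function returns after one iteration regardless of the array size.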
@jeertmans excellent, thank you for looking into this in such detail. I'll give it some thought and get back to you!
@jeertmans I gave this some thought and discussed it with some of the other developers. We believe that correctness is more important than performance, initially. That is to say, performance does of course matter greatly, as Numba is all about performance, but the implementation first needs to be 100% correct (i.e. match NumPy's behaviour). If you would like to add code that diverges from the NumPy approach, that is of course encouraged, especially if it improves performance; however, the tests must be very rigorous and thorough in such cases and demonstrate that correctness was not sacrificed. As a start, the … However, my recommendation in this case would actually be to take a step back, look at the …
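One way to make such tests rigorous is to compare a candidate implementation against `np.allclose` over every pair drawn from a small grid of edge-case values, for both settings of `equal_nan`. The sketch below does that; `EDGE_VALUES`, `find_mismatches` and the deliberately incomplete `naive_allclose` are hypothetical names, and the grid is far from exhaustive (it covers neither shapes, broadcasting nor non-float dtypes).

```python
import itertools

import numpy as np

# Hypothetical edge-case grid; extend as needed (subnormals, dtypes, shapes, ...).
EDGE_VALUES = [0.0, -0.0, 1.0, -1.0, 1e-9, 1e300, np.inf, -np.inf, np.nan]


def find_mismatches(candidate):
    """Return (x, y, equal_nan) triples where candidate disagrees with np.allclose."""
    mismatches = []
    for x, y in itertools.product(EDGE_VALUES, repeat=2):
        for equal_nan in (False, True):
            a, b = np.array([x]), np.array([y])
            with np.errstate(all="ignore"):
                got = bool(candidate(a, b, equal_nan=equal_nan))
                expected = bool(np.allclose(a, b, equal_nan=equal_nan))
            if got != expected:
                mismatches.append((x, y, equal_nan))
    return mismatches


def naive_allclose(a, b, rtol=1e-05, atol=1e-08, equal_nan=False):
    """Deliberately incomplete: the bare formula, no inf or NaN handling."""
    return np.all(np.abs(a - b) <= atol + rtol * np.abs(b))
```

`find_mismatches(naive_allclose)` reports, among others, `(inf, inf, False)`: the bare formula computes `inf - inf = nan`, so two equal infinities are judged not close, whereas NumPy treats them as equal.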
@esc Thank you for your thorough answer! I will give it some thought and get back to you soon =)
So @esc, I have implemented the function following NumPy's source code. It works, but I have a few small problems:
>>> Rejected as the implementation raised a specific error:
>>> TypingError: Failed in nopython mode pipeline (step: nopython frontend)
>>> Can't unify return type from the following types: array(bool, 1d, C), bool
For the moment, I decided to use … Thus, I also wrote similar tests for …
Now, performance:
a = np.random.rand(1000, 1000)
%timeit allclose(a,a)
>>> 7.49 ms ± 54.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit np.allclose(a,a)
>>> 9.27 ms ± 90.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
b = - a
%timeit allclose(a,b)
>>> 7.23 ms ± 469 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit np.allclose(a,b)
>>> 10.2 ms ± 572 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
I am of course open to any criticism or suggestions :)
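The `%timeit` figures above come from an IPython session. For a plain script (the contents of `allclose_perf.py` were not included in the thread), a rough equivalent can be built on the standard-library `timeit` module. The helper name `best_time_per_call` and the repeat counts are assumptions; it reports the fastest repeat rather than the mean ± std. dev. that `%timeit` prints, and `np.allclose` merely stands in for whichever implementation is being measured.

```python
import timeit

import numpy as np


def best_time_per_call(func, *args, number=10, repeat=5):
    """Call func(*args) `number` times per repeat; return the best per-call time in seconds."""
    totals = timeit.repeat(lambda: func(*args), number=number, repeat=repeat)
    return min(totals) / number


if __name__ == "__main__":
    a = np.random.rand(1000, 1000)
    for label, b in [("equal inputs", a), ("b = -a", -a)]:
        seconds = best_time_per_call(np.allclose, a, b)
        print(f"np.allclose, {label}: {seconds * 1e3:.2f} ms per call")
```

When timing a jitted function this way, calling it once before measuring keeps compilation time out of the numbers.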
@jeertmans great! Excited to see this coming along. I have marked it as "waiting on reviewer" for now. Until we get around to taking a closer look, perhaps you could fix the remaining
@esc Thanks! I have fixed it (I think; now waiting for the Flake8 results). I also made a slight modification, but nothing drastic :)
Well, there seems to be another error in the checks, but I really don't think it comes from my code; see the error logs:
If I'm wrong, can you help me find the problem? :)
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
@jeertmans that is an error/glitch from anaconda.org -- I've re-queued the tests now 🤞
Thanks @esc!
Hello @stuartarchibald, you added this PR to the "PR Backlog" milestone, but I was wondering what that is, and I could not find any information about it online :/
Hi @jeertmans, I can try and explain. "PR Backlog" is a Numba-specific milestone we use to help manage the project. By default, all pull requests which are in progress but not yet targeted at a specific release are added to this milestone. If a PR is critical for a particular release, it will get added to a specific milestone; all other PRs are added to a release milestone when they get near completion, and the release milestone to which they are added depends on where in the release cycle they happen to land, how risky the code change is, whether there are any dependencies, etc. Hope this explanation helps?
@stuartarchibald Yes, it helped! Thank you :)
@jeertmans just as a heads up, we are now in the burndown phase for the next release candidate, 0.52.0RC1, which essentially means we will have to put the review of this on hold until after the release. Hopefully we can resume reviewing this soon.
@esc Thank you for keeping me up to date! Good luck with the new release ;)
@esc What shall we do about re-reviewing this? I guess it's a bit close to 0.54 to look at now, but maybe it could go in the 0.55 milestone to remind us to take a look at it before then?
Hi everyone,
As proposed in issue #4074, I added support for the np.allclose function.
I also wrote unit tests. One of them could be enhanced (regarding the data-type check), but support for numpy.issubdtype(a, b) would need to be added first. Anyway, I think the function works as expected.
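The enhanced test would amount to asserting that the result has a boolean dtype. A sketch of such a test, written against plain NumPy (where `np.issubdtype` is available) with `np.isclose` standing in for the compiled function under test, might look like this; the class and test names are hypothetical.

```python
import unittest

import numpy as np


class TestIsCloseDtype(unittest.TestCase):
    # Stand-in for the function under test; staticmethod avoids binding `self`.
    func = staticmethod(np.isclose)

    def test_result_is_boolean_array(self):
        a = np.array([1.0, 2.0, np.nan])
        b = np.array([1.0, 2.5, np.nan])
        result = self.func(a, b)
        self.assertTrue(np.issubdtype(result.dtype, np.bool_))
        self.assertEqual(result.shape, a.shape)

    def test_integer_inputs(self):
        result = self.func(np.array([1, 2]), np.array([1, 3]))
        self.assertTrue(np.issubdtype(result.dtype, np.bool_))
        self.assertEqual(result.tolist(), [True, False])


if __name__ == "__main__":
    unittest.main()
```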
If I made any mistakes or forgot anything, don't hesitate to tell me :)
If you think the implementation is correct, I can easily implement the numpy.isclose function.
I timed the function, and it seems to perform pretty well: