New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Improve performance of np.broadcast_arrays and np.broadcast_shapes #26160
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! To me, it seems fine that arrays are only made read-only if their shape is actually changes, so what you have would seem an improvement, even if it means the tests have to be adjusted slightly.
Some queries inside about what gives the real improvement in time.
Also, small thing, but could you ensure the indentation remains correct?
@@ -546,13 +546,12 @@ def broadcast_arrays(*args, subok=False): | |||
# return np.nditer(args, flags=['multi_index', 'zerosize_ok'], | |||
# order='C').itviews | |||
|
|||
args = tuple(np.array(_m, copy=None, subok=subok) for _m in args) | |||
args = [np.array(_m, copy=None, subok=subok) for _m in args] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this really help? I thought that these days list comprehension and creating a tuple via an iterator made very little speed difference, and *args
should be very slightly faster for a tuple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does help, although you are right the cost of list comprehensions has gone down. On Python 3.12:
In [16]: import dis
...: args=[1, 2, 3]
...: %timeit tuple(2*j for j in args)
...: %timeit tuple([2*j for j in args])
298 ns ± 22.1 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
108 ns ± 0.781 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
So about 200ns for a small size, which does matter in the benchmarks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I confirm this. It is funny because I really thought python had solved the speed difference, but clearly I was wrong.
|
||
shape = _broadcast_shape(*args) | ||
|
||
if all(array.shape == shape for array in args): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nice catch.
if all(array.shape == shape for array in args): | ||
# Common case where nothing needs to be broadcasted. | ||
return args | ||
result = [array if array.shape == shape |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the above, I wonder if it is actually slower to directly return tuple(array if array.shape ...)
?
Just saw the emails and hadn't noticed this, so a brief chiming in. I am on edge about "unpredictable" readonly flag: That is the exact kind of things that the tuple vs. list return discussion was about, although I don't know if e.g. For (The magic EDIT: Actually, probably need |
Results:
Benchmark script:
There are some tests failing/modified due to the results not being writable. Maybe that is ok, but I am not sure.
See DEP: finish deprecating readonly result from numpy.broadcast_arrays and the links in that issue.
Notes:
_broadcast_to
on arrays that do not require broadcasting. Also two generators are avoided.np.broadcast_arrays
matters for the argument parsing in the random distributions ofscipy.stats
.np.nditer
. Addingmakes the method much faster, but there are two failing tests. Both related to the result of
np.nditer
being read-only (output included in the details below). I could not find an option to make the output ofnp.nditer
writable.The
np.broadcast_shapes
is made faster (and more memory efficient) by selecting a more efficientdtype
for the helper arrays.