Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Improve performance of np.broadcast_arrays and np.broadcast_shapes #26160

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

eendebakpt
Copy link
Contributor

@eendebakpt eendebakpt commented Mar 28, 2024

Results:

main:
broadcast_arrays(*args) 1.12 us
broadcast_arrays(*args_two) 6.98 us
broadcast_arrays(*args_multi) 9.01 us

PR:
broadcast_arrays(*args) 0.77 us
broadcast_arrays(*args_two) 3.30 us
broadcast_arrays(*args_multi) 3.72 us

Benchmark script:


import timeit
import numpy as np
from numpy import broadcast_arrays
args = (np.array([1.]), )
args_two = (np.array([1.]), np.arange(10) )
args_multi = (np.array([1.]), np.arange(10), np.arange(2, 12) )

number=100_000
dt = timeit.timeit(stmt="broadcast_arrays(*args)", globals=globals(), number=number)
print(f'broadcast_arrays(*args) {1e6*dt/number:.2f} us')
dt = timeit.timeit(stmt="broadcast_arrays(*args_two)", globals=globals(), number=number)
print(f'broadcast_arrays(*args_two) {1e6*dt/number:.2f} us')
dt = timeit.timeit(stmt="broadcast_arrays(*args_multi)", globals=globals(), number=number)
print(f'broadcast_arrays(*args_multi) {1e6*dt/number:.2f} us')

There are some tests failing/modified due to the results not being writable. Maybe that is ok, but I am not sure.
See DEP: finish deprecating readonly result from numpy.broadcast_arrays and the links in that issue.

Notes:

  • The PR avoids calling _broadcast_to on arrays that do not require broadcasting. Also two generators are avoided.
  • The performance of np.broadcast_arrays matters for the argument parsing in the random distributions of scipy.stats.
  • In the code there is a comment about possibly using np.nditer. Adding
    if len(args)< 33 and not subok:        
        return np.nditer(args, flags=['multi_index', 'zerosize_ok', 'reduce_ok',  'refs_ok'], order='C').itviews

makes the method much faster, but there are two failing tests. Both related to the result of np.nditer being read-only (output included in the details below). I could not find an option to make the output of np.nditer writable.

_____________________________________________ test_writeable ____________________________________________

    def test_writeable():
        # broadcast_to should return a readonly array
        original = np.array([1, 2, 3])
        result = broadcast_to(original, (2, 3))
        assert_equal(result.flags.writeable, False)
        assert_raises(ValueError, result.__setitem__, slice(None), 0)

        # but the result of broadcast_arrays needs to be writeable, to
        # preserve backwards compatibility
        for is_broadcast, results in [(False, broadcast_arrays(original,)),
                                      (True, broadcast_arrays(0, original))]:
            for result in results:
                # This will change to False in a future version
                if is_broadcast:
                    with assert_warns(FutureWarning):
                        assert_equal(result.flags.writeable, True)
                    with assert_warns(DeprecationWarning):
                        result[:] = 0
                    # Warning not emitted, writing to the array resets it
                    assert_equal(result.flags.writeable, True)
                else:
                    # No warning:
>                   assert_equal(result.flags.writeable, True)
E                   AssertionError:
E                   Items are not equal:
E                    ACTUAL: False
E                    DESIRED: True

is_broadcast = False
original   = array([1, 2, 3])
result     = array([1, 2, 3])
results    = (array([1, 2, 3]),)

numpy\lib\tests\test_stride_tricks.py:594: AssertionError
_____________________________________ test_writeable_memoryview _______________________________________

    def test_writeable_memoryview():
        # The result of broadcast_arrays exports as a non-writeable memoryview
        # because otherwise there is no good way to opt in to the new behaviour
        # (i.e. you would need to set writeable to False explicitly).
        # See gh-13929.
        original = np.array([1, 2, 3])

        for is_broadcast, results in [(False, broadcast_arrays(original,)),
                                      (True, broadcast_arrays(0, original))]:
            for result in results:
                # This will change to False in a future version
                if is_broadcast:
                    # memoryview(result, writable=True) will give warning but cannot
                    # be tested using the python API.
                    assert memoryview(result).readonly
                else:
>                   assert not memoryview(result).readonly
E                   assert not True
E                    +  where True = <memory at 0x0000029E1779FB80>.readonly
E                    +    where <memory at 0x0000029E1779FB80> = memoryview(array([1, 2, 3]))

is_broadcast = False
original   = array([1, 2, 3])
result     = array([1, 2, 3])
results    = (array([1, 2, 3]),)

numpy\lib\tests\test_stride_tricks.py:635: AssertionError
===================================================================================================== short test summary info =====================================================================================================
FAILED numpy/lib/tests/test_stride_tricks.py::test_writeable - AssertionError:
FAILED numpy/lib/tests/test_stride_tricks.py::test_writeable_memoryview - assert not True
========================================================================= 2 failed, 4360 passed, 165 skipped, 6 deselected, 4 xfailed, 1 xpassed in 9.38s =========================================================================

The np.broadcast_shapes is made faster (and more memory efficient) by selecting a more efficient dtype for the helper arrays.

%timeit broadcast_shapes( (10, 10), (1,1))

MAIN: 1.39 µs ± 94.4 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
PR: 1.08 µs ± 7.32 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

@eendebakpt eendebakpt changed the title ENH: Improve performance of np.broadcast_arrays Draft: ENH: Improve performance of np.broadcast_arrays Mar 28, 2024
@eendebakpt eendebakpt changed the title Draft: ENH: Improve performance of np.broadcast_arrays ENH: Improve performance of np.broadcast_arrays Mar 28, 2024
Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! To me, it seems fine that arrays are only made read-only if their shape is actually changes, so what you have would seem an improvement, even if it means the tests have to be adjusted slightly.

Some queries inside about what gives the real improvement in time.

Also, small thing, but could you ensure the indentation remains correct?

@@ -546,13 +546,12 @@ def broadcast_arrays(*args, subok=False):
# return np.nditer(args, flags=['multi_index', 'zerosize_ok'],
# order='C').itviews

args = tuple(np.array(_m, copy=None, subok=subok) for _m in args)
args = [np.array(_m, copy=None, subok=subok) for _m in args]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this really help? I thought that these days list comprehension and creating a tuple via an iterator made very little speed difference, and *args should be very slightly faster for a tuple.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does help, although you are right the cost of list comprehensions has gone down. On Python 3.12:

In [16]: import dis
    ...: args=[1, 2, 3]
    ...: %timeit tuple(2*j for j in args)
    ...: %timeit tuple([2*j for j  in args])
298 ns ± 22.1 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
108 ns ± 0.781 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

So about 200ns for a small size, which does matter in the benchmarks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I confirm this. It is funny because I really thought python had solved the speed difference, but clearly I was wrong.


shape = _broadcast_shape(*args)

if all(array.shape == shape for array in args):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice catch.

if all(array.shape == shape for array in args):
# Common case where nothing needs to be broadcasted.
return args
result = [array if array.shape == shape
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the above, I wonder if it is actually slower to directly return tuple(array if array.shape ...)?

@seberg
Copy link
Member

seberg commented Apr 4, 2024

fine that arrays are only made read-only if their shape is actually changes

Just saw the emails and hadn't noticed this, so a brief chiming in. I am on edge about "unpredictable" readonly flag: That is the exact kind of things that the tuple vs. list return discussion was about, although I don't know if e.g. numba encodes it (or we might ever encode it in static typing).
It seems fine and is a pretty small thing but I also don't think there is any up-side. If a user knows that it doesn't change, np.broadcast_to with the unchanged array shape makes more sense.

For nditer to set this you need to pass op_flags=[["writeonly"]]*len(arrs) or so, probably. TBH, it may also make sense to move broadcast_to, broadcast_shapes or something similar to C instead (with the goal of reusing at least the core broadcasting code in nditer one day... (the code there can use some love, IMO)

(The magic nargs "if" should be <=64 now.)

EDIT: Actually, probably need "readwrite" to allow the broadcasting, I first thought it might be the other way around.

@eendebakpt
Copy link
Contributor Author

@mhvk @seberg For comparison I created #26214 which uses np.nditer to be perform the broadcasting. That PR makes all output of np.broadcast_arrays readonly, but I cannot judge wether that is acceptable (now or after some longer deprecation period).

@eendebakpt eendebakpt changed the title ENH: Improve performance of np.broadcast_arrays ENH: Improve performance of np.broadcast_arrays and np.broadcast_shapes Apr 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants