Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: faster chebyshev distance with ndarray.max #20561

Open
lozanodorian opened this issue Apr 23, 2024 · 2 comments
Open

ENH: faster chebyshev distance with ndarray.max #20561

lozanodorian opened this issue Apr 23, 2024 · 2 comments
Labels
enhancement A new feature or improvement scipy.spatial

Comments

@lozanodorian
Copy link

lozanodorian commented Apr 23, 2024

Is your feature request related to a problem? Please describe.

Context

The Chebyshev distance $\mathrm{d}(u, v) \overset{\text{def}}{=} \underset{1\leq i \leq n}{\max}\left|u_i-v_i\right|$ can be computed in scipy.spatial.distance by using chebyshev(u, v).
The last line of this chebyshev function is:

return max(abs(u - v))

where u and v are two NDArrays.

Benchmark

A basic benchmark shows that for len(a) > 13, the use of max(a) is less efficient than the use of np.max(a) or the use of a.max().

scipy1

In this benchmark:

  • a = np.random.random(n),
  • each point quantifies the time of 20 computations of the considered max function.

Describe the solution you'd like.

Replace:

return max(abs(u - v))

By:

return abs(u - v).max()

At least if len(u) > something.

Describe alternatives you've considered.

No response

Additional context (e.g. screenshots, GIFs)

Code of the benchmark

The code of the benchmark is short, we include it in this additional context:

import perfplot
import numpy as np

perfplot.show(
    setup = lambda n: np.random.random(n),
    kernels=[
            lambda a : [max(a) for _ in range(20)],
            lambda a : [np.max(a) for _ in range(20)],
            lambda a : [a.max() for _ in range(20)],
            ],
    labels=['max', 'np.max', 'ndarray.max'],
    n_range=[int(1.2**k) for k in range(10, 92, 2)],
    xlabel='len(a)',
    title='Comparison of time for max, np.max, and ndarray.max of a NumPy array'
    )

Additional remark

Note that the differences of time for the abs functions/method are negligible (ndarray.abs is here ndarray.__abs__).
scipy2

@lozanodorian lozanodorian added the enhancement A new feature or improvement label Apr 23, 2024
@lozanodorian lozanodorian changed the title ENH: Chebyshev distance ENH: faster chebyshev distance with ndarray.max Apr 23, 2024
@tupui
Copy link
Member

tupui commented Apr 23, 2024

Thank you @lozanodorian. Want to make a PR?

np.max(a) should be preferred to make it future proof to work with the Array API standard as oppose to a.max().

@lozanodorian
Copy link
Author

Thank you @tupui, I'm going to make a PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A new feature or improvement scipy.spatial
Projects
None yet
Development

No branches or pull requests

4 participants
@tupui @j-bowhay @lozanodorian and others