Avoid overhead in 2-D (or higher) hist operations #3443

Open
SimonHeybrock opened this issue May 7, 2024 · 1 comment
Labels
optimisation Increases performance (hopefully)

Comments

@SimonHeybrock
Member

hist uses bin internally when more than one dimension is involved. If there are many auxiliary coordinates that do not participate in the bin or the subsequent hist operation, bin still has to handle them (copy all their elements, etc.). This can become costly:

import scipp as sc

da = sc.data.table_xyz(100_000_000)
da.variances = da.values
da.masks["mask1"] = da.coords["y"] > 0.5 * sc.Unit("m")
da.masks["mask2"] = da.coords["z"] > 0.5 * sc.Unit("m")

dummy = [f"dummy{i}" for i in range(10)]
for name in dummy:
    da.coords[name] = da.coords["x"].copy()

x = sc.linspace('x', 0, 1, 14*32+1, unit='m')
da.hist(x=x, y=100)  # 2 s
da.drop_coords(dummy).hist(x=x, y=100)  # 1 s

It should be simple to avoid this by changing the implementation of hist on the Python side.

@jokasimr
Contributor

jokasimr commented May 7, 2024

Ping #3439

@jokasimr jokasimr added the optimisation Increases performance (hopefully) label May 31, 2024
Projects
Status: Triage