
Strange behavior of distributed client.compute vs. dask.compute when using dask-awkward, dask-histogram #8545

Open
lgray opened this issue Feb 29, 2024 · 0 comments

lgray commented Feb 29, 2024

This is an unverified bug report; it's really more startling behavior that I would like to understand.

Unfortunately the reproducer here is high energy physics analysis code, and I have not yet been able to create a concise reproducer for this behavior. High energy physics task graphs tend to be quite large (thousands of layers), and I'm not sure if that is somehow related.

Steps for a "maximal" reproducer are here:
https://gist.github.com/lgray/8a28c5fcd707a2a6778f92cd598f0ca6
I'll continue trying to find something minimal, but it takes time to work that out from non-expert code. You'll have to set up the calls to dask.compute yourself; I'm happy to help.
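For reference, here is a minimal sketch of the two call paths I'm comparing. The toy dask.array collection is just a stand-in for the much larger dask-awkward / dask-histogram graph built in the gist above; the names here are illustrative only.

```python
import dask
import dask.array as da
from distributed import Client

client = Client()  # local distributed client, as used in both reports

# Stand-in collection; the real graph comes from the HEP analysis code in the gist.
x = da.random.random((10_000, 100), chunks=(1_000, 100))
result = (x ** 2).sum()

# Path 1: dask.compute -- optimizes the graph and blocks until the result is ready.
(via_dask,) = dask.compute(result)

# Path 2: client.compute -- returns a Future immediately; block on .result().
future = client.compute(result)
via_client = future.result()
```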

Here is the performance report with client.compute:
wwz-dask-report-client-compute.html.zip

Here is the performance report on the same code and task graph for dask.compute:
wwz-dask-report-dask-compute.html.zip

In both cases I am using a local distributed client to perform the computation.
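(The attached reports were produced with distributed's performance_report context manager, roughly as sketched below; `client` and `result` refer to the names in the sketch above, and the exact wrapping is an assumption on my part.)

```python
import dask
from distributed import performance_report

# One report per call path; filenames match the attachments above.
with performance_report(filename="wwz-dask-report-dask-compute.html"):
    dask.compute(result)

with performance_report(filename="wwz-dask-report-client-compute.html"):
    client.compute(result).result()
```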

There are two major issues. One is that the task graph in the client.compute case doesn't appear to be optimized, or is only partially optimized (the number of tasks differs by about 3k). The other is that the later tasks that aggregate histograms seem to stall, for reasons I cannot deduce, in the client.compute case. Along these lines, the client.compute case is nearly 2x slower than the dask.compute case, and its memory usage is 3x-4x higher.

Of note: it seems that the Client thread itself is stalling when using client.compute, since the dashboard stalls when the histogram tree-reduce starts. This doesn't happen in the dask.compute case, which is truly odd.

If I optimize the task graph ahead of time and pass that to client.compute, the number of tasks run by client.compute makes sense, but the stalling issue and the corresponding compute slowdown are still there.
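Concretely, the pre-optimization step looks roughly like this (again reusing the illustrative `client` and `result` names from the sketch above):

```python
import dask

# Optimize the collection's graph up front, then hand the already-optimized
# collection to client.compute.
(optimized,) = dask.optimize(result)
future = client.compute(optimized)
out = future.result()
```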

cc: @martindurant
