
Strange behavior of distributed client.compute vs. dask.compute when using dask-awkward, dask-histogram #8545

Open
lgray opened this issue Feb 29, 2024 · 0 comments

lgray commented Feb 29, 2024

This is an unverified bug report; it's really more startling behavior that I would like to understand.

Unfortunately the reproducer here is high energy physics analysis code, and I have not yet been able to create a concise reproducer for this behavior. High energy physics task graphs tend to be quite large (thousands of layers), and I'm not sure if that is somehow related.

Steps for a "maximal" reproducer are here:
https://gist.github.com/lgray/8a28c5fcd707a2a6778f92cd598f0ca6
I'll continue trying to find something minimal, but it takes time to work that out from non-expert code. You'll have to set up the calls to dask.compute yourself; I'm happy to help.
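For reference, here is a minimal sketch of the two call paths I'm comparing. The toy dask.array collection is just a stand-in for the much larger dask-awkward / dask-histogram graph built in the gist above; the names here are illustrative only.

```python
import dask
import dask.array as da
from distributed import Client

client = Client()  # local distributed client, as used in both reports

# Stand-in collection; the real graph comes from the HEP analysis code in the gist.
x = da.random.random((10_000, 100), chunks=(1_000, 100))
result = (x ** 2).sum()

# Path 1: dask.compute -- optimizes the graph and blocks until the result is ready.
(via_dask,) = dask.compute(result)

# Path 2: client.compute -- returns a Future immediately; block on .result().
future = client.compute(result)
via_client = future.result()
```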

Here is the performance report with client.compute:
wwz-dask-report-client-compute.html.zip

Here is the performance report on the same code and task graph for dask.compute:
wwz-dask-report-dask-compute.html.zip

In both cases I am using a local distributed client to perform the computation.
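(The attached reports were produced with distributed's performance_report context manager, roughly as sketched below; `client` and `result` refer to the names in the sketch above, and the exact wrapping is an assumption on my part.)

```python
import dask
from distributed import performance_report

# One report per call path; filenames match the attachments above.
with performance_report(filename="wwz-dask-report-dask-compute.html"):
    dask.compute(result)

with performance_report(filename="wwz-dask-report-client-compute.html"):
    client.compute(result).result()
```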

There are two major issues. One is that the task graph in the client.compute case doesn't appear to be optimized, or is only partially optimized (the number of tasks differs by about 3k). The other is that the later tasks that aggregate histograms seem to stall, for reasons I cannot deduce, in the client.compute case. Along these lines, the client.compute case is nearly 2x slower than the dask.compute case, and its memory usage is 3x-4x higher.

Of note: it seems that the Client thread itself is stalling when using client.compute, since the dashboard stalls when the histogram tree-reduce starts. This doesn't happen in the dask.compute case, which is truly odd.

If I optimize the task graph ahead of time and pass that to client.compute, the number of tasks run by client.compute makes sense, but the stalling issue and the corresponding compute slowdown are still there.
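Concretely, the pre-optimization step looks roughly like this (again reusing the illustrative `client` and `result` names from the sketch above):

```python
import dask

# Optimize the collection's graph up front, then hand the already-optimized
# collection to client.compute.
(optimized,) = dask.optimize(result)
future = client.compute(optimized)
out = future.result()
```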

cc: @martindurant
