Stealing sensitive tests #305

Closed
Tracked by #6993
fjetter opened this issue Sep 5, 2022 · 0 comments · Fixed by #308
fjetter commented Sep 5, 2022

We'd like to have a couple of tests that cover stealing-sensitive workloads.

  1. Very simple example, currently causing problems (dask/distributed#6573, "Root-ish tasks all schedule onto one worker")
import dask
import distributed

# dask.config.set({"distributed.scheduler.work-stealing": False})
client = distributed.Client(n_workers=4, threads_per_worker=1)
root = dask.delayed(lambda n: "x" * n)(dask.utils.parse_bytes("1MiB"), dask_key_name="root")
results = [dask.delayed(lambda *args: None)(root, i) for i in range(10000)]
dask.compute(results)
  2. Upscaling situation (example from dask/distributed#4471, "Poor work scheduling when cluster adapts size")
import time
import distributed
import dask.array as da
import dask


client = distributed.Client(n_workers=1, threads_per_worker=1, memory_limit='1G')
print(client.dashboard_link)

# Slow task.
def func1(chunk):
    if sum(chunk.shape) != 0: # Make initialization fast
        time.sleep(5)
    return chunk

def func2(chunk):
    return chunk

data = da.zeros((30, 30, 30), chunks=5)
result = data.map_overlap(func1, depth=1, dtype=data.dtype)
result = result.map_overlap(func2, depth=1, dtype=data.dtype)
future = client.compute(result)

print('started computation')

time.sleep(11)
# print('scaling to 4 workers')
# client.cluster.scale(4)

time.sleep(5)
print('scaling to 20 workers')
client.cluster.scale(20)

_ = future.result()
  3. An example that shows a very inhomogeneous compute. This should mimic a long tail in a computation graph where work stealing should reduce wall time, e.g. by using an asymmetric distribution of sleep values.