
Add all-to-all benchmark #760

Open: wants to merge 44 commits into base: branch-0.21

@pentschev (Member) commented Jul 9, 2021

To run the benchmark on a single node:

$ python benchmark_multiple_processes_all_to_all.py
[('10.33.225.163', 49583) -> 57555] Transferred bytes: 30.00 MiB, average bandwidth: 1.27 GiB/s, median bandwidth: 1.27 GiB/s
[('10.33.225.163', 57555) -> 49583] Transferred bytes: 30.00 MiB, average bandwidth: 1.00 GiB/s, median bandwidth: 1.00 GiB/s
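Each result line reports the total bytes transferred plus an average and a median bandwidth. The sketch below shows one way such a summary could be produced from per-iteration timings. It assumes (the PR does not say) that bandwidth is computed per iteration as bytes divided by elapsed seconds and that sizes use binary units; `format_bytes` and `summarize` are hypothetical helpers, not functions from the benchmark script.

```python
import statistics
from typing import Sequence


def format_bytes(n: float) -> str:
    """Format a byte count with binary prefixes, e.g. 31457280 -> '30.00 MiB'."""
    for unit in ("B", "KiB", "MiB", "GiB"):
        if abs(n) < 1024:
            return f"{n:.2f} {unit}"
        n /= 1024
    return f"{n:.2f} TiB"


def summarize(nbytes_per_iter: int, durations: Sequence[float]) -> str:
    """Hypothetical summary: per-iteration bandwidth = bytes / seconds, then mean and median."""
    bandwidths = [nbytes_per_iter / t for t in durations]
    total = nbytes_per_iter * len(durations)
    return (
        f"Transferred bytes: {format_bytes(total)}, "
        f"average bandwidth: {format_bytes(statistics.mean(bandwidths))}/s, "
        f"median bandwidth: {format_bytes(statistics.median(bandwidths))}/s"
    )


if __name__ == "__main__":
    # Three iterations of 10 MiB each, with one faster iteration in the middle
    print(summarize(10 * 2**20, [0.5, 0.25, 0.5]))
```

With uneven timings the two statistics diverge (here an average of 26.67 MiB/s against a median of 20.00 MiB/s), which is why reporting both can help spot outlier iterations.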

To run each process separately, which also allows running across multiple nodes:

# Monitor process, used to synchronize workers only
# In this case it will wait for 2 worker processes to connect
$ python benchmark_multiple_processes_all_to_all.py --monitor --listen-interface enp1s0f0 --port 12345 --num-workers 2 --communication-lib ucx
Monitor listening at 10.33.227.163:12345
# Worker 1
$ python benchmark_multiple_processes_all_to_all.py --worker --listen-interface enp1s0f0 --endpoints-per-worker 2 --communication-lib ucx --monitor-address 10.33.227.163:12345
[10.33.227.163:58306 -> 10.33.227.163:56515] Transferred bytes: 60.00 MiB, average bandwidth: 354.44 MiB/s, median bandwidth: 354.44 MiB/s
# Worker 2
$ python benchmark_multiple_processes_all_to_all.py --worker --listen-interface enp1s0f0 --endpoints-per-worker 2 --communication-lib ucx --monitor-address 10.33.227.163:12345
[10.33.227.163:56515 -> 10.33.227.163:58306] Transferred bytes: 60.00 MiB, average bandwidth: 356.47 MiB/s, median bandwidth: 356.47 MiB/s
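The monitor only synchronizes the workers: it waits until `--num-workers` processes have connected before the all-to-all transfers begin. Below is a minimal sketch of such a rendezvous using plain asyncio streams on localhost. The newline-delimited JSON messages and the `run_monitor`/`run_worker` names are assumptions for illustration, not the PR's actual wire format.

```python
import asyncio
import json
from typing import List, Tuple


async def run_monitor(num_workers: int, port_ready: "asyncio.Future[int]") -> None:
    """Hypothetical monitor: wait for `num_workers` registrations, then broadcast all addresses."""
    registered: List[Tuple[str, asyncio.StreamWriter]] = []
    all_connected = asyncio.Event()

    async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
        # Each worker sends its own listening address as a single line
        address = (await reader.readline()).decode().strip()
        registered.append((address, writer))
        if len(registered) == num_workers:
            all_connected.set()

    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port_ready.set_result(server.sockets[0].getsockname()[1])
    async with server:
        await all_connected.wait()
        message = (json.dumps([addr for addr, _ in registered]) + "\n").encode()
        for _, writer in registered:
            writer.write(message)
            await writer.drain()
            writer.close()
            await writer.wait_closed()


async def run_worker(monitor_port: int, my_address: str) -> List[str]:
    """Hypothetical worker: register with the monitor and return the peers to connect to."""
    reader, writer = await asyncio.open_connection("127.0.0.1", monitor_port)
    writer.write((my_address + "\n").encode())
    await writer.drain()
    addresses = json.loads(await reader.readline())
    writer.close()
    await writer.wait_closed()
    # All-to-all: every worker exchanges data with every other registered worker
    return [addr for addr in addresses if addr != my_address]


async def main() -> List[List[str]]:
    port_ready = asyncio.get_running_loop().create_future()
    monitor = asyncio.ensure_future(run_monitor(2, port_ready))
    port = await port_ready
    peers = await asyncio.gather(
        run_worker(port, "worker-a:5000"), run_worker(port, "worker-b:5001")
    )
    await monitor
    return list(peers)


if __name__ == "__main__":
    print(asyncio.run(main()))  # [['worker-b:5001'], ['worker-a:5000']]
```

Holding every worker's connection open until the last one registers is what makes the start synchronized: no worker learns its peers, and so none can start sending, before all of them are known.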

@pentschev requested a review from a team as a code owner on July 9, 2021 14:33

@quasiben (Member) commented Jul 9, 2021

cc @jakirkham @gjoseph92: you may both find this PR interesting for running networking experiments that compare Tornado, UCX, and asyncio.
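Comparing libraries comes down to timing the same transfer pattern over each backend. As a rough illustration for the asyncio case only, the sketch below times length-prefixed messages over a loopback TCP connection and waits for a one-byte acknowledgement after each message. The framing, the acknowledgement, and the `time_transfers` helper are assumptions; this is not how the PR's Tornado, UCX, or asyncio code paths are implemented.

```python
import asyncio
import struct
import time
from typing import List


async def receiver(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Read length-prefixed messages and acknowledge each one with a single byte."""
    try:
        while True:
            (nbytes,) = struct.unpack("!Q", await reader.readexactly(8))
            await reader.readexactly(nbytes)
            writer.write(b"\x01")
            await writer.drain()
    except asyncio.IncompleteReadError:
        pass  # the sender closed the connection
    finally:
        writer.close()


async def time_transfers(nbytes: int, iterations: int) -> List[float]:
    """Hypothetical helper: return the wall-clock duration of each send-and-ack round."""
    server = await asyncio.start_server(receiver, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    payload = bytes(nbytes)
    durations = []
    for _ in range(iterations):
        start = time.perf_counter()
        writer.write(struct.pack("!Q", nbytes) + payload)
        await writer.drain()
        await reader.readexactly(1)  # wait for the receiver's acknowledgement
        durations.append(time.perf_counter() - start)
    writer.close()
    await writer.wait_closed()
    server.close()
    await server.wait_closed()
    return durations


if __name__ == "__main__":
    durations = asyncio.run(time_transfers(10 * 2**20, 5))
    print([f"{t * 1e3:.2f} ms" for t in durations])
```

Waiting for the acknowledgement means each duration covers the full delivery of the payload rather than just copying it into the socket buffer.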

This prevents asyncio.iscoroutinefunction from returning False in Python < 3.8
@pentschev (Member Author)

rerun tests

@jakirkham (Member)

It might be interesting to try uvloop as well. Since its event loop is largely implemented in C on top of libuv, I would expect it to perform better than asyncio alone.
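Trying uvloop usually amounts to swapping the event loop policy before the asyncio loop is created. Here is a minimal sketch, assuming uvloop is treated as an optional dependency. The `install_event_loop` helper and its fall-back-to-asyncio behavior are hypothetical, not the PR's code.

```python
import asyncio


def install_event_loop(use_uvloop: bool) -> str:
    """Hypothetical helper: switch asyncio to uvloop if requested and installed."""
    if use_uvloop:
        try:
            import uvloop  # optional third-party dependency
        except ImportError:
            return "asyncio"  # fall back to the default asyncio event loop
        asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
        return "uvloop"
    return "asyncio"


async def running_loop_module() -> str:
    # Report which implementation backs the currently running event loop
    return type(asyncio.get_running_loop()).__module__


if __name__ == "__main__":
    print(install_event_loop(True), asyncio.run(running_loop_module()))
```

Because only the loop implementation changes, the same benchmark coroutines run unmodified on both loops. Recent uvloop releases also provide `uvloop.run()`, which may be preferable on newer Python versions, where the event loop policy API is being deprecated.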

@pentschev (Member Author)

Thanks for the suggestion, @jakirkham, I've added that now.

@pentschev (Member Author)

rerun tests


@mrocklin (Collaborator)

I'm curious to see the results that come out of this. If anyone has preliminary results they want to share, I highly encourage that :)

@@ -0,0 +1,240 @@
import argparse
Member

Do we want this in tests/, or would it make sense to include it in benchmarks/?

Member Author

I'm still thinking about it. I don't want it in tests/, but I do want a test there (for the UCX part only). However, a lot of the code is going to be shared, and we don't currently have a good place where that common code would be visible to both. I'm not even sure the non-UCX code should live in this repo: we'll soon upstream it to OpenUCX, so it doesn't really make sense to have non-UCX code there. I'm still thinking about an appropriate place for this; if you have any suggestions, please let me know.

Member

Yeah, that makes sense. I'll think about it as well.
