NcclManager sort first by global rank, not device ID
jeffdaily committed Jan 20, 2020
1 parent 6817e26 commit c954406
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions tensorflow/core/nccl/nccl_manager.cc
@@ -250,14 +250,14 @@ string NcclManager::GenerateCommunicatorKey() {

 Status NcclManager::GetCommunicator(NcclManager::Collective* collective,
                                     NcclManager::Communicator** communicator) {
-  // Sort by device ID to make ordering of participants deterministic.
+  // Sort by global rank to make ordering of participants deterministic.
   std::sort(collective->participants.begin(), collective->participants.end(),
             [](const std::unique_ptr<Participant>& a,
                const std::unique_ptr<Participant>& b) {
-              if (a->gpu_device_id == b->gpu_device_id) {
+              if (a->global_rank == b->global_rank) {
                 return a->executor < b->executor;
               }
-              return a->gpu_device_id < b->gpu_device_id;
+              return a->global_rank < b->global_rank;
             });

   mutex_lock l(mu_);
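
The effect of the new ordering can be sketched in isolation. The snippet below is a minimal illustration, not the TensorFlow code: `Participant` here is a hypothetical, simplified struct, `executor` is reduced to an integer stand-in for the executor pointer, and `SortByGlobalRank` is a name invented for this example. It assumes, as the commit title suggests, that global rank is the primary key and the executor is only a tie-breaker.

```cpp
#include <algorithm>
#include <memory>
#include <tuple>
#include <vector>

// Hypothetical, simplified stand-in for NcclManager::Participant.
struct Participant {
  int gpu_device_id;  // local device ordinal (no longer used for ordering)
  int global_rank;    // primary sort key after this commit
  int executor;       // integer stand-in for the executor pointer (tie-breaker)
};

// Orders participants by global rank, falling back to executor on ties.
// std::tie builds tuples of references, so the comparison is lexicographic.
inline void SortByGlobalRank(std::vector<std::unique_ptr<Participant>>& ps) {
  std::sort(ps.begin(), ps.end(),
            [](const std::unique_ptr<Participant>& a,
               const std::unique_ptr<Participant>& b) {
              return std::tie(a->global_rank, a->executor) <
                     std::tie(b->global_rank, b->executor);
            });
}
```

With this comparator, two participants that share a `gpu_device_id` but have different `global_rank` values are ordered by rank alone; the device ID no longer influences the result.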