Proper way to log things when using DDP #6501

Hi all,
Sorry we did not get back to you sooner. Let me try to answer some of your questions:

  1. Is validation_epoch_end only called on rank 0?

No, it is called by all processes.

  2. What does the sync_dist flag do?

Here is the essential code:
https://github.com/PyTorchLightning/pytorch-lightning/blob/a72a7992a283f2eb5183d129a8cf6466903f1dc8/pytorch_lightning/core/step_result.py#L108-L115
If sync_dist=True, it will by default call the sync_ddp function, which sums the value across all processes using torch.distributed.all_reduce:
https://github.com/PyTorchLightning/pytorch-lightning/blob/a72a7992a283f2eb5183d129a8cf6466903f1dc8/pytorch_lightning/utilities/distributed.py#L120
Use this …
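
To make the reduction concrete, here is a minimal pure-Python sketch of what a sum all-reduce does to a logged value. It is a simulation only, not Lightning's or PyTorch's code: the names `simulated_all_reduce_sum` and `simulated_sync` and the `average` flag are hypothetical and exist just for illustration. In a real DDP run each process holds one value and the reduction happens through torch.distributed.all_reduce.

```python
from typing import List


def simulated_all_reduce_sum(per_rank_values: List[float]) -> List[float]:
    """Simulate a SUM all-reduce: every rank ends up holding the same total."""
    total = sum(per_rank_values)
    # After the all-reduce, each rank holds the identical reduced value.
    return [total for _ in per_rank_values]


def simulated_sync(per_rank_values: List[float], average: bool = False) -> List[float]:
    """Hypothetical helper: sum across ranks, optionally divide by world size."""
    reduced = simulated_all_reduce_sum(per_rank_values)
    if average:
        # Dividing the sum by the number of processes gives the mean.
        world_size = len(per_rank_values)
        reduced = [value / world_size for value in reduced]
    return reduced


# One validation loss per process (4 simulated ranks).
val_loss_per_rank = [0.5, 0.25, 0.75, 0.5]

summed = simulated_sync(val_loss_per_rank)
averaged = simulated_sync(val_loss_per_rank, average=True)

print(summed)
print(averaged)
```

The point of the sketch is that, after syncing, every process sees the same number, so it no longer matters which rank ends up writing it to the logger.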

Answer selected by jandono