🚀 Feature
Motivation
Based on the problems from this issue: #1297
By implementing the `join` context manager (https://pytorch.org/tutorials/advanced/generic_join.html) for our distributed synchronization, we would remove the limitation that, to calculate a metric correctly, the number of samples must be divisible by `num_gpus * batch_size` (because PyTorch by default adds extra samples so that every process receives the same number of batches).
Pitch
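For reference, the pattern from the linked tutorial looks roughly like the sketch below. It spins up a single-process `gloo` group purely to be self-contained; in a real run each rank would execute this with its own rank and a possibly uneven number of batches:

```python
import os
import torch
import torch.distributed as dist
from torch.distributed.algorithms.join import Join
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process CPU group just to make the sketch runnable;
# in practice this is launched with one process per GPU.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group("gloo", rank=0, world_size=1)

model = DDP(torch.nn.Linear(4, 1))
# In a real multi-rank run, each rank could see a different batch count.
inputs = [torch.randn(8, 4) for _ in range(3)]

# DDP is itself a Joinable: ranks that run out of input keep
# answering the collectives of the ranks still iterating.
with Join([model]):
    for x in inputs:
        model(x).sum().backward()

dist.destroy_process_group()
```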
The base class should derive from the `Joinable` class and implement the appropriate methods. This should hopefully not be too much trouble, since all the sync logic is already encapsulated in a single function.
Alternatives
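A hypothetical sketch of what that could look like. Everything except the `Joinable`/`JoinHook` API is invented here: `MetricBase` and `_sync` merely stand in for the real base class and its existing encapsulated sync function.

```python
import torch
from torch.distributed.algorithms.join import Joinable, JoinHook


class _MetricJoinHook(JoinHook):
    """Runs on ranks that have already exhausted their input."""

    def __init__(self, metric: "MetricBase") -> None:
        self.metric = metric

    def main_hook(self) -> None:
        # A joined rank must keep answering the all-reduce that the
        # still-active ranks issue during their metric sync.
        self.metric._sync()


class MetricBase(Joinable):
    """Hypothetical metric base class that can participate in Join."""

    def __init__(self, process_group=None) -> None:
        super().__init__()
        self._group = process_group  # None -> default process group

    def _sync(self) -> None:
        # Placeholder for the existing sync function (e.g. an
        # all_reduce over the accumulated metric state).
        pass

    def join_hook(self, **kwargs) -> JoinHook:
        return _MetricJoinHook(self)

    @property
    def join_device(self) -> torch.device:
        # Assumption: metric state lives on CPU in this sketch.
        return torch.device("cpu")

    @property
    def join_process_group(self):
        return self._group
```

With that in place, user code could wrap its loop in `with Join([metric]):` so that ranks with fewer samples shadow the sync collectives of the ranks still running.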
Additional context