Horovod Ops not XLA compatible #2590
Comments
Hey @jtchilders, thanks for raising this issue. This is a known incompatibility between XLA and Horovod at the moment; we're actively working with Nvidia to come up with a way to make AsyncOps work with Horovod. We'll use this issue to track progress.
What is the current best practice for distributed training with Horovod on XLA-compiled code? Is there any workaround?
Currently, FYI. I have already finished XLA implementations of some Horovod ops (not yet upstreamed). So, with the new implementations,
I still hit the following error, even with those environment settings
You need to replace all of your
@tgaddair shall we close?
For clarity, this PR completes only There are other Horovod ops to be added. ^^" I noticed in the description in the issue report that Currently I don't have a plan to implement all of the Horovod ops for XLA. However, I have two more ops implemented in my local repo,
Environment:
I've run into errors when trying to XLA-compile my TensorFlow train/test steps. In my custom model, if I use
to force compilation of the training operations, I can run successfully without Horovod with 1 process. Then, when I try to run with Horovod, I receive errors like:
You can see my code here:
https://github.com/jtchilders/atlas_dgcnn
I can reproduce the issue using your example:
examples/tensorflow2_mnist.py
by simply changing
@tf.function
to
@tf.function(jit_compile=True)
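For anyone who wants to apply the same change to a local copy of the example without editing it by hand, here is a minimal sketch. The helper name `enable_jit_compile` and the sample function `train_step` are hypothetical and not part of Horovod or TensorFlow; the helper only rewrites source text, switching bare `@tf.function` decorators to the `jit_compile=True` form.

```python
import re

# Matches a bare `@tf.function` decorator on its own line, keeping indentation.
# Decorators that already carry arguments are left untouched.
_BARE_TF_FUNCTION = re.compile(r"^([ \t]*)@tf\.function[ \t]*$", re.MULTILINE)


def enable_jit_compile(source: str) -> str:
    """Return `source` with bare `@tf.function` decorators set to jit_compile=True.

    Hypothetical helper for reproducing the issue; it does not import or run
    TensorFlow, it only edits the text of a script.
    """
    return _BARE_TF_FUNCTION.sub(r"\1@tf.function(jit_compile=True)", source)


if __name__ == "__main__":
    # `train_step` is an illustrative stand-in for a decorated training function.
    sample = "@tf.function\ndef train_step():\n    pass\n"
    print(enable_jit_compile(sample))
```

To use it, read the contents of the local copy of the example script, pass them through `enable_jit_compile`, and write the result to a new file before launching it.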
And if I run
mpirun -n $RANKS -npernode $PPN python tensorflow2_mnist.py
I see a similar error: