Add in-place broadcast for TensorFlow #3128
Naive question: Which TF 2.6 API does this depend on?
Hi @ashahab,
that's not a naive question at all.
This doesn't primarily depend on a specific API. Rather, at least versions 2.2 through 2.5 of TensorFlow were buggy in the sense that resource variables did not work in custom ops built outside of the TF source tree (like the Horovod library). Here is one example issue highlighting the problem: tensorflow/tensorflow#48058. A fix eventually came in via this PR: tensorflow/tensorflow#47072.
In addition to that, we do in fact need some APIs to handle resource variables, which were originally internal (everything from `training_op_helpers.h` in particular). See this issue for some context: tensorflow/tensorflow#27899. Eventually these header files were included in the public TF packages on PyPI, and the symbols also became available through a separate library, `_pywrap_tensorflow_internal.so`. I don't know exactly in which version this was resolved, but in combination with the first problem, I think we need at least TF 2.6...
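For illustration, a version gate along these lines could guard the feature on the Python side. This is only a minimal sketch, not the check this PR actually implements: the helper names `tf_version_tuple` and `supports_inplace_broadcast` are hypothetical, and the 2.6 threshold is taken from the estimate above rather than from a confirmed minimum.

```python
import re

# Assumption: TF 2.6 is the first usable release, per the estimate above.
MIN_TF_VERSION = (2, 6)


def tf_version_tuple(version: str) -> tuple:
    """Extract (major, minor) from a version string such as '2.6.0' or '2.7.0rc1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def supports_inplace_broadcast(version: str) -> bool:
    """Return True if the given TensorFlow version meets the assumed minimum."""
    return tf_version_tuple(version) >= MIN_TF_VERSION
```

In practice one would pass `tf.__version__` to `supports_inplace_broadcast` and skip or disable the in-place broadcast path when it returns `False`.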