[TF - Fix] Fix imports from tensorflow.python.keras with tf.__version__ >= 2.6.0 #3403
Unit Test Results: 766 files (-7), 766 suites (-7), 9h 0m 7s ⏱️ (-6m 52s). For more details on these failures, see this check. Results for commit 201c0a9. ± Comparison against base commit 046c071.
Unit Test Results (with flaky tests): 936 files (+61), 936 suites (+61), 9h 48m 48s ⏱️ (-14m 57s). For more details on these failures, see this check. Results for commit 201c0a9. ± Comparison against base commit 046c071.
I will check the failed test locally, for example
I can reproduce the failure locally and get it fixed by pinning
b77cf6f to 414f744
It looks like CI should increase timeout for
554e7ab to 0cfdfe3
Created issue #3417 to track the known failing tests on CI, though I cannot reproduce them locally.
@chongxiaoc looks good, you are planning to fix the Spark 2.4.x issues in this PR, right?
@EnricoMi
Run this to build the docker image locally:
Finally got the reproducer in the docker container: Error stack:
@EnricoMi @tgaddair I can reproduce the error in the CI Docker container, but outside container using The error seems unfamiliar to me and related to decode/code in Horovod and Spark 2. I suggest landing this PR to unblock TF 2.6+ support, since Spark 2 is a minor configuration in our CI. What do you think?
That only tells you that the driver failed to read from the workers. The workers all fail with:
So the worker failed with
Okay, I got it fixed in the local Docker CI image (at least for TF 2.6.2).
It looks like TF head and Lightning head are breaking the CI, but I think that's irrelevant to this PR.
240c541 to 6e4b8f5
6e4b8f5 to d47fe91
@tgaddair Added
TF nightly failures are tracked here: #3422
Minor comment, but good!
@chongxiaoc was pinning pytorch_lightning to 1.3.8 really necessary here? It would be great if the setup were a bit more flexible so that we can use a more recent version of Lightning.
Let's see what works in our CI: #3480
@JoostvDoorn When testing this PR, I remember the incompatibility only showed up for Lightning head
@chongxiaoc any chance we can get this working for
@EnricoMi
@JoostvDoorn Horovod works fine with pytorch_lightning up to 1.5.9, see #3480. In which way does the 1.3.8 pin affect you? Are you referring to the released images on Docker Hub?
@EnricoMi I was installing horovod through PyPI, and this will downgrade pytorch_lightning because of the line
@JoostvDoorn I see, what about this: 58bc4bb I suspect
Description
See the TensorFlow 2.6.0 release notes for the Keras package change:
tf.keras:
Keras been split into a separate PIP package (keras), and its code has been moved to the GitHub repository keras-team/keras. The API endpoints for tf.keras stay unchanged, but are now backed by the keras PIP package. The existing code in tensorflow/python/keras is a staled copy and will be removed in future release (2.7). Please remove any imports to tensorflow.python.keras and replace them with public tf.keras API instead.
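As an illustration of the kind of version gate the title describes, here is a minimal sketch. It is not the code from this PR: `parse_major_minor` and `keras_module_path` are hypothetical helper names, and the choice of `tensorflow.keras` as the target for 2.6 and later is an assumption based only on the release note quoted above. The sketch compares versions as integer tuples, since a plain string comparison would wrongly order `2.10.0` before `2.6.0`.

```python
import re


def parse_major_minor(version):
    """Return (major, minor) as integers from a string such as '2.6.2' or '2.7.0-rc1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def keras_module_path(tf_version):
    """Pick the module path to import Keras from for a given TensorFlow version.

    Hypothetical helper: for 2.6 and later it returns the public tf.keras path,
    as the release note recommends; for older versions it returns the legacy
    private path that existing code may still rely on.
    """
    if parse_major_minor(tf_version) >= (2, 6):
        return "tensorflow.keras"
    return "tensorflow.python.keras"


# Usage sketch (requires TensorFlow, so it is left as a comment):
#   import tensorflow as tf
#   if keras_module_path(tf.__version__) == "tensorflow.keras":
#       from tensorflow import keras          # public API
#   else:
#       from tensorflow.python import keras   # legacy private path
```

Keeping the decision in a small pure function makes it testable without TensorFlow installed, which may be convenient in a CI matrix that covers several TensorFlow versions.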