Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TF - Fix] Fix imports from tensorflow.python.keras with tf.__version__ >= 2.5.0 #3157

Closed

Conversation

DEKHTIARJonathan
Copy link
Collaborator

@DEKHTIARJonathan DEKHTIARJonathan commented Sep 8, 2021

Since Tensorflow 2.5, any import to tensorflow.python.keras is to be considered unstable / unsupported.

As a matter of fact, if you check the wheel manifest, starting with 2.5.0, Tensorflow directly depends on Keras:

Requires-Dist: numpy (~=1.19.2)
Requires-Dist: absl-py (~=0.10)
Requires-Dist: astunparse (~=1.6.3)
Requires-Dist: flatbuffers (~=1.12.0)
Requires-Dist: google-pasta (~=0.2)
Requires-Dist: h5py (~=3.1.0)
Requires-Dist: keras-preprocessing (~=1.1.2)
Requires-Dist: opt-einsum (~=3.3.0)
Requires-Dist: protobuf (>=3.9.2)
Requires-Dist: six (~=1.15.0)
Requires-Dist: termcolor (~=1.1.0)
Requires-Dist: typing-extensions (~=3.7.4)
Requires-Dist: wheel (~=0.35)
Requires-Dist: wrapt (~=1.12.1)
Requires-Dist: gast (==0.4.0)
Requires-Dist: tensorboard (~=2.5)
Requires-Dist: tensorflow-estimator (<2.6.0,>=2.5.0rc0)
Requires-Dist: keras-nightly (~=2.5.0.dev)                          # <============== Here
Requires-Dist: grpcio (~=1.34.0)

Due to the backward compatibility nature of HVD, we can't just remove the code. Hence, I am introducing conditional imports based on the value of tf.__version__

Copy link
Collaborator

@tgaddair tgaddair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. The only question I have is how this relates to horovod.keras vs horovod.tensorflow.keras. We created the latter because TF moved Keras into TF natively, but now they're moving it back to its own repo. Should users just use horovod.keras going forward?

@github-actions
Copy link

github-actions bot commented Sep 8, 2021

Unit Test Results

   385 files  ±0     385 suites  ±0   5h 22m 0s ⏱️ + 8m 9s
   705 tests ±0     564 ✔️  -   2     139 💤 ±  0  2 +2 
8 474 runs  ±0  5 801 ✔️ +63  2 671 💤  - 65  2 +2 

For more details on these failures, see this check.

Results for commit 4660683. ± Comparison against base commit 9cd8f79.

♻️ This comment has been updated with latest results.

@DEKHTIARJonathan
Copy link
Collaborator Author

@tgaddair : The only question I have is how this relates to horovod.keras vs horovod.tensorflow.keras. We created the latter because TF moved Keras into TF natively, but now they're moving it back to its own repo. Should users just use horovod.keras going forward?

IMHO if the "APIs" we depend on can exclusively use "import keras", then we should unify them going forward. Though code backward compat may not be straight forward. If we do this, let's do this in a subsequent PR.

@DEKHTIARJonathan
Copy link
Collaborator Author

DEKHTIARJonathan commented Sep 8, 2021

@tgaddair let me give a bit of clarification, there used to be 3 ways to import Keras:

  1. import keras
  2. import tensorflow.keras
  3. import tensorflow.python.keras # Should not be used anymore

Solution 3) is 100% deprecated and will be removed in TF 2.7 or 2.8.
Solution 2) is actually identical to 1) since TF 2.6.0: tensorflow/tensorflow@2e6acf753381


So knowing this, I think we should stop doing from tensorflow import keras and just go for import keras.
I'm going to edit my PR in this direction, and we can have a subsequent PR to unify horovod.keras and horovod.tensorflow.keras.

Does it sound fair to you ?

@DEKHTIARJonathan
Copy link
Collaborator Author

DEKHTIARJonathan commented Sep 8, 2021

I fixed my PR everywhere except for Spark stuff. This part of the code is very convoluted, with "normal Keras" and "TF Keras". Which essentially now are the same.

Horovod now dynamically decides what import keras means depending of context:

@DEKHTIARJonathan DEKHTIARJonathan force-pushed the tensorflow_keras_fix branch 3 times, most recently from f3fdcad to b2b8d3c Compare September 8, 2021 22:38
@chongxiaoc
Copy link
Collaborator

I fixed my PR everywhere except for Spark stuff. This part of the code is very convoluted, with "normal Keras" and "TF Keras". Which essentially now are the same.

I can take a look of Spark estimator stuff.

@chongxiaoc
Copy link
Collaborator

After using this diff, keras module's load_model function is broken as below:

Sun Sep 26 22:19:57 2021[0]<stderr>:  File "/Users/chongxiaoc/build/hvd-py3.7/lib/python3.7/site-packages/horovod-0.22.1-py3.7-macosx-11-x86_64.egg/horovod/spark/task/__init__.py", line 61, in task_exec
Sun Sep 26 22:19:57 2021[0]<stderr>:    result = fn(*args, **kwargs)
Sun Sep 26 22:19:57 2021[0]<stderr>:  File "/Users/chongxiaoc/build/hvd-py3.7/lib/python3.7/site-packages/horovod-0.22.1-py3.7-macosx-11-x86_64.egg/horovod/spark/keras/remote.py", line 123, in train
Sun Sep 26 22:19:57 2021[0]<stderr>:    serialized_model, lambda x: hvd.load_model(x))
Sun Sep 26 22:19:57 2021[0]<stderr>:  File "/Users/chongxiaoc/build/hvd-py3.7/lib/python3.7/site-packages/horovod-0.22.1-py3.7-macosx-11-x86_64.egg/horovod/spark/keras/remote.py", line 284, in deserialize_keras_model
Sun Sep 26 22:19:57 2021[0]<stderr>:    return load_model_fn(f)
Sun Sep 26 22:19:57 2021[0]<stderr>:  File "/Users/chongxiaoc/build/hvd-py3.7/lib/python3.7/site-packages/horovod-0.22.1-py3.7-macosx-11-x86_64.egg/horovod/spark/keras/remote.py", line 123, in <lambda>
Sun Sep 26 22:19:57 2021[0]<stderr>:    serialized_model, lambda x: hvd.load_model(x))
Sun Sep 26 22:19:57 2021[0]<stderr>:  File "/Users/chongxiaoc/build/hvd-py3.7/lib/python3.7/site-packages/horovod-0.22.1-py3.7-macosx-11-x86_64.egg/horovod/tensorflow/keras/__init__.py", line 249, in load_model
Sun Sep 26 22:19:57 2021[0]<stderr>:    return _impl.load_model(keras, wrap_optimizer, _OPTIMIZER_MODULES, filepath, custom_optimizers, custom_objects)
Sun Sep 26 22:19:57 2021[0]<stderr>:  File "/Users/chongxiaoc/build/hvd-py3.7/lib/python3.7/site-packages/horovod-0.22.1-py3.7-macosx-11-x86_64.egg/horovod/_keras/__init__.py", line 207, in load_model
Sun Sep 26 22:19:57 2021[0]<stderr>:    return keras.models.load_model(filepath, custom_objects=horovod_objects)
Sun Sep 26 22:19:57 2021[0]<stderr>:  File "/Users/chongxiaoc/build/hvd-py3.7/lib/python3.7/site-packages/keras/models.py", line 225, in load_model
Sun Sep 26 22:19:57 2021[0]<stderr>:    f = h5py.File(filepath, mode='r')
Sun Sep 26 22:19:57 2021[0]<stderr>:  File "/Users/chongxiaoc/build/hvd-py3.7/lib/python3.7/site-packages/h5py/_hl/files.py", line 395, in __init__
Sun Sep 26 22:19:57 2021[0]<stderr>:    name = filename_encode(name)
Sun Sep 26 22:19:57 2021[0]<stderr>:  File "/Users/chongxiaoc/build/hvd-py3.7/lib/python3.7/site-packages/h5py/_hl/compat.py", line 111, in filename_encode
Sun Sep 26 22:19:57 2021[0]<stderr>:    filename = fspath(filename)
Sun Sep 26 22:19:57 2021[0]<stderr>:TypeError: expected str, bytes or os.PathLike object, not File

I cannot run keras_spark_mnist.py successfully, though the backed is pointed to tensorflow.keras, which is correct as expected.

@chongxiaoc
Copy link
Collaborator

Test in #3183:

  • Remove support of bare Keras in Spark estimator
  • Replace all import keras with import tensorflow.keras as keras

@github-actions
Copy link

github-actions bot commented Oct 22, 2021

Unit Test Results (with flaky tests)

   393 files  +    4     393 suites  +4   5h 38m 48s ⏱️ + 8m 11s
   705 tests ±    0     564 ✔️  -     2     139 💤 ±  0  2 +2 
8 702 runs  +114  6 003 ✔️ +172  2 693 💤  - 64  6 +6 

For more details on these failures, see this check.

Results for commit 4660683. ± Comparison against base commit 9cd8f79.

♻️ This comment has been updated with latest results.

@chongxiaoc
Copy link
Collaborator

It looks like the version switching point should be tf 2.6.0.

I saw test_spark_keras.py::SparkKerasTests::test_model_serialization serialization failure for TF2, I will create a separate ticket for it.

@stale
Copy link

stale bot commented Dec 25, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Dec 25, 2021
@stale stale bot closed this Jan 1, 2022
@EnricoMi
Copy link
Collaborator

EnricoMi commented Jan 2, 2022

What is the reason this did not make it into master? Is this still a desirable fix?

@chongxiaoc
Copy link
Collaborator

I think TF 2.6.0 is still having failures. @EnricoMi

@chongxiaoc
Copy link
Collaborator

Continue and test in #3403

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging this pull request may close these issues.

None yet

4 participants