Fix h5py group naming while model saving #13477

Tbuhet · 2019-10-21T14:28:52Z

Summary

This pull request is related to issue #12195 . There is a bug while saving .h5 models. In a model, if a layer name is prefix of a previously defined layer name, then h5py will fail with error: "ValueError: Unable to create group (name already exists)".

Traceback (most recent call last):
  File "$PATH/test.py", line 12, in <module>
    model.save_weights('test.h5')
  File "$PATH/keras/keras/engine/saving.py", line 449, in save_wrapper
    save_function(obj, filepath, overwrite, *args, **kwargs)
  File "$PATH/keras/keras/engine/network.py", line 1184, in save_weights
    saving.save_weights_to_hdf5_group(f, self.layers)
  File "$PATH/keras/keras/engine/saving.py", line 748, in save_weights_to_hdf5_group
    g = group.create_group(layer.name)
  File "$PATH/keras_pr_py3/lib/python3.5/site-packages/h5py/_hl/group.py", line 68, in create_group
    gid = h5g.create(self.id, name, lcpl=lcpl, gcpl=gcpl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5g.pyx", line 161, in h5py.h5g.create
ValueError: Unable to create group (name already exists)

This problem is even more present in Tensorflow 2.0 because saving a model with tf.ones_like, tf.zeros_like, tf.boolean_mask, ... Tensors will lead to this error.

The first commit of this pull request sorts the model layers by name to ensure that h5py group names are strictly growing to avoid prefix issues. File: keras/engine/saving.py
The second commit of this pull request add a test described problem.
Function: test_saving_group_naming_h5py() in File: tests/test_model_saving.py

Related Issues

Fixes #12195 .

PR Overview

[y] This PR requires new unit tests [y/n] (make sure tests are included)
Included in second commit
[n] This PR requires to update the documentation [y/n] (make sure the docs are up-to-date)
[y] This PR is backwards compatible [y/n]
[n] This PR changes the current API [y/n] (all API changes need to be approved by fchollet)

1. Sort model layers before groups creation

keras/tests/test_model_saving.py (keras-team#12195) 1. Add test for h5py group_name fix (a group name couldn't be prefix to an existing group name)

gabrieldemarmiesse

Thank you for this much needed pull request!

keras/engine/saving.py

tests/test_model_saving.py

gabrieldemarmiesse · 2019-10-21T15:11:16Z

@pavithrasv It seems there is a bug in the test_mean_iou which is unrelated to this pull request. This test was added in this pull request: #13297 . Could you please take a look? Thank you.

…13477

gabrieldemarmiesse · 2019-10-22T09:54:23Z

Hi @Tbuhet , can you try to add the following to the test_mean_iou:

from flaky import flaky

@flaky(rerun_filter=lambda err, *args: issubclass(err[0], InvalidArgumentError))
def test_mean_iou():
    ....

It will hopefully fix the CI problem that this PR seems to have. Thank you.

Tbuhet · 2019-10-22T10:56:29Z

Hi @gabrieldemarmiesse , I added the modifications to the test_mean_iou

gabrieldemarmiesse · 2019-10-22T11:53:01Z

tests/keras/metrics_training_test.py

@@ -81,6 +82,7 @@ def test_sensitivity_metrics():


 @pytest.mark.skipif(K.backend() != 'tensorflow', reason='requires tensorflow')


I believe that we have too much problems with this tests for what it's worth. Can you disable it? We'll enable it again later once we find a fix.

Suggested change

@pytest.mark.skipif(K.backend() != 'tensorflow', reason='requires tensorflow')

@pytest.mark.skipif(True, reason='It is a flaky test, see #13477 for more context.')

You can also remove the flaky decorator and import.

OK I update it right now, Thanks.

gabrieldemarmiesse

Thanks!

@symbolic

* First pass. * 2nd pass. * 3rd pass * 4th pass * Revert backend rnn. * 5th pass * Quick fixes * Simplification * Update Travis config to test on TF 2 * Fix some syntax error * Fix travis issues again * Fixes * Unit test fixes * Small fix. * Tf 2: fix optimizer weights naming collision issue (keras-team#12466) * rem get_session() call from multi gpu utils * tiny fix * optimizer fixes * fix tempfile * rem os.remove * rem tmpdir * ws fix * Fix docstring of util function. * Fix docstring style * TF-2: Remove get_session() call in multi_gpu_utils.py (keras-team#12465) * rem get_session() call from multi gpu utils * tiny fix * Fix in_top_k * Simplify bidirectional test * Move TensorBoard callback to v2 -- still need to fix some tests. * Small fixes. * Fix v1 tests. * Fix PEP8. * Fix initializers and update ops. * Disable test for TF print * Fix gradient test failure. * Fix test_model_with_external_loss * Small backend simplification * Fix convrnn tests. * Fix identity init * Remove irrelevant tests. * Fix conv2d_transpose * Fix PEP8 * Disable multiprocessing tests. * Fix tests. * Fix conv_rnn bug with cntk/theano * Fix TF1 test * Adding Loss, LossFunctionWrapper, MeanSquaredError classes. (keras-team#12859) * Adding Loss, LossFunctionWrapper, MeanSquaredError classes. * Fixing formatting issues. * Adding arguments list to MeanSquaredError. * Fix abstract method * Adding MeanAbsoluteError, MeanAbsolutePercentageError, MeanSquaredLogarithmicError and BinaryCrossentropy loss classes. (keras-team#12894) * Adding MeanAbsoluteError loss * Adding Binary Crossentropy loss. * Adding MeanAbsolutePercentageError loss. * Adding MeanSquaredLogarithmicError loss. * Adding #Arguments for some classes. * remove print statements * Update image preprocessing. * Update applications. * Remove ResNeXt networks (bug) and add tests. * Adding CategoricalCrossentropy and SparseCategoricalCrossentropy Loss classes (keras-team#12903) * Add categorical crossentropy loss. * Add sparse categorical crossentropy loss. * Adding support for `Loss` instances in model compile. (keras-team#12915) * Updating training utils. * training changes for supporting Loss instances. * Add weight broadcasting. * Saving test * Add correctness test. * Fix a number of tests. * Remove outdated test * Fix PEP8 * Change defaults of GRU and LSTM layers. * Rename lr to learning_rate in optimizers * Add missing loss classes * Fix tests. * Removing @symbolic from few tf backend ops. * Creating helper function for broadcast weights. * Adding Metric class. * Adding Metric class. * Adding sample weight unit test. * Adding Mean metric class and unit tests. * Addressed comments. * Adding control dependency op. * Adding no-op control dependencies to Theano and CNTK backends. * Fixing tests for TF1. * Fix doc string. * Fix backend issues. * Adding MeanMetricWrapper class. * Adding MeanSquaredError metric. * Framework changes for metrics part 1 * Metrics framework changes part 2 * Adding metrics correctness test. * Fix integration tests * Remove references to ResNeXt from docs. * Prepare 2.2.5 release. * Fix sklearn wrapper unit test in Python 3? * Fix sklearn regressor test? * Some Theano fixes. * Make metrics compatible with Theano * Theano fixes * Fix * Fixes * Recompute steps_per_epoch after each epoch in traingin_generator (keras-team#13037) * pep8 config in setup.cfg (keras-team#13196) * Theano fixes * Update optimizers for TF2. (keras-team#13246) 1. remove epsilon and decay for all optimizers 2. add iterations into weight list into RMSProp, Adagrad, Adadelta, Nadam * Fix results tracking for metrics in multi-output case * sync changes to _TfDeviceCaptureOp (keras-team#13255) * sync changes to _TfDeviceCaptureOp * adjust for wider compatibility * Documentation for `array_to_img`, `img_to_array` and `save_img` under `preprocessing.image` keras-team#12711 (keras-team#13252) * Add docstring of `save_img()` in `keras/preprocessing/image.py` Signed-off-by: Karel Ha <mathemage@gmail.com> * Add docstring of `img_to_array()` in `keras/preprocessing/image.py` Signed-off-by: Karel Ha <mathemage@gmail.com> * Add docstring of `array_to_img()` in `keras/preprocessing/image.py` Signed-off-by: Karel Ha <mathemage@gmail.com> * Add metric API changes (keras-team#13256) * Add metric API changes part 1 * Add metric API changes part 2 * Update metric __call__ calls to update state and result calls * Changing metric call for output loss metric to update_state and result calls. * Fix metrics support in Theano * Fix PEP8 * Introduces fixes for tensor equality in TF 2.0 * Minor fixes * Improve exception testing in test_training * Improve test syntax * Remove outdated tests * Only run label smoothing logic when necessary * Fix PEP8 * Update CI to run on TF 1.14 for TF1 * Disable a backend test for CNTK * Fix docs test * CNTK fixes * Disable test that hangs Travis * Disabled flaky cntk test * Reduce test flakiness * Disable CNTK SGD test * Disable test causing Travis to hang * Disable flaky CNTK test * Disable test that hangs Travis * Disable a couple more multiprocessing tests * Add ability for Layer to track sublayer attributes * Add support for layer attribute tracking (loss, updates, metrics) in layer subclasses, and standalone weight tracking * Fix theano backend * Fix PEP8 * Adding accuracy metric classes. (keras-team#13265) * Adding Accuracy metric class. * Adding BinaryAccuracy metric class. * Adding CategoricalAccuracy metric class. * Adding SparseCategoricalAccuracy metric class. * Adding TopK Accuracy metric classes. * Fixed review comments * Add metrics Hinge, SquaredHinge, CategoricalHinge * Add label conversion to hinge losses * Adding LogCosh, Poisson, KLDivergence, crossentropy metrics. (keras-team#13271) * Adding LogCosh, Poisson, KLDivergence metrics. * Adding crossentropy metrics. * Add metrics CosineSimilarity, MeanAbsoluteError, MeanAbsolutePercentageError, MeanSquaredError, MeanSquaredLogarithmicError, RootMeanSquaredError. * Reverse sign of cosine_similarity metric * Adding TruePositives, TrueNegatives, FalsePositives, FalseNegatives metric classes. (keras-team#13280) * Adding FalsePositive metric class. * Adding TruePositives, TrueNegatives, FalseNegatives metric classes. * Adding AUC, SensitivityAtSpecificity metrics. (keras-team#13289) * Adding AUC, SensitivityAtSpecificity metrics. * Fixed failing test. * Add SpecificityAtSensitivity metric. (keras-team#13294) * Add SpecificityAtSensitivity metric. * Fixing some lint issues. * Fixing some lint issues. * Adding Precision, Recall, Mean IoU part 1. * Adding Precision, Recall, Mean IoU part 1. * Add MeanIoU metric. * Add MeanIoU metric. * Fix metrics reporting / accumulation with fit_generator and evaluate_generator. * Remove deprecated example script * Remove deprecated example, fix conv filter example * Update examples * Fix some bugs * Update examples * Addressed PR comments. * Fix Theano tests. * Disable top_k metrics for TF1 * Reenable Precision and Recall with TF1 * Fix Py2 tests * Skip metric tests for CNTK * Fix py2 test * Fix py2 test * Update coverage threshold * Add back CPU to multi_gpu_utils available devices * Using K.is_tensor and K.is_variable (keras-team#13307) is_tensor_or_variable(x) is undefined and replaced by K.is_tensor(x) or K.is_variable(x). Fixes keras-team#13306. * Update lstm_seq2seq.py(from 22% to 87% acc) (keras-team#13269) * keras-team#13266 Update lstm_seq2seq.py(from 22% to 87% acc) I added the codes for applying one-hot encoding on the end of sentences about encoder_input_data, decoder_input_data, and decoder_target_data. I added an accuracy metric for model training. The original code has 22% accuracy, but the proposed code had 87% validation accuracy. * Update lstm_seq2seq.py I update code according to PEP8. * Update lstm_seq2seq.py Remove whitespace according to PEP 8. * Update babi_rnn.py (keras-team#13263) Change line 82 so the reg exp works as it says on token above in the comment because else it gives - AttributeError: 'NoneType' object has no attribute 'strip' * typo fixed (keras-team#13230) * Correct the DepthwiseConv2d docstrings - output shape (keras-team#13225) * fix in "Layer.compute_output_shape" description (keras-team#13210) * Added batch_normalization in the numpy backend. (keras-team#11556) * Finished the functions. * Started doing the test function. * Added the batch_normalization operation to the numpy backend. * forgot an argument * Complete the docs by adding data to multi-input/output example (keras-team#12775) * Complete the docs by adding data to multi-input/output example * Add seed for reproducibility * Fix Travis SSL issue. * keras-team#13239 Improved documentation for EarlyStopping/ReduceLROnPlateau, take validation_freq into account. (keras-team#13240) * Added messages about the future of multi-backend Keras. (keras-team#13315) * Added comments about the future of Keras. * Changed msg. * Fix sequence timeout deadlock (keras-team#13322) * Add a test for deadlock after sequence worker timeout * Call task_done even if the task timeouted * catch dead worker warning * fix line length * Increase deadlock detection timeout to prevent flakiness * Fix deprecation warnings related to TF v1 * Update README * Add a link to the metrics document (keras-team#13334) Link to the metrics document(/metrics) was missing in 'Compilation' section. Added one just as other explained arguments. * Fix thread safety issue * Correct spelling mistake (keras-team#13339) * fix keras-team#13341 math_ops to K (keras-team#13342) * Fix encoding error (keras-team#13355) * Add utf-8 encoding * Fix PEP8 error * Fix PEP8 error * Fix issue where the disable_tracking decorator obfuscates layer constructors. * Fix yaml version compat issue * Update local.py docstrings (keras-team#13373) * Update local.py stride value implies the input argument of 'stride', and dilation_rate implies the input argument of 'dilation rate' of Conv1D function. It is more explicit to express as code rather than using words stride value, dilation value. Or, at least both stride and dilation_rate should be written in code, not only dilation rate as before document * Update local.py mark as code snippet * Allowed to return the image as a Jupyter Image only if the extension is not pdf (keras-team#13383). (keras-team#13384) keras/utils/vis_utils.py * Fix file leak in CSVLogger (keras-team#13378) * Fix file leak in CSVLogger * Update callbacks.py * fix: `recurrent_activation` parameter's docstring (keras-team#13401) * typo_fix (keras-team#13395) * Prepare 2.3.1 release * Added the default activation of convolutional LSTM in the docs. (keras-team#13409) * Small refactors on the keras.utils module (keras-team#13388) * Use .format calls for string interpolation on utils * Use generators over listcomps whenever possible to save memory * Bumped tf2 version to 2.0.0 (keras-team#13412) * Change `batch_size` descriptions to proper ones (keras-team#13422) * Change `batch_size` descriptions to proper ones Since there're no gradients updated during `evaulate` and `predict` processes, changed their `batch_size` docstrings from `"Number of samples per gradient update"` to `"Number of samples per evaluation step"` and `"Number of samples to be predicted at once"`. (The sentence in fit remains unchanged.) I hope this fix would change related auto-generated documents as well. * Correct `callbacks` description docstrings Corrected `callbacks` description docstrings in `evaluate_generator` and `predict_generator`: "List of callbacks to apply during training" -> "- during evaluation", "- during prediction". * Update autogen.py (keras-team#13426) fix duplicate module name for callbacks module * Update io_utils.py (keras-team#13429) I just fixed Numpy -> NumPy in HDF5Matrix class. * Update pooling.py (keras-team#13467) * Update pooling.py Added Integer at the `pool_size` of `MaxPooling3D` * Update pooling.py Add Integer in `strides` and `pool_size` of 3D layers Added "If None, it will default to `pool_size`." to be consistent with explanation of 1D, 2D layer * Update pooling.py `channels_first` ->`"channels_first"` `channels_last` ->`"channels_last"` "channels_last"->`"channels_last"` * Update core.py (keras-team#13472) `channels_first` -> `'channels_first'` `channels_last`, "channels_last" -> `'channels_last'` data_format='channels_first' -> `data_format='channels_first'` data_format='channels_last' -> `data_format='channels_last'` * Fix h5py group naming while model saving (keras-team#13477) * Update np_utils.py (keras-team#13481) * Fix too many values to unpack error (keras-team#13511) * fix too many values to unpack error In the example script lstm_seq2seq_restore.py and lstm_seq2seq.py, when parse the data using line.split("\t"), it will return 3 values rather than 2, a simple modification can fix it. * add blankspace around operator Co-authored-by: François Chollet <francois.chollet@gmail.com> Co-authored-by: Fariz Rahman <farizrahman4u@gmail.com> Co-authored-by: Pavithra Vijay <psv@google.com> Co-authored-by: Victor Kovryzhkin <vik.kovrizhkin@gmail.com> Co-authored-by: Philip May <eniak.info@gmail.com> Co-authored-by: tanzhenyu <tanzheny@google.com> Co-authored-by: Taylor Robie <taylorrobie@google.com> Co-authored-by: Karel Ha <mathemage@gmail.com> Co-authored-by: Sebastian Höffner <info@sebastian-hoeffner.de> Co-authored-by: tykimos <adam.tykim@gmail.com> Co-authored-by: Kostas <kvogiat@gmail.com> Co-authored-by: Arnout Devos <arnoutdev@gmail.com> Co-authored-by: Keunwoo Choi <gnuchoi+github@gmail.com> Co-authored-by: Alexander Ivanov <avi2011class@yandex.ru> Co-authored-by: Gabriel de Marmiesse <gabrieldemarmiesse@gmail.com> Co-authored-by: Bharat Raghunathan <bharatraghunthan9767@gmail.com> Co-authored-by: Hendrik Schreiber <hs@tagtraum.com> Co-authored-by: Andrey Zakharevich <andreyzakharevich@gmail.com> Co-authored-by: Shiv Dhar <shivdhar@gmail.com> Co-authored-by: djstrong <djstrong@gmail.com> Co-authored-by: fuzzythecat <fuzzy0427@gmail.com> Co-authored-by: Naruu <esara2021@gmail.com> Co-authored-by: ftesser <fabio.tesser@gmail.com> Co-authored-by: Gregory Morse <gregory.morse@live.com> Co-authored-by: Andrew Naguib <andrew@fci.helwan.edu.eg> Co-authored-by: Haifeng Jin <jhfjhfj1@gmail.com> Co-authored-by: Michelle Vivita <mvivita88@gmail.com> Co-authored-by: Elton Viana <eltonvs@outlook.com> Co-authored-by: Junyoung Kim <Junyoung.JK.Kim@gmail.com> Co-authored-by: Denny-Hwang <48212469+Denny-Hwang@users.noreply.github.com> Co-authored-by: Thibault Buhet <38053590+Tbuhet@users.noreply.github.com> Co-authored-by: xemcerk <lisislzx@sina.com>

Original PR: keras-team/keras#13477 Fixes: #33888 b/143774288 PiperOrigin-RevId: 292030348 Change-Id: I68032d72cf2cfd167c6ed065bce7a22d1f136011

thibault-buhet added 2 commits October 21, 2019 15:08

Fix group naming in keras/engine/saving.py (keras-team#12195)

7dee298

1. Sort model layers before groups creation

Add test test_saving_group_naming_h5py in

43ec7a4

keras/tests/test_model_saving.py (keras-team#12195) 1. Add test for h5py group_name fix (a group name couldn't be prefix to an existing group name)

gabrieldemarmiesse changed the title ~~Fix h5py group naming while model saving (#12195)~~ Fix h5py group naming while model saving Oct 21, 2019

gabrieldemarmiesse reviewed Oct 21, 2019

View reviewed changes

keras/engine/saving.py Outdated Show resolved Hide resolved

tests/test_model_saving.py Outdated Show resolved Hide resolved

Include comments from gabrieldemarmiesse for pull request keras-team#…

04fafe9

…13477

Add flaky to the test_mean_iou test

1c7377b

gabrieldemarmiesse reviewed Oct 22, 2019

View reviewed changes

gabrieldemarmiesse approved these changes Oct 22, 2019

View reviewed changes

skip test_mean_iou test

f69bd39

gabrieldemarmiesse merged commit ecac367 into keras-team:master Oct 22, 2019

vmarkovtsev mentioned this pull request Nov 4, 2019

Bug in saving model in hdf5 format tensorflow/tensorflow#33888

Closed

vcarpani mentioned this pull request Dec 6, 2019

Sort layers by name to avoid conficts in h5 saving. tensorflow/tensorflow#34892

Closed

tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Jan 29, 2020

to fix bug in h5 saving in format

0e88439

Original PR: keras-team/keras#13477 Fixes: #33888 b/143774288 PiperOrigin-RevId: 292030348 Change-Id: I68032d72cf2cfd167c6ed065bce7a22d1f136011

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix h5py group naming while model saving #13477

Fix h5py group naming while model saving #13477

Tbuhet commented Oct 21, 2019 •

edited by gabrieldemarmiesse

gabrieldemarmiesse left a comment

gabrieldemarmiesse commented Oct 21, 2019

gabrieldemarmiesse commented Oct 22, 2019

Tbuhet commented Oct 22, 2019

gabrieldemarmiesse Oct 22, 2019 •

edited

Tbuhet Oct 22, 2019

gabrieldemarmiesse left a comment

		@@ -81,6 +82,7 @@ def test_sensitivity_metrics():


		@pytest.mark.skipif(K.backend() != 'tensorflow', reason='requires tensorflow')

	@pytest.mark.skipif(K.backend() != 'tensorflow', reason='requires tensorflow')
	@pytest.mark.skipif(True, reason='It is a flaky test, see #13477 for more context.')

Fix h5py group naming while model saving #13477

Fix h5py group naming while model saving #13477

Conversation

Tbuhet commented Oct 21, 2019 • edited by gabrieldemarmiesse

Summary

Related Issues

PR Overview

gabrieldemarmiesse left a comment

Choose a reason for hiding this comment

gabrieldemarmiesse commented Oct 21, 2019

gabrieldemarmiesse commented Oct 22, 2019

Tbuhet commented Oct 22, 2019

gabrieldemarmiesse Oct 22, 2019 • edited

Choose a reason for hiding this comment

Tbuhet Oct 22, 2019

Choose a reason for hiding this comment

gabrieldemarmiesse left a comment

Choose a reason for hiding this comment

Tbuhet commented Oct 21, 2019 •

edited by gabrieldemarmiesse

gabrieldemarmiesse Oct 22, 2019 •

edited