Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix resume logger #3375

Merged
merged 1 commit into from Jan 24, 2022
Merged

fix resume logger #3375

merged 1 commit into from Jan 24, 2022

Conversation

irasit
Copy link
Collaborator

@irasit irasit commented Jan 20, 2022

Signed-off-by: Peng Zhang pengz@uber.com

Description

The comet logger will fail in local if change save_dir without resume experience.
Need to always resume the experience with the new save_dir path in arguments.

@@ -123,15 +123,15 @@ def train(serialized_model):
train_logger = TensorBoardLogger(logs_path)
print(f"Setup logger: Using TensorBoardLogger: {train_logger}")

elif isinstance(logger, CometLogger) and logger._experiment_key is None:
# Resume logger experiment key if passed correctly from CPU.
elif isinstance(logger, CometLogger) and logger_experiment_key is None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean "is not None"? this looks like it's creating a new CometLogger with an empty experiment.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, yes, should be is not none.

@github-actions
Copy link

github-actions bot commented Jan 20, 2022

Unit Test Results

   438 files   -    364     438 suites   - 364   6h 15m 21s ⏱️ - 2h 25m 27s
   717 tests ±       0     576 ✔️  -      96     140 💤 +     95  1 +1 
9 557 runs   - 7 767  6 457 ✔️  - 5 781  3 099 💤  - 1 987  1 +1 

For more details on these failures, see this check.

Results for commit 633c82d. ± Comparison against base commit a5edcd0.

This pull request skips 95 tests.
test.parallel.test_adasum_pytorch.TorchAdasumTests ‑ test_orthogonal
test.parallel.test_adasum_pytorch.TorchAdasumTests ‑ test_parallel
test.parallel.test_mxnet1.MX1Tests ‑ test_gluon_trainer
test.parallel.test_mxnet1.MX1Tests ‑ test_gpu_required
test.parallel.test_mxnet1.MX1Tests ‑ test_horovod_allreduce_cpu_gpu_error
test.parallel.test_mxnet1.MX1Tests ‑ test_horovod_grouped_allreduce_cpu_gpu_error
test.parallel.test_mxnet2.MX2Tests ‑ test_allgather_object
test.parallel.test_mxnet2.MX2Tests ‑ test_broadcast_object
test.parallel.test_mxnet2.MX2Tests ‑ test_compression_fp16
test.parallel.test_mxnet2.MX2Tests ‑ test_gluon_trainer
…

♻️ This comment has been updated with latest results.

@github-actions
Copy link

github-actions bot commented Jan 20, 2022

Unit Test Results (with flaky tests)

     480 files   -    410     480 suites   - 410   7h 22m 20s ⏱️ - 1h 58m 34s
     717 tests ±       0     576 ✔️  -      95     140 💤 +     95  1 ±0 
10 501 runs   - 8 929  7 109 ✔️  - 6 468  3 387 💤  - 2 465  5 +4 

For more details on these failures, see this check.

Results for commit 633c82d. ± Comparison against base commit a5edcd0.

This pull request skips 95 tests.
test.parallel.test_adasum_pytorch.TorchAdasumTests ‑ test_orthogonal
test.parallel.test_adasum_pytorch.TorchAdasumTests ‑ test_parallel
test.parallel.test_mxnet1.MX1Tests ‑ test_gluon_trainer
test.parallel.test_mxnet1.MX1Tests ‑ test_gpu_required
test.parallel.test_mxnet1.MX1Tests ‑ test_horovod_allreduce_cpu_gpu_error
test.parallel.test_mxnet1.MX1Tests ‑ test_horovod_grouped_allreduce_cpu_gpu_error
test.parallel.test_mxnet2.MX2Tests ‑ test_allgather_object
test.parallel.test_mxnet2.MX2Tests ‑ test_broadcast_object
test.parallel.test_mxnet2.MX2Tests ‑ test_compression_fp16
test.parallel.test_mxnet2.MX2Tests ‑ test_gluon_trainer
…

♻️ This comment has been updated with latest results.

Signed-off-by: Peng Zhang <pengz@uber.com>
@irasit irasit merged commit e0e982f into master Jan 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants