fix tracking #361

pacman100 · 2022-05-12T14:45:46Z

What does this PR do?

When a dictionary of metrics was passed to the accelerate tracker, TensorBoard could not log it. An example with a dict of metrics is the transformers script run_glue_no_trainer.py. This PR adds support for dictionaries of int/float metrics.
The examples are fixed so that a run is created only on the main process; otherwise wandb creates num_processes runs with no data.
comet-ml logging was not helpful because everything was logged as "others" through log_others, so no plots were created. It now uses log_metric and log_metrics so that metrics get plots.
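
To illustrate the first point, here is a minimal sketch of routing a metrics dictionary to a TensorBoard-style writer by value type. It is not the code from this PR: `RecordingWriter` and `log_metrics` are hypothetical names, and the stand-in writer only mirrors the `add_scalar`, `add_scalars`, and `add_text` method names of `torch.utils.tensorboard.SummaryWriter` so the example runs without TensorBoard installed. Raising on unsupported types is an assumption made for the sketch.

```python
class RecordingWriter:
    """Hypothetical stand-in for a TensorBoard-style writer; it only records calls."""

    def __init__(self):
        self.calls = []

    def add_scalar(self, tag, value, global_step=None):
        self.calls.append(("scalar", tag, value, global_step))

    def add_scalars(self, main_tag, values, global_step=None):
        self.calls.append(("scalars", main_tag, dict(values), global_step))

    def add_text(self, tag, text, global_step=None):
        self.calls.append(("text", tag, text, global_step))


def log_metrics(writer, values, step=None):
    """Send each entry of `values` to the writer method that matches its type."""
    for name, value in values.items():
        if isinstance(value, (int, float)):
            # Plain numbers become a single scalar series.
            writer.add_scalar(name, value, global_step=step)
        elif isinstance(value, str):
            writer.add_text(name, value, global_step=step)
        elif isinstance(value, dict):
            # A nested dict of numbers is grouped under one main tag.
            writer.add_scalars(name, value, global_step=step)
        else:
            # Assumption for this sketch: fail loudly on anything else.
            raise TypeError(f"Unsupported metric type for {name!r}: {type(value).__name__}")


writer = RecordingWriter()
log_metrics(writer, {"accuracy": 0.85, "epoch": 3, "eval": {"f1": 0.9}}, step=2)
```

With a real `SummaryWriter` in place of the stand-in, the same dispatch would be expected to produce one plot per scalar and one grouped plot per nested dict.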

After these changes, all trackers produce plots for the run_glue_no_trainer.py script on the MRPC task.

HuggingFaceDocBuilderDev · 2022-05-12T15:04:14Z

The documentation is not available anymore as the PR was closed or merged.

sgugger

Thanks for looking into this!

sgugger · 2022-05-12T15:10:11Z

examples/by_feature/fsdp_with_peak_mem_tracking.py

@@ -293,7 +294,7 @@ def collate_fn(examples):
                    {
                        "accuracy": eval_metric["accuracy"],
                        "f1": eval_metric["f1"],
-                        "train_loss": total_loss,
+                        "train_loss": total_loss.item() if type(total_loss) == torch.Tensor else total_loss,


Do we really have either a Tensor or something else? Can't it always be .item() here?
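
If the value can indeed be either a tensor or a plain number, one alternative to the inline type check is a small duck-typed helper. This is only a sketch: `to_python_scalar` is a hypothetical name that does not appear in the PR, and it assumes that any object exposing a callable `.item()` (such as a single-element `torch.Tensor` or a NumPy scalar) should be unwrapped with it.

```python
def to_python_scalar(value):
    """Return `value.item()` when the object provides it, else `value` unchanged."""
    item = getattr(value, "item", None)
    # Single-element tensors and NumPy scalars expose .item(); ints and floats do not.
    return item() if callable(item) else value
```

A call such as `"train_loss": to_python_scalar(total_loss)` would then cover both cases. If `total_loss` is always a tensor, calling `.item()` directly, as suggested above, is simpler.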

sgugger · 2022-05-12T15:10:18Z

examples/by_feature/tracking.py

        if args.with_tracking:
            accelerator.log(
                {
                    "accuracy": eval_metric["accuracy"],
                    "f1": eval_metric["f1"],
-                    "train_loss": total_loss,
+                    "train_loss": total_loss.item() if type(total_loss) == torch.Tensor else total_loss,


Same comment

sgugger · 2022-05-12T15:10:26Z

examples/complete_nlp_example.py

@@ -245,9 +246,10 @@ def collate_fn(examples):
                {
                    "accuracy": eval_metric["accuracy"],
                    "f1": eval_metric["f1"],
-                    "train_loss": total_loss,
+                    "train_loss": total_loss.item() if type(total_loss) == torch.Tensor else total_loss,


sgugger · 2022-05-12T15:49:59Z

Oh, also make sure to edit the title of the PR ;-)

muellerzr

LG2M. I'll look at the failing diff test tomorrow morning, as I'm not quite sure what's going on there.

pacman100 added 2 commits May 12, 2022 20:00

fixing trackers

5efdd4b

quality

46fc439

pacman100 requested review from muellerzr and sgugger May 12, 2022 14:45

pacman100 added 2 commits May 12, 2022 20:22

bug fix

ae6d981

bug fix

adf967c

sgugger reviewed May 12, 2022

addressing comments and fixing tests

f8c156b

pacman100 changed the title ~~Smangrul/fix tracking~~ fix tracking May 12, 2022

muellerzr approved these changes May 12, 2022

Fixing script diff test

ae80065

pacman100 merged commit 4736c75 into huggingface:main May 13, 2022

pacman100 deleted the smangrul/fix-tracking branch May 13, 2022 11:50
