Add support for WandbLogger #1176
Conversation
The logger looks great! After a few noted hotfixes, let's update the branch to the current master, run the CI tests, and merge it 👍
Hello @AyushExel! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2021-04-30 14:36:16 UTC
@Scitator I'm not sure why tests are failing. I don't think the changes should cause it. Here's the log of pytest on my machine from this branch:
@Scitator something that I just noticed. As discussed earlier, I'm sticking with using
@AyushExel could you please try using
catalyst/loggers/wandb.py

if scope == "batch":
    metrics = {k: float(v) for k, v in metrics.items()}
    self._log_metrics(
        metrics=metrics, step=global_epoch_step, loader_key=loader_key, suffix="/batch"
Suggested change:
- metrics=metrics, step=global_epoch_step, loader_key=loader_key, suffix="/batch"
+ metrics=metrics, step=global_sample_step, loader_key=loader_key, suffix="/batch"
As far as these are per-batch/sample statistics, we have to use the per-sample counter.
Using global_sample_step will cause this problem -> #1176 (comment)
@AyushExel could you please try using _suffix rather than /suffix for the logging? Maybe different names for the batch-based and epoch-based metrics would work for Wandb.
You cannot write per-batch metrics under the epoch counter – it's simply incorrect behaviour.
Could you please try using different names for per-batch and per-epoch metrics?
Yeah, the step counter behavior is pretty strict, which comes up a lot, but I'm pretty sure that's how it is right now. I tried changing names as you suggested, but that didn't work. See this, where the behaviour is explained:
As long as you keep passing the same value for step, W&B will collect the keys and values from each call in one unified dictionary. As soon you call wandb.log() with a different value for step than the previous one, W&B will write all the collected keys and values to the history, and start collection over again. Note that this means you should only use this with consecutive values for step: 0, 1, 2, .... This feature doesn't let you write to absolutely any history step that you'd like, only the "current" one and the "next" one.
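To make the quoted constraint concrete, here is a minimal sketch (the project name is a placeholder, and it assumes a logged-in wandb client) showing that the step argument may only move forward:

import wandb

run = wandb.init(project="step-demo")  # placeholder project name

# Calls that share a step are collected into the same history row.
wandb.log({"loss_batch": 0.9}, step=0)
wandb.log({"accuracy_batch": 0.4}, step=0)
wandb.log({"loss_batch": 0.7}, step=1)

# Going back to an earlier step is rejected: W&B warns that steps must only
# increase, which is why per-batch metrics (sample counter) and per-epoch
# metrics (epoch counter) cannot share one step argument.
wandb.log({"loss_epoch": 0.8}, step=0)

run.finish()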
Then I think you could use different names for batch-based and epoch-based metrics – just rename them in a way that Wandb would understand :)
Yes, this might make more sense. So instead of a suffix, I think we can use a prefix. If you log anything in wandb as {'prefix/key': value}, it creates its own section in the dashboard called prefix, and all metrics belonging to the same prefix share the same section.
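A small illustration of that sectioning behaviour (the project name is a placeholder):

import wandb

wandb.init(project="sections-demo")  # placeholder project name

# Keys under the "train/" prefix are grouped into one "train" section of the
# dashboard; "valid/" keys get their own "valid" section.
wandb.log({"train/loss": 0.71, "train/accuracy": 0.43})
wandb.log({"valid/loss": 0.88, "valid/accuracy": 0.35})

wandb.finish()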
It's up to you – you could create {metric_name}_{prefix}/{something}, so it would be accuracy_epoch and accuracy_batch.
Okay, I updated the names of the metrics as you suggested. Now each metric has its own section like {metric_name}_{prefix}. I think it's all good now.
Hi, @AyushExel Could you please check that metrics, parameters, etc. will be logged as expected when the experiment has multiple stages?
@ditwoo any specific example in tests that I should run the logger with?
I think you can adjust an example from the readme:

import os
from torch import nn, optim
from torch.utils.data import DataLoader
from catalyst import dl, utils
from catalyst.contrib.datasets import MNIST
from catalyst.data.transforms import ToTensor


class CustomRunner(dl.IRunner):
    def __init__(self, logdir, device):
        super().__init__()
        self._logdir = logdir
        self._device = device

    def get_engine(self):
        return dl.DeviceEngine(self._device)

    def get_loggers(self):
        return {
            "console": dl.ConsoleLogger(),
            "csv": dl.CSVLogger(logdir=self._logdir),
            # TODO: finish this part
            "wandb": dl.WandbLogger(...),
        }

    @property
    def stages(self):
        return ["train_freezed", "train_unfreezed"]

    def get_stage_len(self, stage: str) -> int:
        return 3

    def get_loaders(self, stage: str):
        loaders = {
            "train": DataLoader(
                MNIST(os.getcwd(), train=True, download=True, transform=ToTensor()), batch_size=32
            ),
            "valid": DataLoader(
                MNIST(os.getcwd(), train=False, download=True, transform=ToTensor()), batch_size=32
            ),
        }
        return loaders

    def get_model(self, stage: str):
        model = (
            self.model
            if self.model is not None
            else nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
        )
        if stage == "train_freezed":
            # freeze layer
            utils.set_requires_grad(model[1], False)
        else:
            utils.set_requires_grad(model, True)
        return model

    def get_criterion(self, stage: str):
        return nn.CrossEntropyLoss()

    def get_optimizer(self, stage: str, model):
        if stage == "train_freezed":
            return optim.Adam(model.parameters(), lr=1e-3)
        else:
            return optim.SGD(model.parameters(), lr=1e-1)

    def get_scheduler(self, stage: str, optimizer):
        return None

    def get_callbacks(self, stage: str):
        return {
            "criterion": dl.CriterionCallback(
                metric_key="loss", input_key="logits", target_key="targets"
            ),
            "optimizer": dl.OptimizerCallback(metric_key="loss"),
            "checkpoint": dl.CheckpointCallback(
                self._logdir, loader_key="valid", metric_key="loss", minimize=True, save_n_best=3
            ),
        }

    def handle_batch(self, batch):
        x, y = batch
        logits = self.model(x)
        self.batch = {
            "features": x,
            "targets": y,
            "logits": logits,
        }


runner = CustomRunner("./logs", "cpu")
runner.run()

Or adjust this test file.
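For the TODO line in get_loggers above, a hedged sketch of how the new logger could be plugged in for this check; the project and name arguments are assumptions for illustration, not the confirmed WandbLogger signature:

def get_loggers(self):
    return {
        "console": dl.ConsoleLogger(),
        "csv": dl.CSVLogger(logdir=self._logdir),
        # Assumed arguments for illustration; adjust to the final WandbLogger API.
        "wandb": dl.WandbLogger(project="catalyst-wandb-test", name="multistage-check"),
    }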
As far as I know,
It's better to refer to the docs: https://catalyst-team.github.io/catalyst/api/core.html?highlight=hparams#catalyst.core.runner.IRunner.hparams.
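For reference, a minimal sketch of exposing hyperparameters through the IRunner.hparams property linked above so that attached loggers can pick them up; the values are placeholders, not part of this PR:

from catalyst import dl

class CustomRunner(dl.IRunner):
    # ... rest of the runner as in the example above ...

    @property
    def hparams(self):
        # The returned dict is reported to every attached logger as the run's hyperparameters.
        return {"lr": 1e-3, "batch_size": 32, "num_stage_epochs": 3}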
Okay, I think it's all good then.
@AyushExel could you please update the changelog (link in the PR description)?
btw, you also need to add the Logger to the docs
This pull request is now in conflicts. @AyushExel, could you fix it? 🙏
Before submitting (checklist)
- Did you check the code style with catalyst-make-codestyle && catalyst-check-codestyle (pip install -U catalyst-codestyle)?
- Did you check the docs with make check-docs?
- Did you run the tests with pytest .?
- Did you check both the latest and minimal requirements?

Description
This PR adds support for WandbLogger, which enables logging metrics and media to the W&B dashboard.
Related Issue
Type of Change
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.
Additional Details:
- When running the train method in test_finetune2.py, no hyperparameters are being logged.

Test logs
Code style
catalyst-make-codestyle && catalyst-check-codestyle

Docs check
rm -rf ./builds; REMOVE_BUILDS=0 make check-docs

Tests
pytest .
@Scitator Let me know if I missed any steps here
FAQ
Please review the FAQ before submitting an issue: