Add support for WandbLogger #1176
Conversation
The logger looks great! After a few noted hotfixes, let's update the branch to the current master, run the CI tests, and merge it 👍
Hello @AyushExel! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2021-04-30 14:36:16 UTC
@Scitator I'm not sure why tests are failing. I don't think the changes should cause it. Here's the log of pytest on my machine from this branch:
@Scitator something that I just noticed. As discussed earlier, I'm sticking with using
@AyushExel could you please try using
catalyst/loggers/wandb.py

if scope == "batch":
    metrics = {k: float(v) for k, v in metrics.items()}
    self._log_metrics(
        metrics=metrics, step=global_epoch_step, loader_key=loader_key, suffix="/batch"
Suggested change:
- metrics=metrics, step=global_epoch_step, loader_key=loader_key, suffix="/batch"
+ metrics=metrics, step=global_sample_step, loader_key=loader_key, suffix="/batch"
As far as these are per-batch/sample statistics, we have to use the per-sample counter.
Using global_sample_step will cause this problem -> #1176 (comment)
@AyushExel could you please try using _suffix rather than /suffix for the logging? Maybe different names for the batch-based and epoch-based metrics would work for Wandb.
You cannot write per-batch metrics under the epoch counter – it's simply incorrect behaviour.
Could you please try using different names for per-batch and per-epoch metrics?
Yeah, the step counter behavior is pretty strict, which comes up a lot, but I'm pretty sure that's how it is right now. I tried changing names as you suggested, but that didn't work. See this, where the behaviour is explained:
As long as you keep passing the same value for step, W&B will collect the keys and values from each call in one unified dictionary. As soon you call wandb.log() with a different value for step than the previous one, W&B will write all the collected keys and values to the history, and start collection over again. Note that this means you should only use this with consecutive values for step: 0, 1, 2, .... This feature doesn't let you write to absolutely any history step that you'd like, only the "current" one and the "next" one.
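To make the quoted constraint concrete, here is a minimal sketch (the project name is a placeholder, and it assumes a logged-in wandb client) showing that the step argument may only move forward:

import wandb

run = wandb.init(project="step-demo")  # placeholder project name

# Calls that share a step are collected into the same history row.
wandb.log({"loss_batch": 0.9}, step=0)
wandb.log({"accuracy_batch": 0.4}, step=0)
wandb.log({"loss_batch": 0.7}, step=1)

# Going back to an earlier step is rejected: W&B warns that steps must only
# increase, which is why per-batch metrics (sample counter) and per-epoch
# metrics (epoch counter) cannot share one step argument.
wandb.log({"loss_epoch": 0.8}, step=0)

run.finish()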
Then I think you could use different names for batch-based and epoch-based metrics – just rename them in a way that Wandb would understand :)
Yes, this might make more sense. So instead of a suffix, I think we can use a prefix. If you log anything in wandb as {'prefix/key': value}, it creates its own section in the dashboard called prefix, and all metrics belonging to the same prefix share the same section.
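A small illustration of that sectioning behaviour (the project name is a placeholder):

import wandb

wandb.init(project="sections-demo")  # placeholder project name

# Keys under the "train/" prefix are grouped into one "train" section of the
# dashboard; "valid/" keys get their own "valid" section.
wandb.log({"train/loss": 0.71, "train/accuracy": 0.43})
wandb.log({"valid/loss": 0.88, "valid/accuracy": 0.35})

wandb.finish()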
It's up to you – you could create {metric_name}_{prefix}/{something}, so it would be accuracy_epoch and accuracy_batch.
Okay, I updated the names of the metrics as you suggested. Now each metric has its own section like {metric_name}_{prefix}. I think it's all good now.
Hi, @AyushExel Could you please check that metrics, parameters, etc. will be logged as expected when the experiment has multiple stages?
@ditwoo any specific example in tests that I should run the logger with?
I think you can adjust an example from the readme:

import os
from torch import nn, optim
from torch.utils.data import DataLoader
from catalyst import dl, utils
from catalyst.contrib.datasets import MNIST
from catalyst.data.transforms import ToTensor


class CustomRunner(dl.IRunner):
    def __init__(self, logdir, device):
        super().__init__()
        self._logdir = logdir
        self._device = device

    def get_engine(self):
        return dl.DeviceEngine(self._device)

    def get_loggers(self):
        return {
            "console": dl.ConsoleLogger(),
            "csv": dl.CSVLogger(logdir=self._logdir),
            # TODO: finish this part
            "wandb": dl.WandbLogger(...),
        }

    @property
    def stages(self):
        return ["train_freezed", "train_unfreezed"]

    def get_stage_len(self, stage: str) -> int:
        return 3

    def get_loaders(self, stage: str):
        loaders = {
            "train": DataLoader(
                MNIST(os.getcwd(), train=True, download=True, transform=ToTensor()), batch_size=32
            ),
            "valid": DataLoader(
                MNIST(os.getcwd(), train=False, download=True, transform=ToTensor()), batch_size=32
            ),
        }
        return loaders

    def get_model(self, stage: str):
        model = (
            self.model
            if self.model is not None
            else nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
        )
        if stage == "train_freezed":
            # freeze layer
            utils.set_requires_grad(model[1], False)
        else:
            utils.set_requires_grad(model, True)
        return model

    def get_criterion(self, stage: str):
        return nn.CrossEntropyLoss()

    def get_optimizer(self, stage: str, model):
        if stage == "train_freezed":
            return optim.Adam(model.parameters(), lr=1e-3)
        else:
            return optim.SGD(model.parameters(), lr=1e-1)

    def get_scheduler(self, stage: str, optimizer):
        return None

    def get_callbacks(self, stage: str):
        return {
            "criterion": dl.CriterionCallback(
                metric_key="loss", input_key="logits", target_key="targets"
            ),
            "optimizer": dl.OptimizerCallback(metric_key="loss"),
            "checkpoint": dl.CheckpointCallback(
                self._logdir, loader_key="valid", metric_key="loss", minimize=True, save_n_best=3
            ),
        }

    def handle_batch(self, batch):
        x, y = batch
        logits = self.model(x)
        self.batch = {
            "features": x,
            "targets": y,
            "logits": logits,
        }


runner = CustomRunner("./logs", "cpu")
runner.run()

Or adjust this test file.
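For the TODO line in get_loggers above, a hedged sketch of how the new logger could be plugged in for this check; the project and name arguments are assumptions for illustration, not the confirmed WandbLogger signature:

def get_loggers(self):
    return {
        "console": dl.ConsoleLogger(),
        "csv": dl.CSVLogger(logdir=self._logdir),
        # Assumed arguments for illustration; adjust to the final WandbLogger API.
        "wandb": dl.WandbLogger(project="catalyst-wandb-test", name="multistage-check"),
    }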
As far as I know,
It's better to refer to the docs: https://catalyst-team.github.io/catalyst/api/core.html?highlight=hparams#catalyst.core.runner.IRunner.hparams.
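For reference, a minimal sketch of exposing hyperparameters through the IRunner.hparams property linked above so that attached loggers can pick them up; the values are placeholders, not part of this PR:

from catalyst import dl

class CustomRunner(dl.IRunner):
    # ... rest of the runner as in the example above ...

    @property
    def hparams(self):
        # The returned dict is reported to every attached logger as the run's hyperparameters.
        return {"lr": 1e-3, "batch_size": 32, "num_stage_epochs": 3}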
Okay, I think it's all good then.
@AyushExel could you please update the changelog (link in the PR description)?
btw, you also need to add the Logger to the docs
This pull request is now in conflicts. @AyushExel, could you fix it? 🙏
Before submitting (checklist)
- Did you check the code style with catalyst-make-codestyle && catalyst-check-codestyle (pip install -U catalyst-codestyle)?
- Did you check the docs with make check-docs?
- Did you run the tests with pytest .?
- Did you check both the latest and minimal requirements?

Description
This PR adds support for WandbLogger, which enables logging metrics and media to the W&B dashboard.
Related Issue
Type of Change
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.
Additional Details:
- When running the train method in test_finetune2.py, no hyperparameters are being logged.

Test logs
Code style
catalyst-make-codestyle && catalyst-check-codestyle

Docs check
rm -rf ./builds; REMOVE_BUILDS=0 make check-docs

Tests
pytest .
@Scitator Let me know if I missed any steps here
FAQ
Please review the FAQ before submitting an issue: