
Report CycleGAN validation metrics correctly to wandb #2131

Open · wants to merge 87 commits into master
Conversation

@mcgibbon (Contributor) commented Jan 5, 2023

Currently the CycleGAN training routine does not report the same losses on validation data that it reports on training data. This PR refactors the code to compute the training losses on validation data, produce one wandb report per epoch, and add the regularization loss as an output metric.
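The per-epoch reporting can be pictured with the rough sketch below. The helper names (`log_epoch`, and `train_on_batch` returning a dict of losses) are assumptions for illustration, not the exact code in this PR; the point is that per-batch losses are averaged and wandb receives a single report per epoch covering both training and validation.

```python
# Rough sketch of one-report-per-epoch logging (hypothetical helper names,
# not the exact code in this PR): per-batch loss dicts are averaged and
# wandb receives a single report for each epoch.
from collections import defaultdict

import wandb


def log_epoch(model, train_batches, val_batches, epoch):
    report = {}
    for prefix, batches, training in [
        ("train", train_batches, True),
        ("val", val_batches, False),
    ]:
        sums, count = defaultdict(float), 0
        for real_a, real_b in batches:
            # train_on_batch returns the same loss dict in both modes
            losses = model.train_on_batch(real_a, real_b, training=training)
            for name, value in losses.items():
                sums[name] += float(value)
            count += 1
        report.update(
            {f"{prefix}/{name}": total / count for name, total in sums.items()}
        )
    wandb.log(report, step=epoch)  # exactly one wandb report per epoch
```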

Refactored public API:

  • CycleGAN metrics have been updated to appear in a single report per epoch and to include all training losses on validation data

Significant internal changes:

  • black is moved before flake8 in the pre-commit hooks so that auto-formatting reduces the line-length errors flake8 reports

Coverage reports (updated automatically):

  • test_unit: 60%

@@ -1,5 +1,10 @@
exclude: "external/gcsfs/"
repos:
- repo: https://github.com/psf/black
mcgibbon (Contributor, Author) commented:
I'd like black to run before flake8 so that its auto-formatting fixes issues before flake8 checks them.

generator
discriminator_optimizer: configuration for the optimizer used to train the
discriminator
optimizer: configuration for the optimizer used to train the
mcgibbon (Contributor, Author) commented:
Merging these was necessary so a wandb sweep can operate on the learning rate; there is no way to pair two hyperparameters in a wandb sweep.
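A hypothetical sketch of what the merged configuration might look like (class and field names are illustrative, not the repository's actual API): one optimizer config is built twice, once per network, so a sweep over a single learning-rate hyperparameter tunes both.

```python
# Hypothetical sketch of a merged optimizer configuration (names and defaults
# are illustrative, not the repository's actual classes): one config is built
# twice, so a sweep over a single learning rate affects both networks.
import dataclasses

import torch


@dataclasses.dataclass
class OptimizerConfig:
    name: str = "Adam"
    lr: float = 2e-4

    def build(self, parameters):
        # Look the optimizer class up by name on torch.optim
        cls = getattr(torch.optim, self.name)
        return cls(parameters, lr=self.lr)


# Usage (illustrative): the same config drives both optimizers, so a wandb
# sweep only needs to vary one hyperparameter, e.g. the optimizer learning rate.
# generator_optimizer = config.optimizer.build(generator.parameters())
# discriminator_optimizer = config.optimizer.build(discriminator.parameters())
```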

@@ -316,76 +314,8 @@ def _init_targets(self, shape: Tuple[int, ...]):
torch.Tensor(shape).fill_(0.0).to(DEVICE), requires_grad=False
)

def evaluate_on_dataset(
mcgibbon (Contributor, Author) commented:
This function was never actually exercised: it was called, but only on validation data, and I never provided validation datasets before now.

@@ -395,6 +325,8 @@ def train_on_batch(
[sample, time, tile, channel, y, x]
real_b: a batch of data from domain B, should have shape
[sample, time, tile, channel, y, x]
training: if True, the model will be trained, otherwise we will
mcgibbon (Contributor, Author) commented:
This allows getting training metrics on validation data.
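A minimal sketch of the pattern, assuming illustrative method and attribute names rather than the PR's exact implementation: when `training=False`, the same losses are computed under `torch.no_grad()` and no optimizer step is taken, so the identical metric names can be reported for validation data.

```python
# Illustrative sketch of a training flag on the batch step (method and
# attribute names are assumptions, not the PR's exact code): the same losses
# are computed either way, but gradients and optimizer steps only happen
# when training=True.
import contextlib

import torch


def train_on_batch(self, real_a, real_b, training: bool = True):
    context = contextlib.nullcontext() if training else torch.no_grad()
    with context:
        fake_b = self.generator_a_to_b(real_a)
        fake_a = self.generator_b_to_a(real_b)
        cycle = self.cycle_loss(real_a, real_b, fake_a, fake_b)
        gan = self.gan_loss(fake_a, fake_b)
        loss = cycle + gan
    if training:
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
    # The same metric names are returned in both modes, so validation data
    # can be reported with the training losses.
    return {"cycle_loss": float(cycle), "gan_loss": float(gan)}
```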

@@ -51,12 +50,11 @@ class CycleGANNetworkConfig:
cycle_weight: weight of the cycle loss
generator_weight: weight of the generator's gan loss
discriminator_weight: weight of the discriminator gan loss
reload_path: path to a directory containing a saved CycleGAN model to use
mcgibbon (Contributor, Author) commented:
This was just a missing docstring entry.
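For orientation, the documented fields correspond roughly to a config like the sketch below; the defaults and exact field set are assumptions for illustration, not the repository's definition.

```python
# Hedged sketch of the documented fields (defaults and exact field set are
# assumptions for illustration, not the repository's definition):
import dataclasses
from typing import Optional


@dataclasses.dataclass
class CycleGANNetworkConfig:
    cycle_weight: float = 1.0          # weight of the cycle loss
    generator_weight: float = 1.0      # weight of the generator's GAN loss
    discriminator_weight: float = 1.0  # weight of the discriminator GAN loss
    reload_path: Optional[str] = None  # directory containing a saved CycleGAN model to use
```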
