AGC Optimizers

A small library for using adaptive gradient clipping (AGC) with your optimizer. Currently PyTorch only.

News
Introduction
Installation
Usage
Comparison
To Do

News

Sep 15, 2021

Added support for using AGC independently of the choice of optimizer in PyTorch

Sep 14, 2021

Added AdamW, Adam, SGD and RMSprop with AGC
Added a first comparison between optimizers with and without AGC on CIFAR-10

Introduction

In 2021, Brock et al. introduced adaptive gradient clipping (AGC), a new clipping technique that increases the stability of large-batch training and training with high learning rates in their Normalizer-Free Networks (NFNets). Since this clipping method is not implemented in the leading frameworks, this library provides optimizers that support AGC.
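For reference, the clipping rule from the NFNet paper can be summarized as follows. Here $G_i^\ell$ and $W_i^\ell$ are the gradient and the weights of the $i$-th unit (row) of layer $\ell$, $\|\cdot\|_F$ is the Frobenius norm taken per unit, $\lambda$ corresponds to the clipping parameter and $\epsilon$ to agc_eps described under Usage. This restates the paper's formula; it is not taken from this library's source code.

```latex
\[
\|W_i^\ell\|_F^\star = \max\left(\|W_i^\ell\|_F,\ \epsilon\right),
\qquad
G_i^\ell \leftarrow
\begin{cases}
\lambda \dfrac{\|W_i^\ell\|_F^\star}{\|G_i^\ell\|_F}\, G_i^\ell & \text{if } \dfrac{\|G_i^\ell\|_F}{\|W_i^\ell\|_F^\star} > \lambda,\\[1ex]
G_i^\ell & \text{otherwise.}
\end{cases}
\]
```

In words: a unit's gradient is rescaled only when its norm exceeds $\lambda$ times the (floored) norm of the corresponding weights.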

Installation

pip install agc_optims

Usage

To stay consistent with PyTorch, all optimizer arguments are the same as in the standard PyTorch optimizers. Only two parameters are added for AGC:

clipping: Hyperparameter that controls how strongly the gradients are clipped. Default value 1e-2; smaller batch sizes demand a higher clipping value.
agc_eps: Term used in AGC to prevent the gradients of zero-initialized parameters from being clipped to zero. Default value 1e-3.
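To illustrate how the two parameters interact, here is a minimal pure-Python sketch for a single unit, given the norm of its weights and the norm of its gradient. The function agc_scale is hypothetical and not part of agc_optims; it only reproduces the per-unit arithmetic of AGC, assuming the rule from the NFNet paper.

```python
def agc_scale(weight_norm: float, grad_norm: float,
              clipping: float = 1e-2, agc_eps: float = 1e-3) -> float:
    """Hypothetical helper: factor AGC would multiply one unit's gradient by.

    A return value of 1.0 means the gradient is left unchanged.
    """
    # agc_eps floors the weight norm, so zero-initialized units still get
    # a non-zero gradient budget instead of being clipped to zero.
    max_norm = clipping * max(weight_norm, agc_eps)
    if grad_norm <= max_norm:
        return 1.0
    # Rescale so that the clipped gradient norm equals max_norm.
    return max_norm / grad_norm


if __name__ == "__main__":
    print(agc_scale(2.0, 0.01))                 # small gradient: left unchanged
    print(agc_scale(2.0, 1.0))                  # default clipping=1e-2
    print(agc_scale(2.0, 1.0, clipping=0.16))   # larger clipping lets more through
    print(agc_scale(0.0, 1.0))                  # zero-initialized unit: tiny but non-zero
```

With a weight norm of 2.0 and a gradient norm of 1.0, the default clipping=1e-2 shrinks the gradient to 2% of its size, while clipping=0.16 keeps 32% of it, which is why smaller batch sizes (noisier, larger gradients) call for a higher clipping value.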

Optimizer independent

from torch.optim import Adam
from agc_optims.clipper import AGC

net = Net() # your model

optimizer = Adam(net.parameters(), lr=0.001)
optimizer = AGC(optimizer=optimizer, clipping=0.16)
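The AGC wrapper can be combined with any PyTorch optimizer. As a rough illustration of what unit-wise clipping between loss.backward() and optimizer.step() can look like, here is a minimal sketch that clips the gradients in place before a plain torch.optim.Adam step. The helpers unitwise_norm and agc_clip_ are hypothetical and not part of agc_optims; the library's AGC class may implement this differently. "Unit-wise" means one norm per output unit (dimension 0) for weight matrices and convolution kernels, and one norm over the whole tensor for biases.

```python
import torch
import torch.nn as nn


def unitwise_norm(x: torch.Tensor) -> torch.Tensor:
    """L2 norm per output unit (dim 0); whole-tensor norm for 0-d/1-d tensors."""
    if x.ndim <= 1:
        return x.norm(p=2)
    dims = tuple(range(1, x.ndim))
    return x.norm(p=2, dim=dims, keepdim=True)


@torch.no_grad()
def agc_clip_(parameters, clipping: float = 1e-2, agc_eps: float = 1e-3) -> None:
    """Hypothetical helper: apply adaptive gradient clipping to .grad in place."""
    for p in parameters:
        if p.grad is None:
            continue
        # Largest allowed gradient norm per unit: clipping * max(||W_i||, agc_eps).
        max_norm = unitwise_norm(p).clamp(min=agc_eps) * clipping
        grad_norm = unitwise_norm(p.grad)
        # Rescaled gradient; the clamp avoids division by zero for all-zero grads.
        clipped = p.grad * (max_norm / grad_norm.clamp(min=1e-6))
        # Only units whose gradient norm exceeds the threshold are replaced.
        p.grad.copy_(torch.where(grad_norm > max_norm, clipped, p.grad))


if __name__ == "__main__":
    net = nn.Linear(10, 2)  # stand-in for your model
    optimizer = torch.optim.Adam(net.parameters(), lr=0.001)

    inputs, targets = torch.randn(64, 10), torch.randint(0, 2, (64,))
    loss = nn.functional.cross_entropy(net(inputs), targets)

    optimizer.zero_grad()
    loss.backward()
    agc_clip_(net.parameters(), clipping=0.16)  # clip before the update
    optimizer.step()
```

Clipping before optimizer.step() means the optimizer's own state (for example Adam's moment estimates) only ever sees the clipped gradients.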

SGD

from agc_optims.optim import SGD_AGC

net = Net() # your model

optimizer = SGD_AGC(net.parameters(), lr=0.01, momentum=0.9, clipping=0.16)

Adam

from agc_optims.optim import Adam_AGC

net = Net() # your model

optimizer = Adam_AGC(net.parameters(), lr=0.001, weight_decay=1e-4, clipping=0.16)

AdamW

from agc_optims.optim import AdamW_AGC

net = Net() # your model

optimizer = AdamW_AGC(net.parameters(), lr=0.001, weight_decay=1e-4, clipping=0.16)

RMSprop

from agc_optims.optim import RMSprop_AGC

net = Net() # your model

optimizer = RMSprop_AGC(net.parameters(), lr=0.001, clipping=0.16)

You can now use the optimizer just like its non-AGC counterpart.

Comparison

The following comparison shows that for batch sizes 64 and 128, Adam with AGC performs better than standard Adam. Unfortunately, SGD performs worse with AGC, but the batch sizes used here are very small compared to those in the NFNet paper, so more comparisons with larger batch sizes and on other datasets are needed. RMSprop with AGC also performs better than without AGC at both batch sizes. The learning rate was left at its default value for all optimizers, and the scripts in the performance_tests folder were used as the test environment.

[Figure: SGD on CIFAR-10, batch size 64 (accuracy and loss)]

[Figure: SGD on CIFAR-10, batch size 128 (accuracy and loss)]

[Figure: Adam on CIFAR-10, batch size 64 (accuracy and loss)]

[Figure: Adam on CIFAR-10, batch size 128 (accuracy and loss)]

[Figure: RMSprop on CIFAR-10, batch size 64 (accuracy and loss)]

[Figure: RMSprop on CIFAR-10, batch size 128 (accuracy and loss)]

As a small bonus, I also compared the speed of each optimizer with and without AGC to see whether AGC significantly increases training time.

[Figure: training speed of the optimizers with and without AGC]

Skyy93/agc_optims

Folders and files

Latest commit

History

Repository files navigation

AGC Optimizers

News

Sep 15, 2021

Sep 14, 2021

Introduction

Installation

Usage

Optimizer independent

SGD

Adam

AdamW

RMSprop

Comparison

To Do

About

Resources

License

Stars

Watchers

Forks

Languages