ml-by-mixins

Overview

ml-by-mixins is my personal project where I implement various machine learning algorithms by combining loss functions, activation functions and optimizers. This library provides mixins for each of these components, and you can try any combination of them.

For example, if you want Poisson regression with a log link function, optimized by stochastic average gradient, you can write:

from mlbymxn.base import BaseML
from mlbymxn.loss_functions import PoissonLossMixin
from mlbymxn.activation_functions import ExponentialActivationMixin
from mlbymxn.optimizers import SAGOptimizerMixin

class PoissonRegressionBySAG(
        BaseML, PoissonLossMixin, ExponentialActivationMixin,
        SAGOptimizerMixin):
    pass

poisson_reg_sag = PoissonRegressionBySAG(eta=0.001, max_iters=50)
poisson_reg_sag.fit(X, y)

Since the link function is the inverse of the activation function (it sounds a little strange to use the term 'activation' for a generalized linear model, but I think the link function and the activation function are related concepts), ExponentialActivationMixin is combined with PoissonLossMixin here. If you want Poisson regression with an identity link function, all you have to do is switch ExponentialActivationMixin to IdentityActivationMixin, as in the sketch below.
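For instance, the identity-link variant can be assembled the same way (a minimal sketch; the class and variable names are only illustrative):

from mlbymxn.base import BaseML
from mlbymxn.loss_functions import PoissonLossMixin
from mlbymxn.activation_functions import IdentityActivationMixin
from mlbymxn.optimizers import SAGOptimizerMixin

# Same loss and optimizer as above; only the activation mixin changes
class PoissonRegressionIdentityBySAG(
        BaseML, PoissonLossMixin, IdentityActivationMixin,
        SAGOptimizerMixin):
    pass

poisson_reg_identity = PoissonRegressionIdentityBySAG(eta=0.001, max_iters=50)
poisson_reg_identity.fit(X, y)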

Provided mixins are as follows:

  • loss function mixin

    • squared loss
    • poisson loss
    • log loss
    • hinge loss
  • optimizer mixin

    • scipy.optimize.minimize
    • gradient descent (GD)
    • stochastic gradient descent (SGD)
    • stochastic average gradient (SAG)
    • Newton's method
    • momentum SGD
    • RMSprop
    • AdaGrad
    • AdaDelta
    • Adam
  • activation function mixin

    • identity
    • exponential
    • sigmoid
    • tanh
    • ReLU

Formulation

The following notation is used throughout. The symbols follow the usual conventions; eta and theta match the names used in the code and descriptions below.

symbol      description
m           the number of training samples
n           the number of features (including bias term)
η (eta)     learning rate
x^(i)       feature vector of the i-th training sample
y^(i)       target value (or label) of the i-th training sample
θ (theta)   weight vector

Loss functions

Here is the basic form of the loss function with L2 regularization over all training samples. The per-sample loss term differs depending on the type of loss function.
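A sketch of this general form, assuming the common 1/m averaging and a λ/(2m) regularization factor (the exact constants in the implementation may differ):

J(\theta) = \frac{1}{m} \sum_{i=1}^{m} L\bigl(\theta; x^{(i)}, y^{(i)}\bigr) + \frac{\lambda}{2m} \lVert \theta \rVert^2

where L is the per-sample loss defined in each subsection below.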

Squared Loss (Linear Regression)

loss function and gradient
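A sketch of the per-sample terms (regularization omitted; the implementation's constants may differ), with the identity activation so that the prediction is \hat{y}^{(i)} = \theta^T x^{(i)}:

L = \frac{1}{2} \bigl(\hat{y}^{(i)} - y^{(i)}\bigr)^2

\nabla_\theta L = \bigl(\hat{y}^{(i)} - y^{(i)}\bigr)\, x^{(i)}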

Poisson Loss (Poisson Regression)

loss function and gradient
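A sketch of the per-sample Poisson negative log-likelihood (constants and regularization omitted), with \hat{y}^{(i)} the activation applied to \theta^T x^{(i)}; under the exponential activation (log link), \hat{y}^{(i)} = e^{\theta^T x^{(i)}}:

L = \hat{y}^{(i)} - y^{(i)} \log \hat{y}^{(i)}

\nabla_\theta L = \bigl(e^{\theta^T x^{(i)}} - y^{(i)}\bigr)\, x^{(i)} \quad \text{(log link)}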

Log Loss (Logistic Regression)

loss function and gradient
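A sketch of the per-sample log loss (regularization omitted), assuming labels y^{(i)} \in \{0, 1\} and the sigmoid activation \hat{y}^{(i)} = \sigma(\theta^T x^{(i)}):

L = -\,y^{(i)} \log \hat{y}^{(i)} - \bigl(1 - y^{(i)}\bigr) \log\bigl(1 - \hat{y}^{(i)}\bigr)

\nabla_\theta L = \bigl(\hat{y}^{(i)} - y^{(i)}\bigr)\, x^{(i)}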

Hinge Loss (Perceptron)

The hinge loss uses a threshold: it is 1 for SVM and 0 for the perceptron.

loss function and gradient

The gradient is nonzero only for samples whose margin falls below the threshold; otherwise only the regularization term contributes.
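A sketch of the per-sample hinge loss and its subgradient (regularization omitted), writing the threshold as \epsilon (\epsilon = 1 for SVM, \epsilon = 0 for the perceptron) and assuming labels y^{(i)} \in \{-1, +1\}:

L = \max\bigl(0,\ \epsilon - y^{(i)}\, \theta^T x^{(i)}\bigr)

\nabla_\theta L = \begin{cases} -\,y^{(i)}\, x^{(i)} & \text{if } y^{(i)}\, \theta^T x^{(i)} < \epsilon \\ 0 & \text{otherwise} \end{cases}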

Optimizers

t is the current iteration, and the current weight vector θ_t will be updated to θ_{t+1}.

Gradient Descent (GD)

GD calculates the gradients of all training samples and updates theta in a batch.
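A sketch of the update rule, using the regularized loss J(\theta) defined above:

\theta_{t+1} = \theta_t - \eta\, \nabla J(\theta_t)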

Stochastic Gradient Descent (SGD)

SGD calculates the gradient of a single randomly selected sample and updates theta in an online manner.
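A sketch of the update rule, where i is the index of the randomly selected sample and L_i its per-sample loss:

\theta_{t+1} = \theta_t - \eta\, \nabla L_i(\theta_t)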

Stochastic Average Gradient (SAG)

SAG calculates the gradient of a single randomly selected sample (like SGD) and updates theta in a batch (like GD).

At each iteration, the stored gradient of the randomly selected sample is recomputed at the current weights, while the stored gradients of all other samples are left unchanged.
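A sketch of the update, assuming a gradient memory g_i is kept for every training sample and j is the index selected at iteration t:

g_j \leftarrow \nabla L_j(\theta_t), \qquad g_i \ \text{unchanged for } i \neq j

\theta_{t+1} = \theta_t - \frac{\eta}{m} \sum_{i=1}^{m} g_i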

Newton's Method
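Newton's method rescales the gradient by the inverse Hessian of the loss. A sketch of the update rule:

\theta_{t+1} = \theta_t - H(\theta_t)^{-1}\, \nabla J(\theta_t), \qquad H(\theta_t) = \nabla^2 J(\theta_t)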

momentum SGD

TODO

RMSprop

TODO

AdaGrad

TODO

AdaDelta

TODO

Adam

TODO
