ml-by-mixins is my personal project where I implement various machine learning algorithms by combining loss functions, activation functions and optimizers. The library provides each of these pieces as a mixin, so you can try any combination of them.
For example, if you want Poisson regression with a log link function optimized by stochastic average gradient (SAG), you can write:
from mlbymxn.base import BaseML
from mlbymxn.loss_functions import PoissonLossMixin
from mlbymxn.activation_functions import ExponentialActivationMixin
from mlbymxn.optimizers import SAGOptimizerMixin
class PoissonRegressionBySAG(
        BaseML, PoissonLossMixin, ExponentialActivationMixin,
        SAGOptimizerMixin):
    pass
poisson_reg_sag = PoissonRegressionBySAG(eta=0.001, max_iters=50)
poisson_reg_sag.fit(X, y)
Since the link function is the inverse of the activation function (it sounds a little strange to use the term 'activation' for a generalized linear model, but I think the link function and the activation function are related concepts), ExponentialActivationMixin is combined with PoissonLossMixin here.
If you want Poisson regression with an identity link function, all you have to do is switch ExponentialActivationMixin to IdentityActivationMixin, as in the sketch below.
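For illustration, such a class might look like this; the class name PoissonRegressionBySAGIdentity is my own, and the IdentityActivationMixin import path is assumed to follow the same pattern as the other activation mixins:

```python
from mlbymxn.base import BaseML
from mlbymxn.loss_functions import PoissonLossMixin
from mlbymxn.activation_functions import IdentityActivationMixin  # assumed to live alongside ExponentialActivationMixin
from mlbymxn.optimizers import SAGOptimizerMixin


# Same recipe as above, with the activation (link) mixin swapped out
class PoissonRegressionBySAGIdentity(
        BaseML, PoissonLossMixin, IdentityActivationMixin,
        SAGOptimizerMixin):
    pass
```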
Provided mixins are as follows:
- loss function mixin
  - squared loss
  - poisson loss
  - log loss
  - hinge loss
- optimizer mixin
  - scipy.optimize.minimize
  - gradient descent (GD)
  - stochastic gradient descent (SGD)
  - stochastic average gradient (SAG)
  - Newton's method
  - momentum SGD
  - RMSprop
  - AdaGrad
  - AdaDelta
  - Adam
- activation function mixin
  - identity
  - exponential
  - sigmoid
  - tanh
  - ReLU
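As another illustration of how these mixins combine, log loss plus a sigmoid activation plus the Adam optimizer gives logistic regression trained by Adam. A minimal sketch follows; note that LogLossMixin, SigmoidActivationMixin and AdamOptimizerMixin are names I am inferring from the naming pattern of the mixins shown earlier, so check the package for the exact class names:

```python
from mlbymxn.base import BaseML
from mlbymxn.loss_functions import LogLossMixin                   # inferred name
from mlbymxn.activation_functions import SigmoidActivationMixin   # inferred name
from mlbymxn.optimizers import AdamOptimizerMixin                 # inferred name


# log loss + sigmoid activation + Adam optimizer = logistic regression by Adam
class LogisticRegressionByAdam(
        BaseML, LogLossMixin, SigmoidActivationMixin,
        AdamOptimizerMixin):
    pass


logistic_reg_adam = LogisticRegressionByAdam(eta=0.001, max_iters=100)
logistic_reg_adam.fit(X, y)
```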
Here is the basic form of the loss function with L2 regularization over all $m$ training samples:

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \ell\left(h_\theta(x^{(i)}), y^{(i)}\right) + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$$

The per-sample term $\ell$ is different depending on the type of loss function. For the hinge loss,

$$\ell\left(h_\theta(x^{(i)}), y^{(i)}\right) = \begin{cases} \tau - y^{(i)}\, h_\theta(x^{(i)}) & \text{if } y^{(i)}\, h_\theta(x^{(i)}) < \tau \\ 0 & \text{otherwise} \end{cases}$$

where $\tau$ is the threshold: $\tau = 1$ if SVM, $\tau = 0$ if Perceptron.
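As a concrete illustration of the regularized loss above, here is a minimal NumPy sketch for the squared-loss case with an identity activation; the function name and scaling constants are my own choices, not taken from mlbymxn:

```python
import numpy as np


def l2_regularized_squared_loss(theta, X, y, lam):
    """Squared loss with L2 regularization, averaged over all m training samples."""
    m = X.shape[0]
    predictions = X @ theta                     # identity activation: h_theta(x) = theta^T x
    residuals = predictions - y
    loss = np.sum(residuals ** 2) / (2 * m)     # per-sample squared loss term
    reg = lam * np.sum(theta ** 2) / (2 * m)    # L2 penalty (bias term not treated specially here)
    return loss + reg
```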
$t$ is the current iteration and the current weight vector $\theta^{(t)}$ will be updated to $\theta^{(t+1)}$.

GD calculates the gradients of all training samples and updates $\theta$ in a batch:

$$\theta^{(t+1)} = \theta^{(t)} - \frac{\eta}{m} \sum_{i=1}^{m} \nabla_\theta\, \ell\left(h_\theta(x^{(i)}), y^{(i)}\right)$$

SGD calculates the gradient of a randomly selected sample and updates $\theta$ in an online manner:

$$\theta^{(t+1)} = \theta^{(t)} - \eta\, \nabla_\theta\, \ell\left(h_\theta(x^{(i)}), y^{(i)}\right)$$

SAG calculates the gradient of a randomly selected sample (like SGD) and updates $\theta$ in a batch (like GD). It keeps the latest gradient $g_i$ computed for each sample $i$:

$$g_i^{(t)} = \begin{cases} \nabla_\theta\, \ell\left(h_\theta(x^{(i)}), y^{(i)}\right) & \text{if } i = j \text{ ($j$ is a randomly selected sample)} \\ g_i^{(t-1)} & \text{otherwise} \end{cases}$$

$$\theta^{(t+1)} = \theta^{(t)} - \frac{\eta}{m} \sum_{i=1}^{m} g_i^{(t)}$$
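To make the SAG update concrete, here is a minimal NumPy sketch for squared loss with an identity activation; this is a simplified illustration of the update scheme above, not the actual mlbymxn implementation (function and variable names are mine):

```python
import numpy as np


def fit_sag_squared_loss(X, y, eta=0.001, max_iters=50, seed=0):
    """Stochastic average gradient for squared loss with an identity activation."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    theta = np.zeros(n)
    grad_memory = np.zeros((m, n))   # g_i: latest gradient computed for each sample
    grad_sum = np.zeros(n)           # running sum of all g_i

    for _ in range(max_iters * m):
        j = rng.integers(m)                       # randomly select one sample
        new_grad = (X[j] @ theta - y[j]) * X[j]   # gradient of that sample only (like SGD)
        grad_sum += new_grad - grad_memory[j]     # keep the sum of g_i in sync
        grad_memory[j] = new_grad
        theta -= eta * grad_sum / m               # batch-style update with the averaged gradients (like GD)
    return theta
```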
TODO