dodobyte/neural-net

A small and simple neural network in Python

This is a basic handcrafted neural network implementation. It's not much, but the motive was to prove to myself that I understood backpropagation.

Network

We use only dense (linear) layers and sigmoid activation functions. The code is hardcoded to classify MNIST, but you can easily modify it for other purposes. We also take advantage of vectorized operations with numpy; the version with loops is many times slower on my CPU.

The default network shape is (784, 40, 10), the minibatch size is 10, and the learning rate is 3.0. With the default hyperparameters the network achieves 95% accuracy.
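
To make that shape concrete, here is a minimal sketch of the forward computation for a (784, 40, 10) network, assuming column-vector inputs and one numpy weight matrix and bias vector per layer; the variable names are illustrative, not necessarily the ones used in the repository.

```python
import numpy as np

def sigmoid(z):
    # element-wise logistic activation
    return 1.0 / (1.0 + np.exp(-z))

# (784, 40, 10): 784 input pixels -> 40 hidden units -> 10 digit classes
shape = (784, 40, 10)
rng = np.random.default_rng(0)
weights = [rng.standard_normal((n_out, n_in)) for n_in, n_out in zip(shape[:-1], shape[1:])]
biases = [np.zeros((n_out, 1)) for n_out in shape[1:]]

x = rng.random((784, 1))       # one flattened 28x28 image as a column vector
a = x
for W, b in zip(weights, biases):
    a = sigmoid(W @ a + b)     # dense (linear) layer followed by sigmoid
print(a.shape)                 # (10, 1): one activation per digit class
```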

Data

I downloaded the MNIST data from http://yann.lecun.com/exdb/mnist/. The data is also included in this repository, unmodified.
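
The files on that page are gzipped idx files. Here is a minimal sketch of decoding them with numpy, assuming the standard file names and the header sizes documented there (16 bytes for images, 8 bytes for labels).

```python
import gzip
import numpy as np

def load_images(path):
    # idx3 format: 16-byte header (magic, count, rows, cols), then raw pixels
    with gzip.open(path, "rb") as f:
        data = np.frombuffer(f.read(), dtype=np.uint8, offset=16)
    return data.reshape(-1, 784) / 255.0   # flatten the 28x28 images, scale to [0, 1]

def load_labels(path):
    # idx1 format: 8-byte header (magic, count), then one byte per label
    with gzip.open(path, "rb") as f:
        return np.frombuffer(f.read(), dtype=np.uint8, offset=8)

train_x = load_images("train-images-idx3-ubyte.gz")
train_y = load_labels("train-labels-idx1-ubyte.gz")
```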

Code

init_net initializes the network with the given shape. forward takes the input and predicts the label. backward implements the backpropagation algorithm, which is the heart of the neural network.

Gradients are accumulated over a minibatch. optimize applies the accumulated gradients to the weights and biases. zero_grad zeroes the gradients between minibatches.
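
Put together, the training loop follows the usual accumulate-then-step pattern. The sketch below only illustrates the call order; the signatures of init_net, forward, backward, optimize and zero_grad shown here are hypothetical and may differ from the repository's.

```python
# Call-order sketch only; the signatures below are not the repository's.
net = init_net((784, 40, 10))
lr, batch_size = 3.0, 10

for epoch in range(30):
    for start in range(0, len(train_x), batch_size):
        zero_grad(net)                        # clear the accumulated gradients
        for x, y in zip(train_x[start:start + batch_size],
                        train_y[start:start + batch_size]):
            forward(net, x)                   # predict
            backward(net, x, y)               # accumulate gradients over the minibatch
        optimize(net, lr / batch_size)        # apply the averaged gradients
```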

Backpropagation

I like to think of a neural network as interconnected gears. I made the following picture, which helped me a lot. Note that the superscripts here denote the layer numbers; they are in reverse order, starting from 0, which is the output layer.

Here we can easily see how the weights in a layer affect the final gear, i.e. the cost. If you rotate the W0 gear one cycle, how much does the cost gear rotate? That ratio is basically the gradient of the W0 gear.

[image: backprop gear diagram]

With backpropagation, all we want to know is how much each weight affects the final cost. Once we know that, we can easily modify the weights to decrease the cost. The naive approach is to calculate ∂C/∂w directly for each weight: modify the weight just a bit, send the input through again, and check how much the cost changed. There is a big problem here: for each weight, we have to send the input again and compute the whole forward pass. That's obviously not practical. Backpropagation solves this problem; it lets us calculate all the derivatives with a single forward pass followed by a single backward pass.
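
For contrast, here is a minimal sketch of that naive finite-difference approach; cost_fn is a hypothetical closure that runs the full forward pass and returns the cost, so this needs one extra forward pass per weight, which is tens of thousands of passes per input for the default shape.

```python
import numpy as np

def numerical_gradient(cost_fn, W, eps=1e-5):
    # Naive ∂C/∂w: nudge each weight a bit, rerun the whole forward pass,
    # and measure how much the cost moved.
    base = cost_fn()
    grad = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        old = W[idx]
        W[idx] = old + eps
        grad[idx] = (cost_fn() - base) / eps   # one full forward pass per weight
        W[idx] = old
    return grad
```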

Gradient of W0

Here is how it works. We start from the final gear (the cost) and ask ourselves which gear directly affects it. As you can tell from the image, the answer is A0, so we take the derivative ∂C/∂a0 and note it somewhere. Remember that this derivative only tells us how much A0 affects C. Next we ask which gear directly affects A0, and it's Z0. We calculate ∂a0/∂z0. Next we ask which gear directly affects Z0; there are actually two, W0 and A-1. Let's continue with W0 first and calculate ∂z0/∂w0.

Now we have three pieces of information: how much W0 affects Z0, how much Z0 affects A0, and how much A0 affects the cost. We multiply these three derivatives and what we get is how much W0 affects the final cost, i.e. W0's gradient. Check the image; this is the equation in the second group.
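
Written out with the same reverse-numbered superscripts, that product is:

$$\frac{\partial C}{\partial W^{0}} = \frac{\partial C}{\partial a^{0}} \cdot \frac{\partial a^{0}}{\partial z^{0}} \cdot \frac{\partial z^{0}}{\partial W^{0}}$$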

The Checkpoints

Here's the crucial part: since we already calculated how much Z0 affects the cost, we can treat it as a checkpoint. So instead of calculating how much W-1 affects the cost directly, we calculate how much it affects Z0, because we already know how much Z0 affects the cost. Finally, we multiply these two derivatives and recover the total effect, i.e. how much W-1 affects the final cost.
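
In symbols, the checkpoint idea is just the chain rule split at z0: the second factor is already known, so only the first one remains to be computed.

$$\frac{\partial C}{\partial W^{-1}} = \frac{\partial z^{0}}{\partial W^{-1}} \cdot \frac{\partial C}{\partial z^{0}}$$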

Gradient of W-1

So let's calculate the gradient of W-1. Which gears directly affect Z0? They are W0 and A-1, but we already handled W0, and we're trying to get to W-1. Hence, we continue with A-1 and calculate ∂z0/∂a-1. Next, which gear directly affects A-1? It's Z-1. We calculate ∂a-1/∂z-1. Next, which gears directly affect Z-1?

Both A-2 and W-1. We're interested in W-1, so we calculate ∂z-1/∂w-1. Let's phrase it: we know how much W-1 affects Z-1, how much Z-1 affects A-1, and how much A-1 affects Z0. We multiply these three derivatives and learn how much W-1 affects Z0. Once that's done, we multiply this by the gradient of Z0 to learn how much W-1 affects the final cost, which is the gradient of W-1. This is the ∂C/∂W-1 equation in the third group of equations.
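
Here is what that step can look like in numpy, assuming sigmoid activations and column vectors; the names are illustrative and don't have to match the repository's.

```python
import numpy as np

def hidden_layer_grads(dc_dz0, W0, a_m1, a_m2):
    """Gradients of W-1 and B-1 from the checkpoint at Z0.

    dc_dz0 : how much Z0 affects the cost (already known from the W0 step)
    W0     : output layer weights (z0 = W0 @ a_m1 + b0)
    a_m1   : activation of layer -1 (a sigmoid output)
    a_m2   : activation of layer -2 (the input to layer -1)
    """
    da_dz_m1 = a_m1 * (1 - a_m1)              # sigmoid'(z-1), written via a-1
    dc_dz_m1 = (W0.T @ dc_dz0) * da_dz_m1     # Z-1 becomes the new checkpoint
    dc_dw_m1 = dc_dz_m1 @ a_m2.T              # ∂C/∂W-1
    dc_db_m1 = dc_dz_m1                       # ∂C/∂B-1 (see Biases below)
    return dc_dw_m1, dc_db_m1, dc_dz_m1
```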

Other Gradients

As you may guess, Z-1 is now our new checkpoint, so we repeat the same process for the rest of the layers. For instance, to calculate the gradient of W-2, we follow the exact steps of the previous section; only the indices change. Once we have calculated the gradients, the job of backpropagation is done. The optimizer applies those gradients to the actual weights and biases.
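
Putting the whole recursion together, a complete backward pass can be sketched as below, assuming sigmoid layers and a quadratic cost C = ½‖a0 − y‖²; the repository's backward may be organized differently.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backward_sketch(weights, biases, x, y):
    # Forward pass, keeping every activation; the backward pass needs them.
    activations = [x]
    for W, b in zip(weights, biases):
        activations.append(sigmoid(W @ activations[-1] + b))

    dw = [np.zeros_like(W) for W in weights]
    db = [np.zeros_like(b) for b in biases]

    # Output layer: ∂C/∂a0 * ∂a0/∂z0 gives the first checkpoint ∂C/∂z0.
    a0 = activations[-1]
    dc_dz = (a0 - y) * a0 * (1 - a0)

    # Walk back through the layers; each dc_dz is the next checkpoint.
    for layer in range(len(weights) - 1, -1, -1):
        dw[layer] = dc_dz @ activations[layer].T   # ∂C/∂W for this layer
        db[layer] = dc_dz                          # ∂z/∂b = 1, see Biases below
        if layer > 0:
            a_prev = activations[layer]
            dc_dz = (weights[layer].T @ dc_dz) * a_prev * (1 - a_prev)

    return dw, db                                  # per-layer gradients, input side first
```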

Biases

Note that we never mentioned biases in this write-up or in the picture. That's because they're very easy to calculate: while we're calculating the gradients of the weights, we implicitly calculate the gradients of the biases as well. Imagine another gear connected to Z0, named B0, just like the weight. How much B0 affects Z0 is a constant; ∂z0/∂b0 is actually 1. So how much B0 affects the cost is the same as how much Z0 affects it, therefore the gradient of B0 is the gradient of Z0, which we already calculated. You can see it in the backward function with the assignment dc_db = dc_dz.
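
In symbols, since the dense layer is z0 = W0 · a-1 + b0:

$$z^{0} = W^{0} a^{-1} + b^{0} \quad\Rightarrow\quad \frac{\partial z^{0}}{\partial b^{0}} = 1 \quad\Rightarrow\quad \frac{\partial C}{\partial b^{0}} = \frac{\partial C}{\partial z^{0}}$$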

Sources

The main source I used was Grant Sanderson's awesome neural network series, mostly the fourth video. Another helpful source was Casper Hansen's neural network tutorial.
