
7/7/2018

  • Added group_convolution layer, slightly optimized for depth-wise convolutions (that is, when the number of groups = input channels = output channels); inspired by ShuffleNet. See the sketch after this list.
  • Added shuffle layer (inspired by ShuffleNet).
  • Discovered bug in stride!=1 optimizations of convolution layer.
  • Recommend reading this nice explanation of grouped convolutions and shuffle: Why MobileNet and Its Variants (e.g. ShuffleNet) Are Fast
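
For reference, here is a rough sketch of the bookkeeping behind grouped/depth-wise convolution and the channel shuffle. The function names and memory layouts below are illustrative only, not the actual mojo cnn layer API:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Grouped convolution: group g only connects input channels
// [g*in_per_group, (g+1)*in_per_group) to output channels
// [g*out_per_group, (g+1)*out_per_group). When groups == in_c == out_c this
// degenerates to a depth-wise convolution: one filter per channel and no
// cross-channel mixing. Assumes zero padding, stride 1, odd k x k kernels.
void grouped_conv_naive(const std::vector<float> &in, // in_c x H x W
	const std::vector<float> &w,                      // out_c x (in_c/groups) x k x k
	std::vector<float> &out,                          // out_c x H x W
	int in_c, int out_c, int groups, int H, int W, int k)
{
	const int in_per_group = in_c / groups, out_per_group = out_c / groups, r = k / 2;
	out.assign(static_cast<std::size_t>(out_c) * H * W, 0.f);
	for (int g = 0; g < groups; g++)
	for (int oc = g * out_per_group; oc < (g + 1) * out_per_group; oc++)
	for (int y = 0; y < H; y++)
	for (int x = 0; x < W; x++)
	{
		float acc = 0.f;
		for (int ic = 0; ic < in_per_group; ic++)
		for (int dy = -r; dy <= r; dy++)
		for (int dx = -r; dx <= r; dx++)
		{
			const int yy = y + dy, xx = x + dx;
			if (yy < 0 || yy >= H || xx < 0 || xx >= W) continue; // zero pad
			acc += in[((g * in_per_group + ic) * H + yy) * W + xx] *
				w[((oc * in_per_group + ic) * k + (dy + r)) * k + (dx + r)];
		}
		out[(oc * H + y) * W + x] = acc;
	}
}

// Channel shuffle (ShuffleNet style): view the c channels as (groups, c/groups),
// transpose to (c/groups, groups), and flatten back so later groups see a mix
// of channels from every earlier group.
void channel_shuffle(const std::vector<float> &in, std::vector<float> &out,
	int c, int groups, int H, int W)
{
	const int per = c / groups, plane = H * W;
	out.resize(in.size());
	for (int g = 0; g < groups; g++)
	for (int i = 0; i < per; i++)
		std::copy(in.begin() + static_cast<std::ptrdiff_t>(g * per + i) * plane,
			in.begin() + static_cast<std::ptrdiff_t>(g * per + i + 1) * plane,
			out.begin() + static_cast<std::ptrdiff_t>(i * groups + g) * plane);
}
```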

6/29/2018

  • Added Travis and AppVeyor CI and fixed a couple of small issues to make sure Linux builds were working.

6/3/2017

  • Fixed several small build issues and performed some smoke tests. Only tested on Win7 with Visual C++ and GCC. GCC testing did not link to OpenCV. Speed was much better with VC compiles, and the CIFAR example was unstable with the GCC build.
  • Changed the MNIST training example to use the quick start model format. Would like to do a fully running example that requires only images and the model file.

3/7/2017

  • Code update, but project files, makefiles, and the Linux build need to be tested
  • Converted and tested VGG16 model. Works nicely.
  • Spent a lot of time optimizing for AVX. Moved to a packed format for the unwrapped convolutions. This is much faster but makes the code harder to understand since the start of each channel/map is aligned for AVX. See the sketch after this list.
  • Examples updated to use data augmentation api and to show opencv use for display
  • CPU speed is much faster on my older computers but slow on newer computers. VGG16 forward is around 1 second. Training to 99% MNIST in a minute and to 60% CIFAR in 2 minutes on a relatively old computer. I believe the bottleneck could be getting data into cache, since more cores per chip don't help after the first couple.
  • CPU forward speed seems a bit faster than native MXNet CPU speeds
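
To illustrate the alignment idea (a sketch of the general approach only, not the exact packed format used in mojo cnn):

```cpp
#include <cstddef>

// Round the per-map stride up to a multiple of 8 floats (one 256-bit AVX
// register, 32 bytes) so every channel/map starts on an aligned address,
// at the cost of a little padding at the end of each plane.
inline std::size_t aligned_plane_stride(int H, int W)
{
	const std::size_t floats_per_avx = 8;
	const std::size_t n = static_cast<std::size_t>(H) * W;
	return (n + floats_per_avx - 1) / floats_per_avx * floats_per_avx;
}
// Map c then starts at buffer + c * aligned_plane_stride(H, W), assuming the
// buffer itself is allocated with 32-byte alignment.
```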

6/17/2016

  • pushed in some new code, but didn't update the sample apps. The main items are: built-in data augmentation, new median-edge padding, updated file I/O, and the ability to read a network configuration from a txt file
  • Flurry of activity over at tiny_cnn. Much of the work seems to address the things that motivated mojo cnn - lack of branching and slow speed (btw I noticed the tiny_cnn readme changed from saying it was 'fast' to 'reasonably fast'). With the heavy activity there and the growing complexity of the project, I feel there is even more of a need for a clean, simple implementation like mojo cnn.
  • I've spent a little more time with MxNet. I'd have to say, even though it does not have a ton of features, it's my recommended CNN package if you need something to scale larger than mojo cnn can scale. It was easy to run out of the box on Windows.

6/8/2016

  • Added median_edge padding, which takes the median value around the border of a feature map. This should be good for sparse signals; the normal edge padding could work better for dense signals.
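
A minimal sketch of the idea, assuming row-major single-channel maps (not the exact mojo cnn implementation):

```cpp
#include <algorithm>
#include <vector>

// Median of the values along the outer border of an H x W map.
float border_median(const std::vector<float> &map, int H, int W)
{
	std::vector<float> border;
	for (int x = 0; x < W; x++)
	{
		border.push_back(map[x]);                 // top row
		border.push_back(map[(H - 1) * W + x]);   // bottom row
	}
	for (int y = 1; y < H - 1; y++)
	{
		border.push_back(map[y * W]);             // left column
		border.push_back(map[y * W + W - 1]);     // right column
	}
	std::nth_element(border.begin(), border.begin() + border.size() / 2, border.end());
	return border[border.size() / 2];
}

// Fill the padded map with the border median, then copy the original map
// into the center.
std::vector<float> pad_median_edge(const std::vector<float> &map, int H, int W, int pad)
{
	const int PH = H + 2 * pad, PW = W + 2 * pad;
	std::vector<float> out(PH * PW, border_median(map, H, W));
	for (int y = 0; y < H; y++)
		std::copy(map.begin() + y * W, map.begin() + (y + 1) * W,
			out.begin() + (y + pad) * PW + pad);
	return out;
}
```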

6/1/2016

  • Checked in updates for DeepCNet work and updated sample models
  • Included an OpenCV-based data augmentation function, mojo::transform()

5/30/2016

  • DeepCNet with elu on CIFAR-10 giving 87.55% accuracy (no elastic distortions).

5/27/2016

  • Haven't experimented with a ton of permutations, but I just ran a DeepCNet version of MNIST and got 0.25% error in 2 hours of training. I'm interested in how that stacks up to other packages in terms of timing. I can make this quite a bit faster I think, but it will make the code messier than it already is. See the DeepCNet write-up here: A Shallow Dive into DeepCNet with Mojo-CNN

5/21/2016

  • Renamed my version of the maxout layer to mfm (see A Lightened CNN for Deep Face Representation) since the idea was already out there. The mfm (max-feature-map) layer pools over 2 maps; it is a subset of what the maxout network does. I've not seen this 'activation' work better than other ideas yet, but when pooling it helps speed by reducing the number of feature maps.
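
A minimal sketch of the forward pass, assuming (as in the cited paper) that map i is paired with map i + c/2; the actual mojo cnn layer interface differs:

```cpp
#include <algorithm>
#include <vector>

// mfm / max-feature-map: element-wise max over pairs of feature maps,
// halving the number of output maps (c must be even).
void mfm_forward(const std::vector<float> &in, std::vector<float> &out,
	int c, int H, int W)
{
	const int half = c / 2, plane = H * W;
	out.resize(half * plane);
	for (int m = 0; m < half; m++)
		for (int i = 0; i < plane; i++)
			out[m * plane + i] = std::max(in[m * plane + i],
				in[(m + half) * plane + i]);
}
```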

5/18/2016

  • Tried to reproduce the MNIST accuracy reported by the DeepCNet paper. Ended up with 99.63% and not 99.69%. Close, but I did DeepCNet(5,40) with only 3 dropout layers and no padded input instead of DeepCNet(5,60) with 4 dropout layers and the input padded to 96x96. Instead of padding, I just used the 28x28 input and padded the output of layer 2 to 15x15 to 'catch up' with what would be there if I did it right. This prevents a ton of convolutions against useless padded data, which got me thinking it may be nice to add a run-length-encoded convolution layer where you skip large empty spaces on sparse data.
  • The speed reported in the DeepCNet paper for the MNIST DeepCNet(5,10) configuration was 3,000 records/second on a GeForce GTX 680 (granted, a pretty cheap card). The mojo-cnn implementation of DeepCNet(5,10) with the un-padded input was 10,000 records/second on a Core i7-4810MQ without special optimizations. I'm working on a DeepCNet layer which should be a bit faster.
  • DeepCNiN was also tested, but it was too slow for my liking.
  • After this DeepCNet stuff is wrapped up, the top things on the list are: testing the max-out type of activations, train_target() functionality with multi-dimensional embedding, train_pair() with Siamese network type of training, testing resnet, and a first version of native sliding window code (without full image convolutions at first).

5/16/2016

  • fixed some issues with VC2010 compatibility

5/15/2016

  • Spent (too much) time trying to figure out why the last drop of code seemed to work worse. Rolled back and slowly merged things in. One difference was that the gradients of the output layer were 1/2 the size they were supposed to be. This bug seemed to speed training over the early epochs.

  • added color maps for weight and state display: gray, hot, tensorglow, and voodoo

5/11/2016

  • updated CIFAR-10 model. Added 1x1 convolution layers to the previous model. This is slower, but works a little better.
  • added a layer that will perform cropping, padding, and concatenation of output maps. This is called 'concatenation' or 'resize'.
  • some optimization of 1x1 convolutions and of dropout layer
  • considering moving activations into their own layer
  • softmax added but not tested yet

5/8/2016

  • uploaded training examples
  • added support for multiple input layers. You can use separate architectures for different input channels or add metadata alongside image data.

5/5/2016

  • added semi-stochastic pooling
  • fixed dropout layer (fraction was off by 10x)
  • started to add maxout concepts
  • added color rendering of first set of weights
  • cleaned up threading code so that it automatically picks the number of threads to use
  • improved HTML log so that it highlights the best results and uses yellow to show when performance is getting worse
  • updated project headers and re-arranged a little bit
  • added look-up-table for tanh
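
For reference, a tanh look-up table generally looks something like the sketch below; the table size and input range here are arbitrary choices, not the values used in mojo cnn:

```cpp
#include <cmath>
#include <vector>

// Precompute tanh on a uniform grid over [-range, range] and linearly
// interpolate at run time; saturate to +/-1 outside the table.
struct tanh_lut
{
	enum { N = 4096 };
	float range;
	std::vector<float> table;
	explicit tanh_lut(float r = 8.f) : range(r), table(N)
	{
		for (int i = 0; i < N; i++)
			table[i] = std::tanh(-range + 2.f * range * i / (N - 1));
	}
	float operator()(float x) const
	{
		if (x <= -range) return -1.f;
		if (x >= range) return 1.f;
		const float t = (x + range) * (N - 1) / (2.f * range);
		const int i = static_cast<int>(t);
		if (i >= N - 1) return table[N - 1];
		const float f = t - i;
		return table[i] * (1.f - f) + table[i + 1] * f; // linear interpolation
	}
};
```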

5/3/2016

  • project continued under mojo-cnn under MIT license

5/1/2016

  • Uploading latest version though there are some odd things going on. The training rate and speed are a little worse than the other day after cleaning up some code. This version is not heavily tested.
  • Tried AVX to speed convolution with no improvement over SSE
  • Added an HTML training log feature.

4/28/2016

  • Added dropout layer type
  • Fixed initialization bug in optimizers
  • Added some SSE3 speedups with the unwrap trick for convolutions; see the sketch after this list
  • With the above updates, CIFAR-10 gives 62% accuracy in about 2 minutes, 70% in 8 minutes. MNIST gives 99% in 20 seconds. (No GPU, but SSE3 and OpenMP)
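
For reference, the unwrap idea in its simplest single-channel, stride-1 form; the layout below is illustrative and not mojo cnn's packed format:

```cpp
#include <cstddef>
#include <vector>

// Copy each k x k input patch into a contiguous row of `cols` so that the
// convolution becomes one dot product per output pixel, which the compiler
// (or hand-written SSE3 intrinsics) can vectorize easily.
void unwrap_im2col(const std::vector<float> &in, std::vector<float> &cols,
	int H, int W, int k)
{
	const int OH = H - k + 1, OW = W - k + 1; // 'valid' convolution, stride 1
	cols.resize(static_cast<std::size_t>(OH) * OW * k * k);
	std::size_t r = 0;
	for (int y = 0; y < OH; y++)
	for (int x = 0; x < OW; x++)
	for (int dy = 0; dy < k; dy++)
	for (int dx = 0; dx < k; dx++)
		cols[r++] = in[(y + dy) * W + (x + dx)];
}

// Output pixel (y, x) is then dot(&cols[(y * OW + x) * k * k], kernel, k * k)
// with the k x k kernel flattened to length k * k.
float dot(const float *a, const float *b, int n)
{
	float s = 0.f;
	for (int i = 0; i < n; i++) s += a[i] * b[i];
	return s;
}
```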

4/26/2016

  • Benchmarked tiny-cnn. μCNN was about 10x faster on CIFAR with smart training in my unofficial test. For MNIST, tiny-cnn claims 98.8% accuracy in 13 minutes. μCNN gives 90% accuracy in 30 seconds, but the model configuration is not the same. Nevertheless, it supports a ~10x speed difference.
  • Cygwin compile working & makefile.
  • Cross entropy loss added. Can specify loss function (cost) at the start of an epoch. See the sketch after this list.
  • Finished version 1 of smart auto training.
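
For the record, the cross entropy cost for a one-hot target reduces to the negative log of the predicted probability of the true class. A minimal sketch (names are illustrative, not the mojo cnn cost API):

```cpp
#include <cmath>
#include <vector>

// Cross entropy for a single sample with a one-hot target:
// loss = -log(p[target]), where p holds class probabilities that sum to 1.
float cross_entropy(const std::vector<float> &predicted, int target_index)
{
	const float eps = 1e-12f; // guard against log(0)
	return -std::log(predicted[target_index] + eps);
}
```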

4/19/2016

  • Initial version uploaded to GitHub.