Computer-Vision-Project

A typical bag-of-words classification pipeline. Figure from https://fr.mathworks.com/help/vision/ug/image-classification-with-bag-of-visual-words.html

Computer Vision Project: Object Recognition using Bag of Words (BoW) as the first approach and a CNN as the second approach.

Brief

Download the VLFeat 0.9.17 binary package.
VLFeat MATLAB reference: http://www.vlfeat.org/matlab/matlab.html
MatConvNet MATLAB reference: http://www.vlfeat.org/matconvnet/
You also need to download the PASCAL VOC 2007 dataset and the VOC development kit, which is used to generate the image labels: http://host.robots.ox.ac.uk/pascal/VOC/voc2007/

Overview

This project implements two approaches to object recognition. The first is based on the Bag of Words model and uses handcrafted features (dense SIFT). The visual vocabulary is built by clustering a large number of local descriptors and taking the cluster centers as the visual words. Each image is then encoded as i) a 'hard' histogram and ii) a spatial histogram. In the second approach, we fine-tune a pre-trained CNN model on our data, extract the activations of a fully connected layer as the feature representation, and feed these deep features to a linear SVM classifier.
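The 'hard' histogram encoding can be illustrated with a minimal sketch in plain Python. It assumes the vocabulary (cluster centers) already exists and that each descriptor votes for its single nearest center. The function names, the toy 2-D descriptors and the L1 normalization are assumptions made for illustration; they are not taken from the repository's MATLAB code.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hard_histogram(descriptors, centers):
    """Encode one image as a normalized histogram of visual words.

    Each descriptor votes for exactly one word: its nearest cluster center.
    """
    counts = [0] * len(centers)
    for d in descriptors:
        nearest = min(range(len(centers)),
                      key=lambda k: squared_distance(d, centers[k]))
        counts[nearest] += 1
    total = sum(counts)
    # L1 normalization is an assumption; the source does not name the norm.
    return [c / total for c in counts] if total else [0.0] * len(centers)


# Toy data: a 3-word vocabulary over 2-D descriptors (real SIFT is 128-D).
centers = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
descriptors = [[1.0, 1.0], [9.0, 1.0], [8.0, -1.0], [0.5, 9.0]]
hist = hard_histogram(descriptors, centers)
```

With a real vocabulary of 8000 words, the brute-force nearest-center search shown here would usually be replaced by an approximate nearest-neighbor lookup.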

BoW: We extract dense SIFT features from the training and validation sets and keep 300 descriptors per image, which gives about 1.5 million descriptors in total from the train+val images. These descriptors are clustered with k-means, using the ANN (Approximate Nearest-Neighbor) algorithm and k-means++ to initialize the centers. We try six vocabulary sizes, {200, 500, 800, 1000, 4000, 8000}, and save the centroids as our vocabulary, which we then use to encode each image into a 'hard' histogram. Once the feature sets are ready, we map them with homogeneous kernel maps (intersection kernel, chi2 kernel), feed them to a linear SVM, and compare the vocabulary sizes; 8000 clusters yields the best performance. We also use spatial histogram encoding with {1x1}, {1x3} and {2x2} regions.
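As a rough illustration of the spatial histogram encoding, the sketch below builds one visual-word histogram per grid cell and concatenates the cells of the {1x1}, {1x3} and {2x2} layouts. Several details are assumptions: the grid is read as (rows, cols), since the source does not say whether 1x3 means three columns or three horizontal strips; each grid is L1-normalized separately; and the word indices and keypoint positions are taken as already computed. All names are hypothetical.

```python
def spatial_histogram(words, positions, image_size, grid, vocab_size):
    """Concatenate one visual-word histogram per grid cell.

    words      -- visual-word index of each descriptor
    positions  -- (x, y) pixel location of each descriptor
    image_size -- (width, height) of the image
    grid       -- (rows, cols) layout of the cells (assumed convention)
    """
    width, height = image_size
    rows, cols = grid
    hist = [0.0] * (rows * cols * vocab_size)
    for word, (x, y) in zip(words, positions):
        # Clamp so points on the right/bottom border fall in the last cell.
        col = min(int(x * cols / width), cols - 1)
        row = min(int(y * rows / height), rows - 1)
        cell = row * cols + col
        hist[cell * vocab_size + word] += 1.0
    total = sum(hist)
    # Per-grid L1 normalization is an assumption, not stated in the source.
    return [h / total for h in hist] if total else hist


def encode(words, positions, image_size, vocab_size,
           grids=((1, 1), (1, 3), (2, 2))):
    """Stack the histograms of all grids into one feature vector."""
    feature = []
    for grid in grids:
        feature += spatial_histogram(words, positions, image_size,
                                     grid, vocab_size)
    return feature


# Toy example: four descriptors, a 3-word vocabulary, a 100x100 image.
words = [0, 1, 1, 2]
positions = [(10, 10), (90, 10), (50, 50), (10, 90)]
feature = encode(words, positions, image_size=(100, 100), vocab_size=3)
```

The resulting vector has (1 + 3 + 4) x vocab_size entries, so with an 8000-word vocabulary each image would be described by 64000 values under these assumptions.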

This is a multi-label classification task, because each image may contain more than one object. To classify the images, we train 20 binary classifiers (one per class) and collect each classifier's scores on the test data. In the end, the weights of all 20 classifiers are used to classify the objects. For our experiments we use the PASCAL VOC 2007 dataset.
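The one-classifier-per-class setup can be sketched as follows. Each class gets its own +1/-1 label vector, and a linear classifier scores a feature vector as w . x + b. The weights below are hand-picked placeholders standing in for trained SVMs, only three of the 20 VOC classes are shown, and thresholding the score at zero is an assumption made for illustration.

```python
CLASSES = ["aeroplane", "bicycle", "bird"]  # VOC 2007 has 20; three shown here


def binary_labels(image_labels, target):
    """+1 if the image contains the target class, otherwise -1."""
    return [1 if target in labels else -1 for labels in image_labels]


def score(weights, bias, feature):
    """Decision value of one linear classifier: w . x + b."""
    return sum(w * x for w, x in zip(weights, feature)) + bias


def predict(models, feature):
    """Return every class whose classifier gives a positive score."""
    return [name for name, (weights, bias) in models.items()
            if score(weights, bias, feature) > 0]


# Each training image may carry several labels (multi-label setting).
image_labels = [{"aeroplane"}, {"bicycle", "bird"}, {"bird"}]
targets = {name: binary_labels(image_labels, name) for name in CLASSES}

# Hypothetical, hand-picked weights standing in for trained SVMs.
models = {
    "aeroplane": ([1.0, 0.0, 0.0], -0.5),
    "bicycle": ([0.0, 1.0, 0.0], -0.5),
    "bird": ([0.0, 0.0, 1.0], -0.5),
}
predicted = predict(models, [0.1, 0.9, 0.8])
```

Because the classifiers are independent, an image can receive zero, one or several labels, which is what the multi-label setting requires.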

In the second approach, we use a pre-trained CNN model and take a fully connected layer as our feature representation. We use SVM, AdaBoost and Random Forest as classifiers. The first approach is implemented in MATLAB; the CNN approach is implemented in Python.

For detailed information regarding this project:

Report: https://www.overleaf.com/read/swdcjfdcjcvk

Presentation: https://www.overleaf.com/read/wzxndywjpkyh
