Computer Vision Project: Object Recognition using BoW as first approach and CNN as second approach.

EucliTs0/Computer-Vision-Project

Computer-Vision-Project

A typical bag of words classification pipeline. Figure from https://fr.mathworks.com/help/vision/ug/image-classification-with-bag-of-visual-words.html

Brief

Overview

This project implements two approaches to object recognition. The first is based on the Bag of Words (BoW) model and uses handcrafted features (dense SIFT). A large set of local descriptors is clustered, and the cluster centers serve as the visual vocabulary. Each image is then encoded as (i) a 'hard' histogram and (ii) spatial histograms. The second approach fine-tunes a pre-trained CNN on our data, takes the output of a fully connected layer as the feature representation, and feeds these deep features to a linear SVM classifier.
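As a rough illustration of 'hard' histogram encoding, the sketch below assigns each descriptor to its nearest vocabulary center and counts the assignments. It is not the project's MATLAB code: the function names, the toy 2-D descriptors, and the L1 normalization are assumptions made for the example.

```python
import math


def nearest_word(descriptor, vocabulary):
    """Return the index of the vocabulary center closest to the descriptor."""
    best_index, best_dist = 0, math.inf
    for index, center in enumerate(vocabulary):
        # Squared Euclidean distance is enough to find the nearest center.
        dist = sum((d - c) ** 2 for d, c in zip(descriptor, center))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def hard_histogram(descriptors, vocabulary):
    """Count visual-word assignments and L1-normalize (assumed normalization)."""
    counts = [0.0] * len(vocabulary)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, vocabulary)] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total > 0 else counts


# Toy data: 2-D "descriptors" and a 3-word vocabulary (real SIFT is 128-D).
vocabulary = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
descriptors = [[0.1, 0.0], [0.9, 1.2], [1.1, 0.8], [4.0, 6.0]]
histogram = hard_histogram(descriptors, vocabulary)
print(histogram)
```

With vocabularies of thousands of words, the brute-force search above becomes slow, which is why an approximate nearest-neighbor search is attractive in practice.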

BoW: We extract dense SIFT features from the training and validation sets and keep 300 descriptors per image, which gives around 1.5 million descriptors in total from the train+val images. These descriptors are clustered with k-means, using the ANN (Approximate Nearest-Neighbor) algorithm and k-means++ (PLUSPLUS) to initialize the centers. We try six different numbers of clusters {200, 500, 800, 1000, 4000, 8000} and save the centroids of each run as a vocabulary, which is then used to encode each image as a 'hard' histogram. Once the feature sets are ready, we map them with homogeneous kernel maps (intersection kernel, chi2 kernel), feed them to a linear SVM, and compare the vocabulary sizes; 8000 clusters yields the best performance. We also use spatial histogram encoding with {1x1}, {1x3} and {2x2} regions.
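The spatial histogram encoding can be sketched as follows: the image is split into a grid, one visual-word histogram is built per cell, and the cell histograms are concatenated. This is a minimal sketch under assumptions, not the project's implementation: the function names are hypothetical, each grid is read as (rows, cols), and each grid's histogram is L1-normalized separately.

```python
def spatial_histogram(words, positions, image_size, grid, vocab_size):
    """Concatenate one visual-word histogram per grid cell.

    words      -- visual-word index of each descriptor
    positions  -- (x, y) pixel location of each descriptor
    image_size -- (width, height) of the image
    grid       -- (rows, cols) layout of the cells (assumed convention)
    """
    width, height = image_size
    rows, cols = grid
    hist = [0.0] * (rows * cols * vocab_size)
    for word, (x, y) in zip(words, positions):
        # Clamp so that points on the right/bottom border stay in the last cell.
        col = min(int(x * cols / width), cols - 1)
        row = min(int(y * rows / height), rows - 1)
        cell = row * cols + col
        hist[cell * vocab_size + word] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


def encode(words, positions, image_size, vocab_size,
           grids=((1, 1), (1, 3), (2, 2))):
    """Stack the histograms of all grids into one feature vector."""
    feature = []
    for grid in grids:
        feature.extend(
            spatial_histogram(words, positions, image_size, grid, vocab_size))
    return feature


# Toy example: four descriptors, a 2-word vocabulary, a 100x100 image.
words = [0, 1, 1, 0]
positions = [(10, 10), (90, 10), (10, 90), (90, 90)]
feature = encode(words, positions, (100, 100), vocab_size=2)
print(len(feature))
```

The feature length is the vocabulary size times the total number of cells (1 + 3 + 4 = 8 here), so an 8000-word vocabulary would give a 64000-dimensional vector under this layout.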

This is a multi-label classification task, because each image may contain more than one object. To classify the images, we train 20 binary classifiers (one per class) and, for each classifier, obtain scores on the test data. In the end, the weights of all 20 classifiers can be used together to classify the objects. Our experiments use the PASCAL VOC 2007 dataset.
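The one-classifier-per-class setup can be sketched as follows. The example builds +1/-1 targets for each class from multi-label annotations and scores a test feature with one linear classifier per class. The weights, the three-class subset, and the decision threshold at zero are illustrative assumptions, not values from the project.

```python
def one_vs_rest_targets(image_labels, classes):
    """For each class, mark every image as positive (+1) or negative (-1)."""
    return {
        cls: [1 if cls in labels else -1 for labels in image_labels]
        for cls in classes
    }


def linear_score(weights, bias, feature):
    """Score of a linear classifier: w . x + b."""
    return sum(w * f for w, f in zip(weights, feature)) + bias


# A 3-class subset for brevity (PASCAL VOC 2007 has 20 classes).
classes = ["person", "dog", "car"]
image_labels = [{"person", "dog"}, {"car"}, {"person"}]
targets = one_vs_rest_targets(image_labels, classes)

# Hypothetical trained classifiers: class -> (weights, bias).
classifiers = {
    "person": ([1.0, -1.0], 0.0),
    "dog": ([0.0, 2.0], -0.25),
    "car": ([-1.0, 1.0], 0.0),
}

# Score one test feature with every classifier; an image can get several labels.
test_feature = [0.75, 0.25]
scores = {cls: linear_score(w, b, test_feature)
          for cls, (w, b) in classifiers.items()}
predicted = [cls for cls in classes if scores[cls] > 0]  # assumed threshold at 0
print(scores, predicted)
```

Because every class is scored independently, the same image can be positive for several classes, which is what makes this setup suitable for multi-label data.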

In the second approach, we use a pre-trained CNN model and take a fully connected layer as our feature representation. We use SVM, AdaBoost and Random Forest as classifiers. The first approach is implemented in MATLAB, while the CNN approach is implemented in Python.

For detailed information regarding this project:

Report: https://www.overleaf.com/read/swdcjfdcjcvk

Presentation: https://www.overleaf.com/read/wzxndywjpkyh
