Unsupervised clustering through feature learning using TensorFlow

Description

The overall pipeline consists of two major steps:

  • Feature learning
  • Clustering and analysis

All setups are configured through .task JSON files located in the tasks directory. Results are stored in the results folder; TensorBoard logs and checkpoints are stored in the logs folder.

Feature Learning

Three types of feature learning are provided: a convolutional autoencoder, a pre-trained CNN, and vector quantization based on K-means.

Convolutional autoencoder

The default set of layers includes convolutional, batch-normalization, and max-pooling layers. Optionally, dense layers can be added in the middle of the model. Available options (a minimal code sketch follows this list):

  • Random input rotation during training
  • Max pooling with or without argmax (argmax preserves pooling positions for decoding). Argmax works on GPU only.
  • Same weights for encoder and decoder
  • Any number of dense layers
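
For orientation, here is a minimal sketch of one encoder block (convolution, batch normalization, max pooling with argmax) written against the low-level TensorFlow 1.x API. The layer sizes, initializers, and the `encoder_block` helper are illustrative assumptions, not the repository's actual code.

```python
# Minimal sketch of one encoder block: conv -> batch norm -> ReLU -> max pool
# with argmax. TensorFlow 1.x style; shapes and sizes are illustrative only,
# not the repository's configuration.
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

def encoder_block(x, out_channels, is_training, name):
    in_channels = x.get_shape().as_list()[-1]
    with tf.variable_scope(name):
        w = tf.get_variable("w", [3, 3, in_channels, out_channels],
                            initializer=tf.glorot_uniform_initializer())
        b = tf.get_variable("b", [out_channels],
                            initializer=tf.zeros_initializer())
        conv = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding="SAME") + b
        bn = tf.layers.batch_normalization(conv, training=is_training)
        act = tf.nn.relu(bn)
        # The argmax variant preserves pooling positions for the decoder (GPU only).
        pooled, argmax = tf.nn.max_pool_with_argmax(
            act, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")
    return pooled, argmax

# Example: a 28x28 grayscale input (e.g. MNIST) through two encoder blocks.
images = tf.placeholder(tf.float32, [None, 28, 28, 1])
is_training = tf.placeholder(tf.bool)
h1, idx1 = encoder_block(images, 32, is_training, "enc1")   # 14x14x32
h2, idx2 = encoder_block(h1, 64, is_training, "enc2")       # 7x7x64 feature map
```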

Pre-trained CNN

For instance, the MNIST dataset is split into two subsets: digits 1-5 and digits 6-0. The CNN classifier is trained on the first subset (1-5), then the feature-learning part of the model is evaluated on the second one (6-0). The feature-learning part is everything but the last softmax layer.
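
A hypothetical sketch of that split (the Keras MNIST loader and variable names are assumptions for illustration):

```python
# Hypothetical sketch of the split described above: train a CNN classifier on
# digits 1-5, then evaluate the feature-extractor part (everything but the final
# softmax layer) on digits 6-9 and 0.
from tensorflow.keras.datasets import mnist  # loader choice is an assumption

(x, y), _ = mnist.load_data()
train_mask = (y >= 1) & (y <= 5)    # digits 1-5: supervised training subset
eval_mask = (y >= 6) | (y == 0)     # digits 6-9 and 0: held out for feature evaluation
x_train, y_train = x[train_mask], y[train_mask]
x_eval, y_eval = x[eval_mask], y[eval_mask]

# Train the classifier on (x_train, y_train), then take activations from the
# layer just before the softmax as feature vectors for x_eval and cluster them.
```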

Vector quantization

Feature learning as described in this paper, based on learning K-means centroids.
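
A rough sketch of this idea, not the paper's exact recipe: learn K-means centroids on random image patches and describe each image by a histogram of its patches' centroid assignments. The scikit-learn/scikit-image calls and all parameters below are illustrative assumptions.

```python
# Sketch of vector-quantization feature learning: learn K-means centroids on
# random image patches, then encode each image as a bag-of-centroids histogram.
# Libraries and parameters are assumptions for illustration.
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from skimage.util import view_as_windows

def sample_patches(images, patch=6, per_image=20, seed=0):
    """Sample random patch x patch windows from a batch of 2-D images."""
    rng = np.random.default_rng(seed)
    out = []
    for img in images:
        windows = view_as_windows(img, (patch, patch))
        ys = rng.integers(0, windows.shape[0], per_image)
        xs = rng.integers(0, windows.shape[1], per_image)
        out.append(windows[ys, xs].reshape(per_image, -1))
    return np.concatenate(out).astype(np.float32)

def encode(images, kmeans, patch=6):
    """Describe each image by a histogram of its patches' nearest centroids."""
    feats = []
    for img in images:
        windows = view_as_windows(img, (patch, patch)).reshape(-1, patch * patch)
        labels = kmeans.predict(windows.astype(np.float32))
        feats.append(np.bincount(labels, minlength=kmeans.n_clusters))
    return np.asarray(feats, dtype=np.float32)

# Usage: kmeans = MiniBatchKMeans(n_clusters=64).fit(sample_patches(train_images))
#        features = encode(test_images, kmeans)
```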

Clustering and Quality Analysis

Output includes a t-SNE embedding colored by each of three clustering methods (K-means, DBSCAN, HDBSCAN) and by the ground-truth labels. Here is an example ground-truth scatter plot for the MNIST dataset (output from the Quick start example); a minimal sketch of the clustering and scoring step follows the figures:

Convolutional autoencoder as a feature-learning method for MNIST. T-SNE.

3D PCA reduction using the TensorBoard Embedding Projector:

Convolutional autoencoder as a feature-learning method for MNIST. PCA Tensorboard Projector.
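
A minimal sketch of the clustering and scoring step: cluster the learned feature vectors with K-means, DBSCAN, and HDBSCAN, embed them with t-SNE for plotting, and score each labelling against the ground truth with V-measure. Parameters below are illustrative defaults, not the repository's settings.

```python
# Cluster learned feature vectors three ways, compute V-measure against ground
# truth, and produce a 2-D t-SNE embedding for scatter plots.
import hdbscan
from sklearn.cluster import DBSCAN, KMeans
from sklearn.manifold import TSNE
from sklearn.metrics import v_measure_score

def cluster_and_score(features, ground_truth, n_clusters=10):
    labelings = {
        "kmeans": KMeans(n_clusters=n_clusters).fit_predict(features),
        "dbscan": DBSCAN(eps=3.0, min_samples=10).fit_predict(features),
        "hdbscan": hdbscan.HDBSCAN(min_cluster_size=15).fit_predict(features),
    }
    scores = {name: v_measure_score(ground_truth, labels)
              for name, labels in labelings.items()}
    embedding = TSNE(n_components=2).fit_transform(features)  # for 2-D scatter plots
    return labelings, scores, embedding
```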

Installation

Python 3 and the following libraries are required. Install them with pip:

pip install tensorflow matplotlib scikit-image hdbscan imageio numpy scipy Pillow scikit-learn

Quick start

To run the convolutional autoencoder on the MNIST dataset:

python cnn_feature_cluster.py --task=examples/cnnAE

After training finishes, test results are saved in the results folder. Feature-learning quality is evaluated by clustering the learned features and computing the entropy-based V-measure score.

To start TensorBoard:

tensorboard --logdir=logs/train

TensorBoard shows input/output samples, the computational graph, loss/accuracy plots, and feature-vector embeddings (Projector).

About

A quick way to test how an image dataset can be split into clusters using different feature-learning methods: convolutional autoencoder, pre-trained CNN, and vector quantization. Implemented with the TensorFlow low-level API.
