GitHub - jkooy/image-caption: Image caption using soft-attention

The best result on both metrics is produced by the second model below (resnet152+lstm+hidden_size512+lr_1e-3)

resnet152+lstm+hidden_size1024+lr_1e-3: Bleu_4_C5 = 0.235378 CIDEr_C5 = 0.748013

resnet152+lstm+hidden_size512+lr_1e-3: Bleu_4_C5 = 0.242659 CIDEr_C5 = 0.772517

resnet152+gru+hidden_size512+lr_1e-3: Bleu_4_C5 = 0.235254 CIDEr_C5 = 0.750898

resnet152+gru+hidden_size1024+lr_1e-3: Bleu_4_C5 = 0.234776 CIDEr_C5 = 0.749187

resnet50+lstm+hidden_size1024+lr_1e-3: Bleu_4_C5 = 0.237044 CIDEr_C5 = 0.749605

resnet152+lstm+hidden_size512+lr_1e-2: Bleu_4_C5 = 0.222249 CIDEr_C5 = 0.695748
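
As a quick sanity check of the comparison above, the scores can be ranked programmatically. This is only an illustrative sketch: the numbers are copied from the list above, the keys follow the folder names of the trained models, and the `scores` dictionary is not part of the repository.

```python
# Scores copied from the results list: (Bleu_4_C5, CIDEr_C5).
scores = {
    "resnet152+lstm+hidden_size1024+lr_1e-3": (0.235378, 0.748013),
    "resnet152+lstm+hidden_size512+lr_1e-3": (0.242659, 0.772517),
    "resnet152+gru+hidden_size512+lr_1e-3": (0.235254, 0.750898),
    "resnet152+gru+hidden_size1024+lr_1e-3": (0.234776, 0.749187),
    "resnet50+lstm+hidden_size1024+lr_1e-3": (0.237044, 0.749605),
    "resnet152+lstm+hidden_size512+lr_1e-2": (0.222249, 0.695748),
}

# Pick the best run under each metric separately.
best_bleu = max(scores, key=lambda name: scores[name][0])
best_cider = max(scores, key=lambda name: scores[name][1])

print("Best BLEU-4:", best_bleu)
print("Best CIDEr: ", best_cider)
```

Both metrics select the same run, which is why a single model is reported as the best.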

Requirements

===========================

Python 3.6+
PyTorch 1.3.1
Matplotlib
Pillow
Numpy
NLTK
pycocotools
Plus some other libraries that ship with a regular Python installation
These can be installed on the DSMLP server with:

python -m pip install --user torch
python -m pip install --user matplotlib
python -m pip install --user numpy
python -m pip install --user pycocotools
python -m pip install --user nltk
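
After installing, it can help to confirm that each required package is importable. The following is a minimal sketch, not part of the repository; the helper name `missing_packages` is hypothetical. Note that Pillow is imported as `PIL` and PyTorch as `torch`.

```python
import importlib.util

# Display name -> import name for the packages listed under Requirements.
REQUIRED = {
    "PyTorch": "torch",
    "Matplotlib": "matplotlib",
    "Pillow": "PIL",
    "Numpy": "numpy",
    "NLTK": "nltk",
    "pycocotools": "pycocotools",
}


def missing_packages(required=REQUIRED):
    """Return the display names of packages that cannot be found."""
    return [
        name
        for name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing:", ", ".join(missing) if missing else "none")
```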

Code organization

===========================

example_pics

folder for pictures used in inference demo

notebook_plot_loss

this folder contains a notebook that plots the training and validation loss

folders for trained models and results

Some folders may not include the trained model files because of GitHub's storage limit
All of them include the result and the eval_score, and there is a DropBox link for the model files that exceed the limit
resnet152+gru+hidden_size512+lr_1e-3
resnet152+gru+hidden_size1024+lr_1e-3
resnet152+lstm+hidden_size512+lr_1e-3
resnet152+lstm+hidden_size512+lr_1e-2
resnet152+lstm+hidden_size1024+lr_1e-3
resnet50+lstm+hidden_size1024+lr_1e-3
- The experiments vary the CNN (resnet152 or resnet50), the RNN (LSTM or GRU), the hidden size (512 or 1024), and the learning rate (1e-2 or 1e-3)
- Each folder may contain up to three subfolders:
  - models: the trained models
  - result_json: the generated results
  - eval_score: the evaluation scores
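
Because each folder name encodes its configuration as `cnn+rnn+hidden_size<N>+lr_<rate>`, the settings can be recovered from the name alone. The sketch below assumes only that naming pattern; `RunConfig` and `parse_run_name` are hypothetical helpers and do not exist in the repository.

```python
from typing import NamedTuple


class RunConfig(NamedTuple):
    cnn: str
    rnn: str
    hidden_size: int
    learning_rate: float


def parse_run_name(name: str) -> RunConfig:
    """Split a folder name such as 'resnet152+lstm+hidden_size512+lr_1e-3'."""
    parts = name.split("+")
    if len(parts) != 4:
        raise ValueError(f"expected 4 '+'-separated fields, got {name!r}")
    cnn, rnn, hidden, lr = parts
    if not hidden.startswith("hidden_size") or not lr.startswith("lr_"):
        raise ValueError(f"unexpected field names in {name!r}")
    return RunConfig(
        cnn=cnn,
        rnn=rnn,
        hidden_size=int(hidden[len("hidden_size"):]),
        learning_rate=float(lr[len("lr_"):]),
    )


config = parse_run_name("resnet152+lstm+hidden_size512+lr_1e-3")
print(config)
```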

Result_generate_testset.ipynb

notebook for running the trained model on the whole test set

Result_generate_valset.ipynb

notebook for running the trained model on the whole validation set

checkDataset.ipynb

notebook for checking the dataset and data loader; it shows some of the pictures and their captions

demo_for_inference.ipynb

this is the demo for inference; it runs the model on the 7 example pictures and generates their captions

demo_for_train.ipynb

this is the demo for training the 'resnet152+lstm+hidden_size512+lr_1e-3' network

vocab.pkl

this is the vocabulary dictionary used to encode the words
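
To illustrate what such a dictionary does, here is a minimal sketch of a word-to-index vocabulary. The class below is hypothetical: the actual class pickled in vocab.pkl is defined by the repository's own code and may differ in names and special tokens (the `<unk>` token here is an assumption).

```python
class Vocabulary:
    """Hypothetical word <-> index mapping; the repository's class may differ."""

    def __init__(self, unknown_token="<unk>"):
        self.word2idx = {}
        self.idx2word = []
        self.unknown_token = unknown_token
        self.add_word(unknown_token)

    def add_word(self, word):
        # Assign the next free index to words not seen before.
        if word not in self.word2idx:
            self.word2idx[word] = len(self.idx2word)
            self.idx2word.append(word)

    def encode(self, words):
        # Words outside the vocabulary map to the unknown token's index.
        unk = self.word2idx[self.unknown_token]
        return [self.word2idx.get(w, unk) for w in words]

    def decode(self, indices):
        return [self.idx2word[i] for i in indices]


vocab = Vocabulary()
for word in "a dog runs on the beach".split():
    vocab.add_word(word)

ids = vocab.encode(["a", "dog", "swims"])
print(ids)
print(vocab.decode(ids))
```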

train.py

main file for training

model.py

construction of the LSTM+CNN model
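
Since the project is described as using soft attention, the core computation can be sketched in plain Python. This is a generic illustration, not the code in model.py: in soft-attention captioning models the alignment scores typically come from a small learned network over the image features and the decoder's hidden state, whereas here they are simply passed in, and the function name `soft_attention` is an assumption.

```python
import math


def soft_attention(features, scores):
    """Return (weights, context) for region features and alignment scores.

    features: list of region feature vectors (equal-length lists of floats)
    scores:   one unnormalized alignment score per region
    """
    # Softmax over the scores, shifted by the maximum for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context vector: attention-weighted sum of the region features.
    dim = len(features[0])
    context = [
        sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)
    ]
    return weights, context


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = soft_attention(features, [0.0, 0.0, 0.0])
print(weights)
print(context)
```

With equal scores every region receives the same weight, so the context vector is the plain average of the features; a larger score for one region shifts the context toward that region's features.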

model_gru.py

construction of the GRU+CNN model

utils.py

some useful functions

data_loader.py

loads the training and validation sets

sample.py

generates captions by running a forward pass through the network

build_vocab.py

build the dictionary
