nearnear/vision-studies


Vision Problem Solvings

Solutions for Computer Vision problems.

1. ViViT

1.1. Video classification with ViViT

This notebook tries out the ViViT (Video Vision Transformer) model.

The dataset is SynapseMNIST3D from MedMNIST3D, where each sample is a sequence of synapse images representing a 3D volume. Samples are displayed inside the notebook with a Jupyter widget.
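As a rough illustration of how a ViViT-style model turns a 3D volume into tokens, the sketch below cuts a volume into non-overlapping cubes ("tubelets") and flattens each one. It is a minimal NumPy sketch, not the notebook's code: the function name `extract_tubelets`, the 28×28×28 input shape, and the choice to crop any remainder that does not fill a whole cube are all assumptions made for illustration. A real model would follow this with a learned linear projection and positional embeddings.

```python
import numpy as np


def extract_tubelets(volume, patch):
    """Split a (D, H, W) volume into flattened, non-overlapping cubes.

    Hypothetical helper: voxels that do not fill a whole cube are cropped.
    Returns an array of shape (num_tubelets, patch ** 3).
    """
    d, h, w = volume.shape
    nd, nh, nw = d // patch, h // patch, w // patch
    # Crop so every axis is an exact multiple of the patch size.
    cropped = volume[: nd * patch, : nh * patch, : nw * patch]
    # Separate each axis into (block index, offset inside the block).
    blocks = cropped.reshape(nd, patch, nh, patch, nw, patch)
    # Bring the three block indices to the front, offsets to the back.
    blocks = blocks.transpose(0, 2, 4, 1, 3, 5)
    return blocks.reshape(nd * nh * nw, patch ** 3)


# Assumed input size: a single 28x28x28 volume filled with dummy values.
volume = np.arange(28 ** 3, dtype=np.float32).reshape(28, 28, 28)
tokens = extract_tubelets(volume, patch=8)
print(tokens.shape)
```

Under these assumptions, a patch size of 8 gives 3 cubes per axis, so the transformer would see 27 tokens per volume before any token reduction.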

Inference

These are some of the samples from inference.

1.2. ViViT with/without Token Learner

This notebook explores the effect of adding a Token Learner to ViViT.

The training datasets come from MedMNIST 3D, which contains medical 3D images with different types of classes. The model was tested with patch sizes 8 and 16, and the token learner was placed at the midpoint of the transformer blocks. The AdamW optimizer was used for its weight-decay regularization, and the learning rate was reduced on plateau.
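To make the token-reduction idea concrete, here is a simplified NumPy sketch of what a token learner does: it computes one attention map per output token over all input tokens, then pools the inputs with those maps. This is an assumption-laden simplification, not the notebook's implementation: the function name `token_learner`, the single linear projection `weight`, the softmax normalization, and the sizes (27 input tokens, 8 learned tokens, width 64) are all hypothetical. Published variants typically use a small MLP or convolution to produce the maps.

```python
import numpy as np


def token_learner(tokens, weight):
    """Reduce N input tokens to S learned tokens (simplified sketch).

    tokens: (N, D) input token embeddings.
    weight: (D, S) hypothetical projection, one column per learned token.
    Returns an (S, D) array of pooled tokens.
    """
    scores = tokens @ weight  # (N, S): one score per input/output pair
    # Softmax over the input-token axis, stabilized by subtracting the max.
    scores = scores - scores.max(axis=0, keepdims=True)
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=0, keepdims=True)
    # Each learned token is a weighted average of the input tokens.
    return attn.T @ tokens


rng = np.random.default_rng(0)
tokens = rng.normal(size=(27, 64))  # assumed: 27 tokens of width 64
weight = rng.normal(size=(64, 8))   # assumed: reduce to 8 learned tokens
learned = token_learner(tokens, weight)
print(learned.shape)
```

Placing such a module halfway through the transformer means the later blocks attend over far fewer tokens, which is one plausible reason for the shorter training time reported below.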

The Result

Overall, the model with the token learner outperformed the naive model in validation accuracy and loss over epochs. There were also no signs of overfitting with the token learner, even though the training time was shorter. The results suggest that models with token learners learn faster without a significant risk of overfitting.

All of the result graphs are displayed on TensorBoard.

About

Vision Model Notebooks
