LLukas22/Comparison-of-Deep-Image-Embedding-Methods

Comparison-of-Deep-Image-Embedding-Methods

This repo compares deep image embedding methods, with the goal of obtaining good general-purpose image embeddings from a small amount of training data.

This repo was created for an assignment in a deep vision course at the OTH Amberg-Weiden, which is why a report is included.

Datasets

The experiments below reference the following datasets: "Tiny-ImageNet" and "Internal and External Parts of Cars".

The notebooks expect the datasets to be in the root of the repo.

Backbones

The Backbones-Notebook compares the following backbones.

Results:

| Backbone | F1-Score |
| --- | --- |
| ResNet50 | 0.664 |
| EfficientNetV2_L | 0.540 |
| MobileNetV3 | 0.367 |
| DenseNet169 | 0.612 |
| ViT | 0.893 |
| Swin | 0.934 |
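
The README does not state how the F1-Score is computed or averaged. As a rough illustration only, the sketch below computes a macro-averaged F1 (the unweighted mean of per-class F1 scores) from true and predicted labels. The function name `macro_f1` and the choice of macro averaging are assumptions, not taken from the notebooks.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    labels = set(y_true) | set(y_pred)
    if not labels:
        raise ValueError("need at least one label")

    tp, fp, fn = Counter(), Counter(), Counter()
    for true_label, pred_label in zip(y_true, y_pred):
        if true_label == pred_label:
            tp[true_label] += 1
        else:
            fp[pred_label] += 1  # predicted class gains a false positive
            fn[true_label] += 1  # true class gains a false negative

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

With embeddings, the predicted labels would typically come from a classifier or nearest-neighbor lookup on top of the embedding vectors.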

Losses

The Losses-Notebook compares the following Loss-Functions.

Results:

| Loss | F1-Score |
| --- | --- |
| ContrastiveLoss | 0.650 |
| TripletLoss | 0.660 |
| SupConLoss | 0.709 |
| SNRLoss | 0.685 |
| NTXentLoss | 0.618 |
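
To make the idea behind these metric-learning losses concrete, here is a minimal pure-Python sketch of a triplet margin loss for a single (anchor, positive, negative) triple. It is not the implementation used in the Losses-Notebook; the Euclidean distance and the `margin=0.2` default are assumptions chosen for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin.

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative) + margin
    return max(gap, 0.0)
```

A triple whose positive is already much closer than its negative contributes no loss, so training focuses on the triples that still violate the margin.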

Embedding Size

The Embedding Size-Notebook compares different Embedding-Sizes.

Results:

| Embedding Size | F1-Score |
| --- | --- |
| 64 | 0.654 |
| 128 | 0.683 |
| 256 | 0.712 |
| 512 | 0.719 |
| 1024 | 0.724 |
| 2048 | 0.731 |

Dataset Size

The Dataset Size-Notebook compares different numbers of training samples per class.

Results:

| Samples per Class | F1-Score |
| --- | --- |
| 10 | 0.223 |
| 20 | 0.280 |
| 30 | 0.369 |
| 50 | 0.443 |
| 80 | 0.507 |
| 100 | 0.520 |
| 200 | 0.632 |
| 400 | 0.703 |
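
One way to build such reduced training sets is to draw a fixed number of samples from every class. The sketch below is a hypothetical helper (`subsample_per_class` is not a name from the notebook) that assumes the dataset is available as a list of `(item, label)` pairs.

```python
import random
from collections import defaultdict


def subsample_per_class(samples, n_per_class, seed=0):
    """Return at most `n_per_class` randomly chosen (item, label) pairs per label."""
    by_label = defaultdict(list)
    for item, label in samples:
        by_label[label].append((item, label))

    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    subset = []
    for group in by_label.values():
        k = min(n_per_class, len(group))  # small classes are kept whole
        subset.extend(rng.sample(group, k))
    return subset
```

For example, `subsample_per_class(train_samples, 20)` would yield a 20-images-per-class training set, where `train_samples` stands for whatever list of pairs the caller has loaded.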

Augmentation Factor

The Augmentation Factor-Notebook compares different augmentation factors for a small dataset with 20 images per class.

Results:

| Factor | F1-Score |
| --- | --- |
| 1x (Baseline) | 0.306 |
| 2x | 0.352 |
| 4x | 0.437 |
| 8x | 0.518 |
| 16x | 0.569 |
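
The README does not define the augmentation factor precisely. The sketch below assumes that a factor of k means each training sample is served k times per epoch, each time with a freshly sampled random augmentation; the notebook may define it differently. The class name `RepeatedAugmentedDataset` and the `augment(item, rng)` callback signature are hypothetical.

```python
import random


class RepeatedAugmentedDataset:
    """Serve every (item, label) pair `factor` times with a random augmentation."""

    def __init__(self, samples, augment, factor, seed=0):
        if factor < 1:
            raise ValueError("factor must be >= 1")
        self.samples = list(samples)
        self.augment = augment  # callable: (item, rng) -> augmented item
        self.factor = factor
        self.rng = random.Random(seed)

    def __len__(self):
        return len(self.samples) * self.factor

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Indices wrap around the base samples, so each one appears `factor` times.
        item, label = self.samples[index % len(self.samples)]
        return self.augment(item, self.rng), label


def jitter(value, rng):
    """Toy augmentation on a number, standing in for an image transform."""
    return value + rng.uniform(-0.1, 0.1)
```

With real images, `augment` would wrap an image transform pipeline instead of the numeric `jitter` stand-in.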

Augmentation Methods

The Augmentation Methods-Notebook compares the auto-augmentation methods available in PyTorch's torchvision.

Results:

| Method | F1-Score |
| --- | --- |
| Baseline | 0.289 |
| AutoAugment | 0.392 |
| RandAugment | 0.436 |
| TrivialAugmentWide | 0.441 |

Zero Shot Learning

In the Zero Shot-Notebook we test how well a Swin network fine-tuned on the "Tiny-ImageNet" dataset embeds images from the "Internal and External Parts of Cars" dataset, which it was not trained on.

Results:

F1-Score
0.865
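
The README does not describe how the embeddings of unseen classes are turned into predictions. A common approach, sketched here under that assumption, is to embed a few labelled reference images (a "gallery") and assign each query the label of its most similar gallery embedding. The function names are hypothetical, and the vectors are assumed to be non-zero.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two non-zero, equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_label(query, gallery):
    """Return the label of the gallery embedding most similar to `query`.

    `gallery` is a list of (embedding, label) pairs for classes the
    network was never trained on.
    """
    _, best_label = max(
        gallery, key=lambda entry: cosine_similarity(query, entry[0])
    )
    return best_label
```

Because only the gallery changes, new classes can be added without retraining the embedding network.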

Putting it all together

In the Final-Notebook we fine-tune a network on 20 images of 4 classes of the "Internal and External Parts of Cars" dataset and then perform normal and zero-shot detection on all 8 classes, with 230 images per class.

Results:

| Mode | F1-Score |
| --- | --- |
| Normal | 0.995 |
| Zero-Shot | 0.975 |
