
# Variational Augmentation for Enhancing Historical Document Image Binarization

Official code implementation of *Variational Augmentation for Enhancing Historical Document Image Binarization*.
Accepted at: ICVGIP 2022

## Abstract

Historical Document Image Binarization is a well-known segmentation problem in image processing. Despite their ubiquity, traditional thresholding algorithms have achieved limited success on severely degraded document images. With the advent of deep learning, several segmentation models were proposed that made significant progress in the field but were limited by the unavailability of large training datasets. To mitigate this problem, we propose a novel two-stage framework: the first stage comprises a generator that produces degraded samples using variational inference, and the second is a CNN-based binarization network that trains on the generated data. We evaluated our framework on a range of DIBCO datasets, where it achieved competitive results against previous state-of-the-art methods.

## Overview

### Approach

Deep learning-based methods need large training datasets, which are not readily available in the domain of historical documents. To tackle this problem, we propose a two-stage framework:

- **Aug-Net**: a VAE-GAN augmentation module built on BicycleGAN that generates synthetic degraded training samples.
- **Bin-Net**: a U-Net-based segmentation module for the binarization task, trained on the synthetic samples generated by Aug-Net (see the sketch below).
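
To make the data flow concrete, here is a minimal PyTorch sketch of the two-stage idea. The names `aug_net`, `bin_net`, and `train_bin_net` are hypothetical, not the repository's API; the actual training logic lives in `train.py` and the BicycleGAN codebase.

```python
import torch
import torch.nn as nn

def train_bin_net(aug_net: nn.Module, bin_net: nn.Module, loader, epochs: int = 1):
    """Train Bin-Net on degraded patches synthesized by a frozen, pre-trained Aug-Net."""
    opt = torch.optim.Adam(bin_net.parameters(), lr=1e-4)
    bce = nn.BCEWithLogitsLoss()
    aug_net.eval()  # Aug-Net is already trained; only Bin-Net is updated
    for _ in range(epochs):
        for clean, gt in loader:  # clean document patch + binary ground truth
            with torch.no_grad():
                z = torch.randn(clean.size(0), 8)  # latent code -> varied degradations
                degraded = aug_net(clean, z)       # synthetic degraded patch
            loss = bce(bin_net(degraded), gt)      # supervise the binarization output
            opt.zero_grad()
            loss.backward()
            opt.step()
```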

*[Figure: overview of the two-stage framework]*

### Results

The following are some samples obtained from Aug-Net.

*[Figure: a real degraded sample (True) alongside three synthetic samples (Fake1, Fake2, Fake3) generated by Aug-Net]*

Predictions on DIBCO 2014, 2016, and 2018 samples:

*[Figure: Bin-Net predictions on DIBCO 2014, 2016, and 2018]*

## Prerequisites

1. Python 3.7+
2. PyTorch 1.9+
3. Albumentations
4. fastai

## Dataset Download

1. You can download the DIBCO training images from here. Extract patches using `datamaker.py` (a sketch of the patching idea follows this list).
2. You can download the testing data from here.
3. Alternatively, you can download the training patches directly from here (recommended).
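
For reference, this is the kind of sliding-window patching that `datamaker.py` performs. This is an illustrative sketch, not the script itself; the actual patch size and stride used in the repository may differ.

```python
import os
from PIL import Image

def extract_patches(img_path: str, out_dir: str, size: int = 256, stride: int = 256):
    """Slide a fixed-size window over an image and save each crop as a patch."""
    img = Image.open(img_path)
    os.makedirs(out_dir, exist_ok=True)
    w, h = img.size
    idx = 0
    for top in range(0, max(h - size, 0) + 1, stride):
        for left in range(0, max(w - size, 0) + 1, stride):
            patch = img.crop((left, top, left + size, top + size))
            patch.save(os.path.join(out_dir, f"patch_{idx:05d}.png"))
            idx += 1
```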

## Directory Structure

- training_datasets
  - train
    - bw_patches
    - gt_patches
    - cl_patches
  - val
    - bw_patches
    - gt_patches
    - cl_patches
- testing_datasets
  - <DIBCO_YEAR>
    - bw_patches
    - gt_patches
    - cl_patches
    - results
- Restoration
  - code
    - all relevant files here (this repo)
  - weights
    - pretrained/saved weights here

## Training Instructions

1. The Augmentation Network (Aug-Net) is based on BicycleGAN. Train it according to the instructions in their official repository, using the patches extracted from the training data. Copy the checkpoints folder into `synthetic/`.

2. Create a subdirectory `evaluation/` to store intermediate results while the model is training.

3. Run `train.py` to train the Binarization Network (Bin-Net); a minimal launch sketch follows this list.
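
The steps above can be scripted as follows. This is a minimal sketch assuming the directory names from this README; `train.py`'s command-line flags are not documented here, so none are assumed.

```python
import os
import subprocess

# Aug-Net (BicycleGAN) checkpoints are expected under synthetic/
os.makedirs(os.path.join("synthetic", "checkpoints"), exist_ok=True)
# Intermediate results are written here while Bin-Net trains
os.makedirs("evaluation", exist_ok=True)
# Launch Bin-Net training
subprocess.run(["python", "train.py"], check=True)
```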

## Inference

1. Change the path to the directory containing the test images.
2. Specify the path to the weight files.
3. Run `infer.py`.
4. For evaluation, specify the paths to the outputs and the ground-truth images in `eval.py` and run it (a sketch of a typical metric follows this list).
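
As an illustration of what such an evaluation computes, here is a sketch of the pixel-level F-measure commonly reported on DIBCO. `eval.py` may compute additional metrics (e.g. pseudo-F-measure, PSNR, DRD), and this helper is hypothetical, not its actual code.

```python
import numpy as np
from PIL import Image

def f_measure(pred_path: str, gt_path: str, thresh: int = 128) -> float:
    """Pixel-level F-measure between a predicted binarization and its ground truth."""
    pred = np.array(Image.open(pred_path).convert("L")) < thresh  # True = ink (foreground)
    gt = np.array(Image.open(gt_path).convert("L")) < thresh
    tp = np.logical_and(pred, gt).sum()
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    if precision + recall == 0.0:
        return 0.0
    return float(2 * precision * recall / (precision + recall))
```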

## Citation

If you find our paper or code useful, consider citing us:

@misc{https://doi.org/10.48550/arxiv.2211.06581,
  doi = {10.48550/ARXIV.2211.06581},
  url = {https://arxiv.org/abs/2211.06581},
  author = {Dey, Avirup and Das, Nibaran and Nasipuri, Mita},
  keywords = {Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences, I.4.6},
  title = {Variational Augmentation for Enhancing Historical Document Image Binarization},
  publisher = {arXiv},
  year = {2022}
}

## Acknowledgements

Our work is partly based on BicycleGAN, and we made extensive use of their code. We would like to thank the authors for their contribution.

## To-Do

- Inference instructions
- Add environment.yml
- Add weight files
- Add sample images

