A Dilated MultiRes Visual Attention U-Net for historical document image binarization

Nikolaos Detsikas, Nikolaos Mitianoudis, Nikolaos Papamarkos

doi:10.1016/j.image.2024.117102

Abstract

The binarization of historical document images has been at the forefront of image processing research during the digital transition of libraries. Storing and transcribing valuable historical printed or handwritten material can preserve world cultural heritage and make it available online, without requiring physical attendance. Binarization can be viewed as a pre-processing step that attempts to separate the printed or handwritten characters in the image from possible noise and stains, which assists the subsequent Optical Character Recognition (OCR) process. Many approaches have been proposed, including deep learning-based ones. In this article, we propose a U-Net style deep learning architecture that incorporates several other developments in deep learning, including residual connections, multi-resolution connections, visual attention blocks and dilated convolution blocks for upsampling. The novelty of the proposed DMVAnet lies in combining these elements in a new U-Net style architecture and in applying it to image binarization for the first time. In addition, the proposed DMVAnet is a computationally lightweight network that performs close to, or even better than, state-of-the-art approaches with a fraction of the network size and parameters. Finally, it can be used on platforms with restricted processing power and system resources, such as mobile devices, and, through scaling, can achieve inference times that allow for real-time applications.
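To make the binarization task concrete, the sketch below applies classic global Otsu thresholding to a tiny grayscale image. This is not the DMVAnet method described in the abstract; it is only a simple, well-known baseline that shows what "separating characters from the background" means at the pixel level. The function names `otsu_threshold` and `binarize`, and the convention that ink maps to 0 and paper to 1, are assumptions made for illustration.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0..255) that maximizes between-class variance.

    Pixels with value <= t are treated as one class (assumed: dark ink),
    the rest as the other class (assumed: light paper).
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark = 0      # number of pixels with value <= t
    sum_dark = 0    # sum of those pixel values
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue            # no dark pixels yet, variance undefined
        w_light = total - w_dark
        if w_light == 0:
            break               # every pixel is already in the dark class
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        # Between-class variance, up to a constant factor of 1 / total**2.
        var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(image):
    """Map a 2D list of 8-bit gray values to 0 (ink) / 1 (paper)."""
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[0 if p <= t else 1 for p in row] for row in image]


if __name__ == "__main__":
    # Hypothetical 2x3 patch: light paper with two dark ink pixels.
    patch = [[200, 210, 20],
             [30, 205, 200]]
    for row in binarize(patch):
        print(row)
```

A single global threshold like this tends to fail on stained or unevenly lit pages, which is the kind of degradation that motivates learned approaches such as the one proposed here.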


Journal: Signal Processing: Image Communication
Publication Date: Jan 15, 2024
Citations: 3
