Statistical Binarization Techniques for Document Image Analysis

Saad M Ismail, Siti Norul Huda Sheikh Abdullah, Fariza Fauzi

doi:10.3844/jcssp.2018.23.36

Open Access

https://doi.org/10.3844/jcssp.2018.23.36

Abstract

Binarization is an important process in image enhancement and analysis, and numerous binarization techniques have been reported in the literature. These methods produce binary images from color or gray-level images. This article presents an extensive review of binarization approaches, which are also referred to as thresholding methods. The methods are grouped into seven categories according to the features and techniques they employ: histogram shape-based, clustering-based, entropy-based, object-attribute-based, spatial, local and hybrid methods. Most researchers active in binarization exploit several kinds of initial information from the source image, such as histogram shape, measurement space clustering, entropy, object attributes, spatial correlation and the local gray-level surface, with special attention to the statistical descriptive features of the image used in recent thresholding techniques.

Highlights

Binary image representation, a black and white representation of object and non-object (i.e. background), is the preferred format for image analysis, especially for document image analysis, where the images consist of textual objects
We have reviewed several classes of document image binarization methods that have been proposed in the literature, mainly to transform gray-level or colored document images into binary document images

Summary

Introduction

A black and white representation of object and non-object (i.e. background) is the preferred format for image analysis, especially for document image analysis, where the images consist of textual objects. In document image analysis, binarization, which converts color or gray-level images into binary images, is a crucial pre-processing step. Other purposes include noise removal and reducing the size of the images in memory. The process increases the visibility of the important information in an image by discarding unimportant information (Kefali et al., 2010). The important image information is preserved by reducing all levels in a color or gray-level image to two levels: black text and white background. A major limitation of thresholding is that it considers only pixel intensity and disregards relationships between pixels (Morse, 1998), so noise pixels can be mistakenly included and informative pixels that are isolated from other pixels can be mistakenly excluded.
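
The reduction to two levels described above can be illustrated with a minimal sketch of global thresholding. This is not a method taken from the reviewed literature: the function name `binarize_global`, the fixed threshold of 128 and the small gray-level patch are all assumptions chosen for illustration, and the image is represented as plain nested lists of 8-bit values.

```python
from typing import List

# Rows of 8-bit gray levels: 0 is black, 255 is white.
GrayImage = List[List[int]]


def binarize_global(image: GrayImage, threshold: int = 128) -> GrayImage:
    """Map every pixel to 0 (black text) or 255 (white background).

    Only the pixel's own intensity is consulted; neighbouring pixels
    play no role in the decision. The default threshold is an assumption.
    """
    return [[0 if pixel < threshold else 255 for pixel in row] for row in image]


# Hypothetical 3x4 patch: a dark stroke on the left, a light page on the
# right, and one isolated dark speck (value 40, last pixel of the middle
# row) that stands in for noise.
patch = [
    [30, 35, 200, 210],
    [25, 40, 205, 40],
    [20, 30, 215, 220],
]

binary = binarize_global(patch)
for row in binary:
    print(row)
```

In this sketch the isolated speck in the middle row is mapped to black just like the stroke pixels, because the decision depends on intensity alone. That is the limitation attributed to Morse (1998) above: with no account of relationships between pixels, a noise pixel cannot be told apart from text.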

Methods

Discussion

Conclusion