A Zone Classification Approach for Arabic Documents using Hybrid Features

Amany M. Hesham, Amr Badr, Hassanin M. Al-Barhamtoshy, Sherif Abdou, Mohsen Rashwan

doi:10.14569/ijacsa.2016.070722


Open Access



Abstract

Zone segmentation and classification are important steps in document layout analysis. Segmentation decomposes a scanned document into zones, and classification labels each zone as text or non-text so that only text zones are passed to a recognition engine. This eliminates the garbage output that results from sending non-text zones to the engine. This paper proposes a framework for zone segmentation and classification. Zones are segmented using morphological operations and connected component analysis. Features are then extracted from each zone to classify it as text or non-text. The features are a hybrid of texture-based and connected-component-based features. The most effective features are selected using a genetic algorithm, and the selected features are fed into a linear SVM classifier for zone classification. System evaluation shows that the proposed zone classification works well on multi-font and multi-size documents with a variety of layouts, including historical documents.
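
As an illustration of the segmentation step only, the following is a minimal sketch of how a morphological operation followed by connected component analysis can group nearby ink pixels into zones. It is not the authors' implementation: the abstract does not give the structuring element, the connectivity, or any parameter values, so the rectangular dilation, the 4-connectivity, and the names `dilate`, `connected_zones`, and `segment_zones` are all assumptions made for this example.

```python
from collections import deque


def dilate(img, rx, ry):
    """Binary dilation with a (2*ry+1) x (2*rx+1) rectangle (assumed shape)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                # Spread each foreground pixel over its rectangular neighbourhood.
                for yy in range(max(0, y - ry), min(h, y + ry + 1)):
                    for xx in range(max(0, x - rx), min(w, x + rx + 1)):
                        out[yy][xx] = 1
    return out


def connected_zones(img):
    """Bounding boxes (top, left, bottom, right) of 4-connected regions."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                # Breadth-first flood fill from an unvisited foreground pixel.
                seen[y][x] = True
                queue = deque([(y, x)])
                top, left, bottom, right = y, x, y, x
                while queue:
                    cy, cx = queue.popleft()
                    top, bottom = min(top, cy), max(bottom, cy)
                    left, right = min(left, cx), max(right, cx)
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes


def segment_zones(img, rx=1, ry=0):
    """Dilate to merge nearby components, then extract one box per zone."""
    return connected_zones(dilate(img, rx, ry))


# Toy binary page: two pairs of isolated ink pixels on separate rows.
page = [
    [1, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 1],
]
zones = segment_zones(page)
```

On this toy page, the four isolated pixels form four components before dilation and merge into two zones after it. The boxes are measured on the dilated image, so they are slightly larger than the original ink. Feature extraction, genetic feature selection, and the linear SVM described in the abstract would operate on the zones produced by a step like this and are not shown here.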

Full Text