Abstract
Document segmentation is a method of dividing a document into distinct regions. A document is a collection of information and a standard means of conveying information to others. Extracting data from documents by hand takes a great deal of human effort, is time-consuming, and can severely limit the use of information systems. Automatic information extraction from documents has therefore become an important problem, and document segmentation has been shown to help overcome such difficulties. This paper proposes a new approach to segment and classify document regions as text, image, drawings and table. The document image is divided into blocks using the run-length smearing algorithm, and features are extracted from every block. The Discipulus tool was used to construct a genetic-programming-based classifier model, which achieved 97.5% classification accuracy.
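The block-extraction step can be illustrated with a minimal sketch of run-length smearing on a binary image (1 = black, 0 = white). This is not the paper's implementation: the function names `smear_row` and `rlsa`, the threshold parameters, the choice to leave white runs touching the image border untouched, and the AND combination of the horizontal and vertical passes are all assumptions made for illustration.

```python
def smear_row(row, threshold):
    """Fill white runs (0s) of length <= threshold that lie between two black pixels (1s).

    Assumption: white runs touching the start or end of the row are left unchanged.
    """
    out = list(row)
    last_black = None
    for i, px in enumerate(row):
        if px == 1:
            if last_black is not None:
                gap = i - last_black - 1
                if 0 < gap <= threshold:
                    for j in range(last_black + 1, i):
                        out[j] = 1
            last_black = i
    return out


def rlsa(image, h_threshold, v_threshold):
    """Smear rows and columns separately, then AND the two results (assumed variant)."""
    horizontal = [smear_row(r, h_threshold) for r in image]
    # Transpose, smear each column as if it were a row, then transpose back.
    columns = [list(c) for c in zip(*image)]
    vertical = [list(r) for r in zip(*[smear_row(c, v_threshold) for c in columns])]
    return [[h & v for h, v in zip(hr, vr)] for hr, vr in zip(horizontal, vertical)]
```

In a sketch like this, the thresholds control how aggressively nearby characters and lines merge into a single block; the values used in the paper are not stated in this section.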
Highlights
Document segmentation is defined as a method of subdividing a document into text and non-text regions
Document segmentation is a multiclass classification problem. It is solved by extending binary classification to multiclass classification using the one-against-one method
This paper models document segmentation as a classification task and describes the implementation of a genetic programming approach for classifying the various regions
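The one-against-one extension mentioned in the highlights can be sketched as follows: one binary classifier is trained per pair of classes, and a block is assigned to the class that wins the most pairwise contests. This is only an illustration under stated assumptions. The function `one_against_one_predict`, the `pairwise` dictionary of callables, and the tie-breaking rule (first class in list order) are hypothetical and do not reflect the Discipulus tool or the paper's actual models.

```python
from collections import Counter
from itertools import combinations

# The four region classes named in the abstract.
CLASSES = ["text", "image", "drawings", "table"]


def one_against_one_predict(pairwise, features, classes=CLASSES):
    """Majority vote over all pairwise binary classifiers.

    `pairwise` maps each class pair (a, b) to a callable that takes a feature
    vector and returns either a or b (a hypothetical interface).
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise[(a, b)](features)] += 1
    # Assumption: ties go to the class that appears first in `classes`.
    return max(classes, key=lambda c: votes[c])
```

With four classes this scheme needs 4 × 3 / 2 = 6 binary classifiers, each of which only has to separate two kinds of region.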
Summary
Document segmentation is defined as a method of subdividing a document into text and non-text regions. This research work combines the existing features specified in [4] [6] [8] and proposes a few features that contribute more to document segmentation. Features such as perimeter/height ratio, energy and entropy are employed. A block in a document image is a connected component, defined as a collection of black runs that are 8-connected. Both the perimeter and the height of a block vary from block to block. Each block of the document also varies in its energy and entropy in the case of table, drawing and image blocks. These new features have a notable influence on document segmentation.
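The summary names the features but does not give their formulas, so the following is a sketch under common definitions rather than the paper's own: the perimeter is taken to be that of the block's bounding box, energy is the sum of squared gray-level probabilities, and entropy is the Shannon entropy of the gray-level histogram. The function names and these definitions are assumptions.

```python
import math
from collections import Counter


def perimeter_height_ratio(width, height):
    """Bounding-box perimeter divided by block height (assumed definition)."""
    return 2 * (width + height) / height


def energy_and_entropy(pixels):
    """Histogram energy and Shannon entropy of a block's pixel values.

    `pixels` is a flat, non-empty sequence of gray levels (assumed input format).
    """
    total = len(pixels)
    probs = [count / total for count in Counter(pixels).values()]
    energy = sum(p * p for p in probs)
    entropy = -sum(p * math.log2(p) for p in probs)
    return energy, entropy
```

Under these definitions a block with a single gray level has energy 1 and entropy 0, while a block with many evenly spread gray levels has low energy and high entropy, which is one plausible reason such features could help separate image blocks from text blocks.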
More From: International Journal of Advanced Research in Artificial Intelligence