Abstract

Segmentation divides a document image into its constituent segments, and it is a fundamental step in building any writing recognition or optical character recognition system. However, many existing segmentation schemes struggle with particular script styles, such as the historical Arabic writing found in ancient manuscripts, which has distinctive characteristics: inclined text lines, overlapping letters, diacritic marks, decorative elements, variable letter forms, and ligatures (two or more letters merged into a single connected shape). In this paper, we therefore present a thorough survey of the field. The survey has two parts. The first gives a concise overview of historical Arabic documents. The second, which is the main part, focuses on segmentation, a crucial step in handwritten document recognition. It gives a detailed and systematic overview of segmentation approaches at different levels for extracting handwritten Arabic text lines, followed by a literature study analyzing the works proposed in this area.
