Optical character recognition (OCR) is considered the fastest method of data entry; the intelligent conversion of handwritten text into machine-readable data is called handwritten text recognition. Many languages have OCR systems, but some still lack them. Balochi is one of the national languages of Pakistan, and most of its speakers live in the Baluchistan province of Pakistan. Balochi computing is in its infancy and requires attention on many fronts to reach the level of other languages in computational support. This paper investigates the relationship between Balochi and other languages that have adopted the Arabic script and proposes a segmentation algorithm that segments Balochi text paragraphs into lines, lines into words, and words into characters. The algorithm has been adopted and fine-tuned to achieve an accuracy of 95%. The segmentation algorithm will play a role in developing complex OCR and handwriting recognition systems for the Balochi language.
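The abstract does not describe the internals of the segmentation algorithm. Purely as an illustration of the paragraph-to-line, line-to-word, and word-to-character pipeline, the following is a minimal sketch of projection-profile segmentation, a common technique that is not necessarily the one used in the paper. It assumes a binarized image given as rows of 0/1 values (1 = ink); the function names and the `word_gap` parameter are hypothetical. Because letters in Arabic-script text join within a word, splitting on blank columns yields connected pieces rather than true characters, so the last stage is only a rough approximation.

```python
def ink_runs(profile, min_gap=1):
    """Return (start, end) pairs (end exclusive) of runs containing ink.

    Blank stretches shorter than min_gap are bridged, so neighbouring
    runs are merged into one.
    """
    runs, start, gap = [], None, 0
    for i, value in enumerate(profile):
        if value > 0:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                runs.append((start, i - gap + 1))
                start, gap = None, 0
    if start is not None:
        runs.append((start, len(profile) - gap))
    return runs


def segment(image, word_gap=2):
    """Split a binary page into lines, words, and character-like pieces.

    word_gap is an assumed threshold: blank column runs at least this
    wide separate words; narrower ones separate pieces inside a word.
    """
    result = []
    row_profile = [sum(row) for row in image]           # ink per row
    for top, bottom in ink_runs(row_profile):
        line = image[top:bottom]
        col_profile = [sum(col) for col in zip(*line)]  # ink per column
        words = []
        for left, right in ink_runs(col_profile, min_gap=word_gap):
            pieces = ink_runs(col_profile[left:right])
            chars = [(left + a, left + b) for a, b in pieces]
            words.append({"span": (left, right), "chars": chars})
        result.append({"rows": (top, bottom), "words": words})
    return result


# Tiny synthetic page: two text lines separated by a blank row.
page = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 1],
    [1, 0, 0, 1, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
]
layout = segment(page, word_gap=2)
```

Column spans are reported left to right; for a right-to-left script such as Balochi, the words of each line would be read in reverse order of their spans.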