Abstract

Script identification is one of the challenging components of an optical character recognition system for bilingual or multilingual document images. Significant research on script identification has been reported over the last two decades, concentrated largely on languages such as Latin, Chinese, Hindi, and French. Very little effort has been devoted to script identification for cursive languages such as Arabic, Urdu, and Pashto. Most of the ancient Urdu literature that is yet to be digitised includes both Urdu and Arabic text. In this paper, we present a method for script identification of Urdu and Arabic text at the word level using Gabor features with suitable orientations and frequencies. The proposed model is trained using a support vector machine (SVM) classifier, and the results achieved are very promising.
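
As a rough illustration of word-level Gabor features, the following minimal sketch computes the mean and standard deviation of the filter response magnitude for a small bank of Gabor filters. It is not the authors' implementation: the abstract does not state the orientations, frequencies, kernel size, or feature statistics used, so the values below (three frequencies, four orientations, a 15x15 kernel with sigma = 3) and the function names are assumptions chosen only for demonstration.

```python
import numpy as np


def gabor_kernel(frequency, theta, sigma=3.0, size=15):
    """Return the real and imaginary parts of a 2-D Gabor kernel.

    All default values are illustrative assumptions, not values from the paper.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate the coordinate so the sinusoid varies along direction theta.
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    phase = 2.0 * np.pi * frequency * x_rot
    return envelope * np.cos(phase), envelope * np.sin(phase)


def filter_same(image, kernel):
    """Circular convolution via the FFT; the output has the image's shape."""
    spectrum = np.fft.fft2(image) * np.fft.fft2(kernel, s=image.shape)
    return np.real(np.fft.ifft2(spectrum))


def gabor_features(word_image, frequencies=(0.1, 0.2, 0.3), n_orientations=4):
    """Build a feature vector from a grayscale word image (2-D array).

    For each (frequency, orientation) pair, the mean and standard deviation
    of the response magnitude are appended, giving
    2 * len(frequencies) * n_orientations features.
    """
    image = np.asarray(word_image, dtype=float)
    image = image - image.mean()  # remove the DC component
    features = []
    for frequency in frequencies:
        for k in range(n_orientations):
            theta = k * np.pi / n_orientations
            real_k, imag_k = gabor_kernel(frequency, theta)
            magnitude = np.hypot(filter_same(image, real_k),
                                 filter_same(image, imag_k))
            features.extend([magnitude.mean(), magnitude.std()])
    return np.array(features)
```

Feature vectors computed this way for labelled Urdu and Arabic word images could then be stacked into a matrix and passed to any SVM implementation (for example, `sklearn.svm.SVC` from scikit-learn) for training; the kernel choice and hyperparameters would need to be tuned on the actual data.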
