Abstract

Text appearing in videos carries rich information that can be exploited for content-based retrieval applications. An important component of such systems is a video optical character recognition (V-OCR) system that converts images into text. Though mature recognition systems have been developed for text in non-cursive scripts, recognition of cursive text remains a challenging problem. An important factor in developing recognition systems for cursive text is the choice of recognition unit. Whereas words can be extracted in non-cursive scripts, recognition of cursive text relies on either partial words/ligatures (holistic techniques) or characters (analytical techniques). This paper presents a comparative study of the effectiveness of holistic and analytical recognition using recent deep learning techniques. More specifically, we employ convolutional neural networks for recognition of ligatures segmented from caption text, and a combination of convolutional and recurrent neural networks for recognition of characters from text line images. Experiments are carried out on 16,000 text lines of cursive Urdu text, extracted from 5,000 video frames taken from videos of various news channels. The experimental results demonstrate that analytical techniques are more robust than holistic techniques, and the findings can be generalized to other cursive scripts as well.
