Abstract

India is a country of many cultures, and travelling from one region to another can place you in an entirely different one. The languages change from place to place as well, which makes it difficult to read signboards, shop names, and other everyday text written in local languages. This creates problems not only for travelers from other countries but also for people who move within the country from one region to another. However, most signboards, shop names, and other landmarks use English or Hindi in most regions. Here we propose a complete text detection, recognition, and transliteration system that helps travelers read Hindi text on signboards and shops by transliterating it into English. The proposed system detects Hindi text in natural scenes using the Progressive Scale Expansion algorithm, which allows it to handle difficult scenarios, including curved text in natural images. After detecting a text region, the system extracts the text from it using the PyTesseract OCR engine. The extracted text is then transliterated into English with a seq2seq MultiRNN LSTM model, which gives accurate transliterations without losing the pronunciation of the original Hindi words. We use a synthetic dataset of approximately 100,000 Hindi text images for text detection and the FIRE2013 dataset for transliteration. The overall system is evaluated using the BLEU score.
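
As an illustration of the evaluation metric named above, the following is a minimal sketch of a sentence-level BLEU score in plain Python. It is not the authors' evaluation code. The abstract does not say whether BLEU is computed over words or characters, so scoring transliterations character by character is an assumption made here, as are the function names `ngrams` and `bleu`, the single-reference setup, and the absence of smoothing.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU for one candidate against one reference.

    Both arguments are token sequences. Assumption: tokens are characters,
    which suits short transliterated words; the source does not specify this.
    """
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives zero
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    c_len, r_len = len(candidate), len(reference)
    brevity = 1.0 if c_len > r_len else math.exp(1 - r_len / c_len)
    return brevity * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    # Hypothetical model output compared with a hypothetical reference.
    print(bleu(list("namaste"), list("namastey")))
```

A candidate identical to its reference scores 1, and a candidate that shares no characters with the reference scores 0. Scores reported for a whole test set are normally computed at corpus level, which this sketch does not do.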
