Abstract

An effective system for converting scene text to speech is presented. Image enhancement combines the Discrete Wavelet Transform (DWT), Contrast Limited Adaptive Histogram Equalization (CLAHE), a Wiener filter, and an adaptive weighted average. Maximally Stable Extremal Regions (MSERs) are then used to detect candidate text regions, and geometrical and contour-based filters discard the non-text MSERs. The remaining text candidates are grouped using connected components. In the next step, Optical Character Recognition (OCR) recognizes the text, and the Microsoft text-to-speech synthesizer pronounces the extracted text. The system's applicability is tested on the standard robust reading competition dataset. The designed method achieves 93% precision in text segmentation and 89.9% precision in end-to-end recognition.
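
The filtering and grouping stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Region` type, the function names, and all threshold values are hypothetical and chosen only for illustration. It assumes candidate regions have already been detected (for example by an MSER detector) and are given as bounding boxes with a pixel count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Bounding box of one candidate region (hypothetical type, not from the paper)."""
    x: int
    y: int
    w: int
    h: int
    area: int  # number of pixels that belong to the region


def looks_like_text(r, min_aspect=0.1, max_aspect=3.0,
                    min_extent=0.2, max_extent=0.9):
    """Geometric filter. All thresholds are illustrative assumptions."""
    if r.w <= 0 or r.h <= 0:
        return False
    aspect = r.w / r.h              # very wide regions are unlikely to be characters
    extent = r.area / (r.w * r.h)   # fraction of the box that the region fills
    return (min_aspect <= aspect <= max_aspect
            and min_extent <= extent <= max_extent)


def group_regions(regions, max_gap=10):
    """Group regions that are horizontally close and vertically overlapping.

    Connected components over a "nearness" graph, computed with union-find.
    max_gap (in pixels) is an assumed value.
    """
    parent = list(range(len(regions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def near(a, b):
        h_gap = max(a.x, b.x) - min(a.x + a.w, b.x + b.w)
        v_overlap = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
        return h_gap <= max_gap and v_overlap > 0

    for i in range(len(regions)):
        for j in range(i + 1, len(regions)):
            if near(regions[i], regions[j]):
                parent[find(i)] = find(j)

    groups = {}
    for i, r in enumerate(regions):
        groups.setdefault(find(i), []).append(r)
    return list(groups.values())


if __name__ == "__main__":
    candidates = [
        Region(0, 0, 10, 20, 100),     # character-like
        Region(15, 2, 10, 20, 120),    # character-like, next to the first
        Region(200, 0, 12, 20, 120),   # character-like, far away
        Region(0, 50, 300, 5, 1400),   # long thin line: rejected by aspect ratio
        Region(50, 50, 20, 20, 400),   # solid block: rejected by extent
    ]
    kept = [r for r in candidates if looks_like_text(r)]
    for group in group_regions(kept):
        print(len(group), "region(s):", group)
```

In a full pipeline, each resulting group would be cropped and passed to an OCR engine. A contour-based filter, as mentioned in the abstract, would need the pixel data and is not shown here.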
