Abstract

The present paper has introduced an innovative and efficient technique that enables user to hear the contents of text images instead of reading through them. In the current world, there is a great increase in the utilization of digital technology and multiple methods are available for the people to capture images. such images may contain important textual content that the user may need to edit or store digitally. It merges the concept of Optical Character Recognition (OCR) and Text to Speech Synthesizer (TTS). This can be done using Optical Character Recognition with the use of Tesseract OCR Engine. OCR is a branch of AI that is used in applications to recognize text from scanned documents or images. The analyzed text can also be converted to audio format to help visually impaired people hear the content that they wish to know. Text-to-Speech conversion is a method that scans and reads alphabets and numbers that are in the image using OCR technique and convert it into voices. The aim is to study and compare the multiple methods used for STT conversions and to figure out the most efficient technique that can be adapted for the conversion processes. As a result, based on review study it is found that HMM is a statistical model which is most suitable for TTS conversions.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.