Abstract

The historical and theoretical bases of contemporary high-performance text-to-speech (TTS) systems and their current designs are discussed. The major elements of a TTS system are described, with particular reference to vocal tract models. The stages involved in converting text into speech parameters are examined, covering text normalization, word pronunciation, prosody, phonetic rules, voice tables, and hardware implementation. The proposed system converts images to text and speech, and is developed so that visually impaired and physically challenged people can easily get information from images. The core idea is to overcome the challenges a visually impaired person faces in daily life. The system goes through several phases: image processing, text extraction, and text-to-speech (TTS) conversion. The app opens the device camera, extracts the text from the captured image, and plays it back as audio. The primary motivation is to give visually impaired users a friendly speech interface to the computer and to let people who are physically or visually challenged read printed text on the go.
