Abstract

Synthesizing speech is complex because it relies heavily on language: the language-processing component of a text-to-speech (TTS) system holds most of the system's linguistic knowledge about a particular language. Building a high-quality system of this kind poses daunting technical and theoretical challenges. To keep its linguistic information relevant and up to date, the system needs access to natural, unrestricted text, and extensive study is needed to achieve this. At the heart of this software engine lies an optical character recognition (OCR) engine, which incorporates the morphological operations required for image conditioning and transformation and uses Python libraries for character classification. The processed text is then converted into speech signals using various text-to-speech synthesis techniques.
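
As an illustration of the kind of morphological image conditioning an OCR front end might apply before character classification, here is a minimal sketch of binary erosion, dilation, and opening in plain Python. The function names (`erode`, `dilate`, `opening`), the square structuring element, and the toy image are all assumptions made for illustration; the abstract does not specify which operations or libraries the system actually uses.

```python
# Illustrative sketch only: hypothetical helpers, not the paper's implementation.
# Images are lists of rows holding 0 (background) or 1 (ink).
# Pixels outside the image are treated as background.

def erode(img, k=1):
    """Binary erosion with a (2k+1) x (2k+1) square structuring element."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # A pixel survives only if every neighbour in the window is ink.
            out[y][x] = int(all(
                0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                for dy in range(-k, k + 1)
                for dx in range(-k, k + 1)
            ))
    return out


def dilate(img, k=1):
    """Binary dilation with a (2k+1) x (2k+1) square structuring element."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # A pixel is set if any neighbour in the window is ink.
            out[y][x] = int(any(
                0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                for dy in range(-k, k + 1)
                for dx in range(-k, k + 1)
            ))
    return out


def opening(img, k=1):
    """Erosion followed by dilation: removes specks smaller than the window."""
    return dilate(erode(img, k), k)


# Toy input: a 3x3 block of ink plus one isolated noise pixel at (5, 5).
noisy = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
]
cleaned = opening(noisy)
for row in cleaned:
    print("".join("#" if p else "." for p in row))
```

In this toy case the opening keeps the 3x3 block and drops the isolated pixel, which is the usual reason for applying it to scanned text before segmentation. A real pipeline would more likely rely on an image-processing library than on nested Python loops.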

