Emotional text-to-speech (TTS) synthesis is a rapidly advancing area of artificial intelligence with the potential to change how we interact with technology. By analyzing the emotional content of text, these systems produce speech that conveys the intended emotional tone of the message. Although TTS systems exist for many languages, none is yet available for Pali. We therefore developed a TTS synthesizer specifically for Pali, offering an end-to-end solution for emotional speech synthesis. We address the problem by combining disentangled, fine-grained prosody features with a global, sentence-level emotion embedding. The fine-grained features learn to represent local prosodic variations disentangled from the speaker, the tone, and the global emotion label. Because prosody is usually modeled by rules, we implemented a fuzzy logic system to build a controller for the prosody of Pali speech. The fuzzy controller handles different linguistic parameters for three sentence types: interrogative, exclamatory, and declarative. The resulting system produces intelligible speech with intonation appropriate to each sentence type. In this paper, we describe how a fuzzy paradigm can be applied to build a TTS system for Pali while preserving a rule-based concatenative synthesizer. Within the framework of classic concatenative TTS systems, we propose a new method for improving the unit-selection computation, aimed at raising the perceptual quality of the synthetic speech. The fuzzy system also addresses phonemes that admit multiple descriptions in rule-based speech synthesis.
The introductory section gives a concise description of the current context of emotional speech synthesis. The second section outlines notable advances in emotional speech synthesis and acknowledges the contributions of researchers in this field. The third section covers the technical details of implementing the fuzzy system. The last section presents the main conclusions and the scope for future research.