Abstract
Existing Tagalog text-to-speech (TTS) systems still have room for improvement. Although recent attempts at building local TTS systems for Philippine languages have produced synthesized speech, their Mean Opinion Scores (MOS) for naturalness and intelligibility remain relatively low, ranging from 1.5 to 3.9 out of 5. Voice conversion (VC) makes it possible to improve speech prosody, the main factor in the naturalness and individuality of speech. This project implements a VC system for Tagalog synthesized speech using Cycle-Consistent Generative Adversarial Networks (Cycle-GAN), a state-of-the-art neural network architecture for non-parallel VC. Inter-gender and intra-gender VC models were built for two types of input: Google's own Tagalog TTS and a locally sourced TTS system built from Mary TTS. Results show that Google TTS and its VC models perform better overall than Mary TTS and its VC models. Mel Cepstral Distortion (MCD) and F0 root mean square error (F0:RMSE) vary across models, reaching an MCD as low as 6.52 dB for Google TTS's intra-gender VC and an F0:RMSE as low as 16.92 Hz for Google TTS's inter-gender VC. VC also degraded perceived speech quality, as seen in a decrease in MOS across all VC models. For both TTS inputs, inter-gender VC was subjectively preferred over intra-gender VC, reaching MOS values of 3.76 and 2.32 for Google TTS and Mary TTS inputs, respectively. Male respondents also tended to give higher opinion scores to intra-gender VC than female respondents did, likely due to differences in hearing sensitivity.
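For readers unfamiliar with the two objective metrics, the following is a minimal sketch of how MCD and F0:RMSE are commonly computed. It is not taken from this project: it assumes the reference and converted utterances have already been time-aligned frame by frame (for example with dynamic time warping), that the 0th (energy) cepstral coefficient is excluded, and that unvoiced frames are marked with an F0 of 0. The function names and input layout are hypothetical.

```python
import math


def mel_cepstral_distortion(ref_frames, conv_frames):
    """Mean MCD in dB over time-aligned frames of mel-cepstral coefficients.

    Assumption: index 0 of each frame is the energy term and is skipped.
    """
    if not ref_frames or len(ref_frames) != len(conv_frames):
        raise ValueError("frame sequences must be non-empty and of equal length")
    scale = 10.0 / math.log(10.0)
    total = 0.0
    for ref, conv in zip(ref_frames, conv_frames):
        squared_diff = sum((r - c) ** 2 for r, c in zip(ref[1:], conv[1:]))
        total += scale * math.sqrt(2.0 * squared_diff)
    return total / len(ref_frames)


def f0_rmse(ref_f0, conv_f0):
    """RMSE in Hz over frames that are voiced (F0 > 0) in both contours."""
    voiced = [(r, c) for r, c in zip(ref_f0, conv_f0) if r > 0 and c > 0]
    if not voiced:
        raise ValueError("no frames are voiced in both contours")
    return math.sqrt(sum((r - c) ** 2 for r, c in voiced) / len(voiced))
```

Lower values on both metrics indicate that the converted speech is closer to the target speaker's spectral envelope and pitch contour, respectively. The exact alignment and voicing conventions used in the project may differ from the assumptions made here.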