Abstract

In this paper, we present an extension of a novel continuous residual-based vocoder for statistical parametric speech synthesis by addressing two objectives. First, because the noise component is often not accurately modelled in modern vocoders (e.g. STRAIGHT), a new technique for modelling unvoiced sounds is proposed by adding time domain envelope to the unvoiced segments to avoid any residual buzziness. Four time-domain envelopes (Amplitude, Hilbert, Triangular and True) are investigated, enhanced, and then applied to the noise component of the excitation in our continuous vocoder, i.e. of which all parameters are continuous. With the future aim of producing high-quality Arabic speech synthesis, we secondly apply this vocoder on a modern standard Arabic audio-visual corpus which is annotated both phonetically and visually, and dedicated to emotional speech processing studies.In an objective experiment, we investigated the Phase Distortion Deviation, whereas a MUSHRA type subjective listening test was conducted comparing natural and vocoded speech samples. As a result, both experiments based on the proposed noise modelling have shown satisfactory results in terms of naturalness and intelligibility, while outperforming STRAIGHT and other earlier residual-based approaches.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.