Abstract
Voice imitation consists of estimating a synthesizer's input parameters so that its output mimics a target speech signal. This is a difficult inverse problem because the mapping is time-varying, non-linear, and many-to-one. Done manually, it typically requires a considerable amount of time. This work presents the evolution of a system based on a genetic algorithm (GA) that automatically estimates the input parameters of the Klatt and HLSyn formant synthesizers using an analysis-by-synthesis process. Results are presented for natural (human-generated) speech from three male speakers. The GA-based system outperforms the baseline, Winsnoori, with respect to four objective figures of merit and a subjective test. The GA with the Klatt synthesizer generated voices similar to the target, and the subjective tests indicate an improvement in the quality of the synthetic voices compared with those produced by the baseline.
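The analysis-by-synthesis idea can be illustrated with a minimal sketch: a GA proposes parameter vectors, a synthesizer renders each one, and the error against the target signal drives selection. Everything below is an assumption made for illustration and not the system described in this work: `toy_synth` is a hypothetical two-sinusoid stand-in for a formant synthesizer (not Klatt or HLSyn), the mean squared error is a placeholder for the objective figures of merit, the search bounds and GA operators are arbitrary choices, and only a single frame is estimated rather than a time-varying parameter track.

```python
import math
import random

SAMPLE_RATE = 8000   # Hz (assumed)
N_SAMPLES = 64       # length of the single frame being matched (assumed)
# Assumed search ranges for two "formant-like" frequencies, in Hz.
BOUNDS = [(200.0, 1000.0), (1000.0, 3000.0)]

def toy_synth(params):
    """Hypothetical stand-in for a synthesizer: a sum of sinusoids."""
    return [sum(math.sin(2 * math.pi * f * n / SAMPLE_RATE) for f in params)
            for n in range(N_SAMPLES)]

def error(params, target):
    """Mean squared error between the synthesized frame and the target."""
    synth = toy_synth(params)
    return sum((s - t) ** 2 for s, t in zip(synth, target)) / len(target)

def mutate(ind, rng, rate=0.3, scale=0.05):
    """Gaussian mutation, clipped to the search bounds."""
    out = []
    for x, (lo, hi) in zip(ind, BOUNDS):
        if rng.random() < rate:
            x += rng.gauss(0.0, scale * (hi - lo))
        out.append(min(hi, max(lo, x)))
    return out

def crossover(a, b, rng):
    """Uniform crossover: each gene comes from either parent."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

def tournament(pop, errs, rng, k=3):
    """Pick the lowest-error individual among k random candidates."""
    idx = min(rng.sample(range(len(pop)), k), key=lambda i: errs[i])
    return pop[idx]

def run_ga(target, pop_size=40, generations=60, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(pop_size)]
    history = []  # best error seen in each generation
    for _ in range(generations):
        errs = [error(ind, target) for ind in pop]
        best_i = min(range(pop_size), key=lambda i: errs[i])
        history.append(errs[best_i])
        next_pop = [pop[best_i]]  # elitism: keep the best individual unchanged
        while len(next_pop) < pop_size:
            child = crossover(tournament(pop, errs, rng),
                              tournament(pop, errs, rng), rng)
            next_pop.append(mutate(child, rng))
        pop = next_pop
    errs = [error(ind, target) for ind in pop]
    best_i = min(range(pop_size), key=lambda i: errs[i])
    history.append(errs[best_i])
    return pop[best_i], history

# The "target speech" here is itself produced by the toy synthesizer.
target = toy_synth([500.0, 1500.0])
best_params, history = run_ga(target)
```

Because the best individual is carried over unchanged each generation, the best error in `history` never increases. A real synthesizer has many more parameters per frame, and the many-to-one mapping means different parameter vectors can reach similar errors.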