Quality Of Synthetic Speech Research Articles

Even the highest quality synthetic speech generated by rule sounds unlike human speech. As the intelligibility of rule-based synthetic speech improves, and the number of applications for synthetic speech increases, the naturalness of synthetic speech will become an important factor in determining its use. To improve this aspect of the quality of synthetic speech, it is necessary to have diagnostic tests that can measure naturalness. Currently, none of the available metrics for evaluating the acceptability of synthetic speech distinguishes sufficiently between measuring overall acceptability (including naturalness) and simply measuring the ability of listeners to extract intelligible information from the signal. In this paper we propose a new methodology for measuring the naturalness of particular aspects of synthesized speech, independent of the intelligibility of the speech. Although naturalness is a multidimensional, subjective quality of speech, this methodology makes it possible to assess the separate contributions of prosodic, segmental, and source characteristics of the utterance. In two experiments, listeners reliably differentiated the naturalness of speech produced by two male talkers from that of speech produced by two text-to-speech systems. Furthermore, they reliably differentiated between the two text-to-speech systems. The results of these experiments demonstrate that perception of naturalness is affected by information contained within the smallest part of speech, the glottal pulse, and by information contained within the prosodic structure of a syllable. These results show that this new methodology provides a solid basis for measuring and diagnosing the naturalness of synthetic speech.

Sinusoidal modeling has been successfully applied to a broad range of speech processing problems, and offers advantages over linear predictive modeling and the short-time Fourier transform for speech analysis/synthesis and modification. This paper presents a novel speech analysis/synthesis system based on the combination of an overlap-add sinusoidal model with an analysis-by-synthesis technique to determine the model parameters. It describes this analysis procedure in detail, and introduces an equivalent frequency-domain algorithm that takes advantage of the computational efficiency of the fast Fourier transform (FFT). In addition, a refined overlap-add sinusoidal model capable of shape-invariant speech modification is derived, and a pitch-scale modification algorithm is defined that preserves speech bandwidth and eliminates noise migration effects. Analysis-by-synthesis achieves very high synthetic speech quality by accurately estimating the component frequencies, eliminating sidelobe interference effects, and effectively dealing with nonstationary speech events. The refined overlap-add synthesis model correlates well with analysis-by-synthesis, and modifies speech without objectionable artifacts by explicitly controlling shape invariance and phase coherence. The proposed analysis-by-synthesis/overlap-add (ABS/OLA) system allows for both fixed and time-varying time-, frequency-, and pitch-scale modifications, and computational shortcuts using the FFT algorithm make its implementation feasible using currently available hardware.
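The overlap-add idea behind the sinusoidal model described above can be illustrated with a minimal sketch. This is not the ABS/OLA system from the abstract: it omits the analysis-by-synthesis parameter estimation, the FFT-based shortcuts, and all modification algorithms. It only shows, under simplifying assumptions (parameters held constant within each frame, a periodic Hann window, 50% overlap), how windowed frames of summed sinusoids are overlap-added into a continuous signal. The function names `hann`, `synth_frame`, and `overlap_add`, and the parameter layout, are hypothetical.

```python
import math

def hann(n_len):
    # Periodic Hann window: two copies shifted by n_len/2 sum to exactly 1.
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_len) for n in range(n_len)]

def synth_frame(components, start, n_len, sample_rate):
    # Sum of sinusoids whose parameters are constant within the frame.
    # Each component is (amplitude, frequency_hz, phase_at_time_zero);
    # referencing phase to absolute time keeps adjacent frames coherent.
    frame = []
    for n in range(n_len):
        t = (start + n) / sample_rate
        frame.append(sum(a * math.cos(2 * math.pi * f * t + p)
                         for a, f, p in components))
    return frame

def overlap_add(frames_params, n_len, sample_rate):
    # Assumes an even n_len so that the hop is exactly half a frame.
    hop = n_len // 2
    window = hann(n_len)
    out = [0.0] * (hop * (len(frames_params) + 1))
    for k, components in enumerate(frames_params):
        start = k * hop
        frame = synth_frame(components, start, n_len, sample_rate)
        for n in range(n_len):
            out[start + n] += window[n] * frame[n]
    return out

# Usage: eight frames of a stationary 100 Hz sinusoid at 8 kHz.
params = [[(1.0, 100.0, 0.0)]] * 8
signal = overlap_add(params, n_len=64, sample_rate=8000)
```

For a stationary sinusoid, the fully overlapped region of the output matches the original waveform, because the shifted windows sum to one and the frames agree in phase. When the parameters change from frame to frame, the window acts as a cross-fade between adjacent parameter sets.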

Articles published on Quality Of Synthetic Speech

A new speech synthesis algorithm for high quality TTS systems: a mixed phase vocoder

Text-to-speech from concatenation of articulatory units derived from natural speech

Evaluating the Quality of an Integrated Model of German Prosody

A Text-to-Speech Platform for Variable Length Optimal Unit Searching Using Perception Based Cost Functions

Intelligibility tests for synthetic speech subjective evaluation: The semantically unpredictable sentences approach for European Portuguese

Challenges and Rewards in Using Parametric or Concatenative Speech Synthesis

Evaluation of synthetic speech quality: A comparative study of several computer-based speech synthesizers

Unit Generation Based on Phrase Break Strength and Pruning for Corpus-Based Text-to-Speech

A dynamical system model for generating fundamental frequency for speech synthesis

Comparing the naturalness of several approaches for generating F0 contours in German text-to-speech systems

Automatic creation of CV templates for formant type speech synthesis based on HMM-based segmentation and syllable boundary detection

Quality enhancement of sinusoidal transform vocoders

TFW-PM reference for subjectively assessing the quality of synthetic speech

Voiced speech excitation synthesis using a sinusoidal model

Prosodic Phrasing and Comprehension

Measuring the naturalness of synthetic speech

Speech analysis/synthesis and modification using an analysis-by-synthesis/overlap-add sinusoidal model

Good prosody facilitates comprehension

A neuronal formant synthesizer

Analysis of quality factors in synthetic speech produced by rules
