HMM-based Speech Synthesis Research Articles

Problem statement: In Thai speech synthesis with a hidden Markov model (HMM)-based system, the quality of tonal speech is degraded by tone distortion. This problem must be addressed to preserve the tone characteristics of each syllable unit, because tone is essential to the intelligibility of the synthesized speech. The tone questions and other phonetic questions used in the tree-based context clustering process must therefore be designed accordingly.

Approach: This study analyzes the questions used in the tree-based context clustering process of an HMM-based speech synthesis system for Thai. In the system, spectrum, pitch (F0), and state duration are modeled simultaneously in a unified HMM framework, and their parameter distributions are clustered independently using decision-tree-based context clustering. The contextual factors that affect spectrum, pitch, and duration (part of speech; position and number of phones in a syllable; position and number of syllables in a word; position and number of words in a sentence; phone type; and tone type) are used to construct the decision-tree questions. In total, thirteen question sets are analyzed and compared.

Results: We analyzed the decision trees by counting how many node questions come from each of the thirteen sets and by calculating a dominance score for each question, defined as the reciprocal of the distance from the root node to the question node. The phonetic-type set has the highest count and dominance score, followed by the part-of-speech set and then the tone-type set.

Conclusion: Counting the questions in each node and calculating their dominance scores makes it possible to prioritize the question sets. These results can support the further development of Thai speech synthesis with a more efficient context clustering process in an HMM-based system.
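The counting and dominance-score analysis described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: the `Node` class, the question-set labels, and the `analyze_tree` helper are hypothetical. Because the abstract does not say how the root's distance is counted, the sketch assumes the root sits at depth 1, so its score is 1 rather than an undefined 1/0. In a real system, spectrum, F0, and duration each have their own trees, so the per-set totals would be summed over all of them.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Node:
    """Hypothetical decision-tree node; leaves have question_set=None."""
    question_set: Optional[str] = None  # which question set the split question belongs to
    yes: Optional["Node"] = None
    no: Optional["Node"] = None


def analyze_tree(root: Node) -> Tuple[Dict[str, int], Dict[str, float]]:
    """Count question nodes per set and sum their dominance scores (1 / depth).

    ASSUMPTION: the root is at depth 1, so the root question scores 1.0
    and a question two levels below the root scores 1/3.
    """
    counts: Dict[str, int] = defaultdict(int)
    scores: Dict[str, float] = defaultdict(float)
    stack = [(root, 1)]
    while stack:
        node, depth = stack.pop()
        if node is None or node.question_set is None:
            continue  # missing child or leaf: no question is asked here
        counts[node.question_set] += 1
        scores[node.question_set] += 1.0 / depth
        stack.append((node.yes, depth + 1))
        stack.append((node.no, depth + 1))
    return dict(counts), dict(scores)


# Toy tree with hypothetical question-set labels.
tree = Node("phone_type",
            yes=Node("tone_type", yes=Node(), no=Node()),
            no=Node("part_of_speech",
                    yes=Node("phone_type", yes=Node(), no=Node()),
                    no=Node()))
counts, scores = analyze_tree(tree)
ranking = sorted(scores, key=scores.get, reverse=True)
print(counts)       # phone_type: 2, part_of_speech: 1, tone_type: 1
print(ranking[0])   # phone_type (score 1 + 1/3)
```

Ranking the sets by `scores` (or by `counts`) yields the kind of priority order the study reports; in this toy tree `phone_type` ranks first because it is asked at the root.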


In this paper, we analyze the effects of several factors and configuration choices in training and model construction that determine how well and how stably speaker adaptation performs in HMM-based speech synthesis. We then propose a new adaptation algorithm, constrained structural maximum a posteriori linear regression (CSMAPLR), whose derivation is based on the knowledge obtained from this analysis and on the results of comparing several conventional adaptation algorithms. We investigate six major aspects of speaker adaptation: the initial models; the amount of training data for the initial models; the transform functions, estimation criteria, and sensitivity of several linear regression adaptation algorithms; and combination algorithms. To analyze the effect of the initial model, we compare speaker-dependent models, gender-independent models, and the simultaneous use of gender-dependent models against the single use of a gender-dependent model. To analyze the effect of the transform functions, we compare a transform applied only to mean vectors with one applied to both mean vectors and covariance matrices. To analyze the effect of the estimation criteria, we compare the maximum likelihood (ML) criterion with a robust criterion called structural MAP (SMAP). We evaluate the sensitivity of the thresholds used in piecewise linear regression algorithms and examine methods that combine MAP adaptation with the linear regression algorithms. We incorporate these adaptation algorithms into our speech synthesis system and present subjective and objective evaluation results showing their utility and effectiveness for speaker adaptation in HMM-based speech synthesis.
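To make the comparison of transform functions concrete, the NumPy sketch below applies the two forms the abstract contrasts: a linear regression transform of the mean vector only, and a constrained transform in which one matrix acts on both the mean vector and the covariance matrix (the constrained form that CSMAPLR builds on). It assumes the transforms are already given; estimating them under the ML or SMAP criterion is not shown, and the function names and numeric values are hypothetical.

```python
import numpy as np


def mean_only_transform(mu: np.ndarray, W: np.ndarray) -> np.ndarray:
    """MLLR-style transform of the mean only: mu_hat = W @ [1, mu].

    W has shape (d, d + 1); its first column acts as the bias.
    The covariance matrix is left unchanged by this transform.
    """
    xi = np.concatenate(([1.0], mu))  # extended mean vector [1, mu]
    return W @ xi


def constrained_transform(mu: np.ndarray, sigma: np.ndarray,
                          A: np.ndarray, b: np.ndarray):
    """Constrained transform: the same matrix A adapts mean and covariance.

    mu_hat = A @ mu + b,  sigma_hat = A @ sigma @ A.T
    """
    return A @ mu + b, A @ sigma @ A.T


# Hypothetical 2-D Gaussian and transform parameters, for illustration only.
mu = np.array([1.0, 2.0])
sigma = np.diag([1.0, 4.0])
A = np.array([[2.0, 0.0], [0.0, 0.5]])
b = np.array([0.1, -0.1])
W = np.hstack([b[:, None], A])  # same mean mapping expressed as [b | A]

mu_mean_only = mean_only_transform(mu, W)            # covariance stays sigma
mu_c, sigma_c = constrained_transform(mu, sigma, A, b)
print(mu_mean_only, mu_c)   # both means equal [2.1, 0.9]
print(np.diag(sigma_c))     # [4.0, 1.0]; the mean-only transform keeps [1.0, 4.0]
```

Both transforms move the mean to the same place here, but only the constrained form also reshapes the covariance, which is the difference the paper's transform-function comparison targets.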



Articles published on HMM-based Speech Synthesis

Synthesis and evaluation of conversational characteristics in HMM-based speech synthesis

Continuous F0 Modeling for HMM Based Statistical Parametric Speech Synthesis

Unsupervised Intralingual and Cross-Lingual Speaker Adaptation for HMM-Based Speech Synthesis Using Two-Pass Decision Tree Construction

Context adaptive training with factorized decision trees for HMM-based statistical parametric speech synthesis

Analysis of Decision Trees in Context Clustering of Hidden Markov Model Based Thai Speech Synthesis

HMM-Based Speech Synthesis Utilizing Glottal Inverse Filtering

The Romanian speech synthesis (RSS) corpus: Building a high quality HMM-based speech synthesis system using a high sampling rate

Tone Question of Tree Based Context Clustering for Hidden Markov Model Based Thai Speech Synthesis

Exploiting Prosody Hierarchy and Dynamic Features for Pitch Modeling and Generation in HMM-Based Speech Synthesis

Thousands of Voices for HMM-Based Speech Synthesis–Analysis and Application of TTS Systems Built on Various ASR Corpora

A Covariance-Tying Technique for HMM-Based Speech Synthesis

A Speech Recognition Rate Prediction Method Based on HMM Speech Synthesis (HMM音声合成に基づく音声認識率予測手法)

Improvements of Hungarian hidden Markov model-based text-to-speech synthesis

Towards the Development of Speaker-Dependent and Speaker-Independent Hidden Markov Model-Based Thai Speech Synthesis

Modeling and interpolation of Austrian German and Viennese dialect in HMM-based speech synthesis

Robust Speaker-Adaptive HMM-Based Text-to-Speech Synthesis

TechWare: HMM-based speech synthesis resources [Best of the Web]

HMM-Based Style Control for Expressive Speech Synthesis with Arbitrary Speaker's Voice Using Model Adaptation

Analysis of Speaker Adaptation Algorithms for HMM-Based Speech Synthesis and a Constrained SMAPLR Adaptation Algorithm

A Style Control Technique for HMM-Based Expressive Speech Synthesis
