HMM-based Speech Synthesis System Research Articles

Basic frequency (f0) extraction assumes a significant function in the handling of monophonic signals, for example, discourse and tune. It is fundamental in different ongoing applications like feeling acknowledgment, discourse/performing voice segregation, etc. A few f0 extraction techniques have been proposed throughout the long term, yet nobody calculation functions admirably for both discourse and tune. Previously an efficient method for extracting pitch from speech signals for HMM-based speech synthesis system, voicing detection and pitch estimation is performed using the mean signal obtained from Continuous Wavelet Transform (CWT) coefficients. Both objective and subjective evaluation results show that the quality of speech synthesized with the proposed pitch estimation method is much better compared with HMM-based speech synthesis systems developed using the state-of-the-art pitch extraction methods, namely, Robust Algorithm for Pitch Tracking (RAPT) and Speech Transformation and Representation using Adaptive Interpolation of weighted spectrum (STRAIGHT) employed in HTS. Zero-frequency filter (ZFF) is used to derive the locations of impulse excitation. An algorithm is presented for the estimation of the fundamental frequency (F0) of speech or musical sounds, the well-known autocorrelation method with a number of modifications that combine to prevent errors. The algorithm has several desirable features. Error rates are about three times lower than the best competing methods, as evaluated over a database of speech recorded together with a laryngograph signal. In this paper, we propose a novel methodology that can precisely appraise f0 from discourse just as tunes. To begin with, voiced/unvoiced recognition is performed utilizing a novel RNN-LSTM based methodology. At that point, each voiced edge is disintegrated into a few sub-groups. From each sub-band of a voiced casing, the applicant pitch periods are distinguished utilizing autocorrelation and non-straight activities. At last, Viterbi interpreting is utilized to shape the last pitch forms. The presentation of the proposed technique is assessed utilizing mainstream discourse (Keele, CMU-Cold), and melody (MIR-1K, Verses) information bases. The assessment results show that the proposed strategy performs similarly well for discourse and monophonic melodies and is superior to the best in class techniques. Further, the viability of proposed f0 extraction strategy is exhibited by building up an intelligent SARGAM learning apparatus. The most generally utilized f0 extraction strategies are Praat and Riveted. The proposed method increases the performance nearly 81.8% and 35.2% compared to existing methods.

Read full abstract

Problem statement: In Thai speech synthesis using Hidden Markov model (HMM) based synthesis system, the tonal speech quality is degraded due to tone distortion. This major problem must be treated appropriately to preserve the tone characteristics of each syllable unit. Since tone brings about the intelligibility of the synthesized speech. It is needed to establish the tone questions and other phonetic questions in tree-based context clustering process accordingly. Approach: This study describes the analysis of questions in tree-based context clustering process of an HMM-based speech synthesis system for Thai language. In the system, spectrum, pitch or F0 and state duration are modeled simultaneously in a unified framework of HMM, their parameter distributions are clustered independently by using a decision-tree based context clustering technique. The contextual factors which affect spectrum, pitch and duration, i.e., part of speech, position and number of phones in a syllable, position and number of syllables in a word, position and number of words in a sentence, phone type and tone type, are taken into account for constructing the questions of the decision tree. All in all, thirteen sets of questions are analyzed in comparison. Results: In the experiment, we analyzed the decision trees by counting the number of questions in each node coming from those thirteen sets and by calculating the dominance score given to each question as the reciprocal of the distance from the root node to the question node. The highest number and dominance score are of the set of phonetic type, while the second, third highest ones are of the set of part of speech and tone type. Conclusion: By counting the number of questions in each node and calculating the dominance score, we can set the priority of each question set. All in all, the analysis results bring about further development of Thai speech synthesis with efficient context clustering process in an HMM-based speech synthesis system.

Read full abstract

HMM-based Speech Synthesis System Research Articles

Related Topics

Articles published on HMM-based Speech Synthesis System

RETRACTED ARTICLE: Powerful basic frequency extraction from monophonic signs utilizing versatile sub-band separating

Language-independent acoustic cloning of HTS voices

Hidden-Markov-model based statistical parametric speech synthesis for Marathi with optimal number of hidden states

Evaluation of speech unit modelling for HMM-based speech synthesis for Arabic

Robust Pitch Extraction Method for the HMM-Based Speech Synthesis System

Parameterization of Excitation Signal for Improving the Quality of HMM-Based Speech Synthesis System

Generation of creaky voice for improving the quality of HMM-based speech synthesis

Corpus-Based Hidden Markov Modelling of the Fundamental Frequency of Lithuanian

Towards automatic cross-lingual acoustic modelling applied to HMM-based speech synthesis for under-resourced languages

Hierarchical stress modeling and generation in mandarin for expressive Text-to-Speech

Robust Voicing Detection and $$F_{0}$$ F 0 Estimation for HMM-Based Speech Synthesis

$$\hbox {F}_{0}$$ F 0 contour generation and synthesis using Bengali Hmm-based speech synthesis system

Developing an HMM-Based Speech Synthesis System for Malay: A Comparison of Iterative and Isolated Unit Training

Modeling Spectral Envelopes Using Restricted Boltzmann Machines and Deep Belief Networks for Statistical Parametric Speech Synthesis

HMM 기반의 TTS를 위한 상호유사도 비율을 이용한 결정트리 기반의 문맥 군집화

Design of Tree Structure in Context Clustering Process of Hidden Markov Model-Based Thai Speech Synthesis

Thai Speech Phonology for Development of Speech Synthesis: A Review

A Control of Fundamental Frequency Contour for Hidden Markov Model-Based Thai Speech Synthesis

Outlier Detection and Removal for HMM-Based Speech Synthesis with an Insufficient Speech Database

Analysis of Decision Trees in Context Clustering of Hidden Markov Model Based Thai Speech Synthesis

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

HMM-based Speech Synthesis System Research Articles

Related Topics

Articles published on HMM-based Speech Synthesis System

RETRACTED ARTICLE: Powerful basic frequency extraction from monophonic signs utilizing versatile sub-band separating

Language-independent acoustic cloning of HTS voices

Hidden-Markov-model based statistical parametric speech synthesis for Marathi with optimal number of hidden states

Evaluation of speech unit modelling for HMM-based speech synthesis for Arabic

Robust Pitch Extraction Method for the HMM-Based Speech Synthesis System

Parameterization of Excitation Signal for Improving the Quality of HMM-Based Speech Synthesis System

Generation of creaky voice for improving the quality of HMM-based speech synthesis

Corpus-Based Hidden Markov Modelling of the Fundamental Frequency of Lithuanian

Towards automatic cross-lingual acoustic modelling applied to HMM-based speech synthesis for under-resourced languages

Hierarchical stress modeling and generation in mandarin for expressive Text-to-Speech

Robust Voicing Detection and $$F_{0}$$ F 0 Estimation for HMM-Based Speech Synthesis

$$\hbox {F}_{0}$$ F 0 contour generation and synthesis using Bengali Hmm-based speech synthesis system

Developing an HMM-Based Speech Synthesis System for Malay: A Comparison of Iterative and Isolated Unit Training

Modeling Spectral Envelopes Using Restricted Boltzmann Machines and Deep Belief Networks for Statistical Parametric Speech Synthesis

HMM 기반의 TTS를 위한 상호유사도 비율을 이용한 결정트리 기반의 문맥 군집화

Design of Tree Structure in Context Clustering Process of Hidden Markov Model-Based Thai Speech Synthesis

Thai Speech Phonology for Development of Speech Synthesis: A Review

A Control of Fundamental Frequency Contour for Hidden Markov Model-Based Thai Speech Synthesis

Outlier Detection and Removal for HMM-Based Speech Synthesis with an Insufficient Speech Database

Analysis of Decision Trees in Context Clustering of Hidden Markov Model Based Thai Speech Synthesis