An HMM-Based Brazilian Portuguese Speech Synthesizer and Its Characteristics

R. Maia, F. G. V. Resende Júnior, T. Kitamura, H. Zen, K. Tokuda

doi:10.14209/jcis.2006.11

Abstract

Research in the speech synthesis area has made great progress recently, perhaps motivated by its numerous applications, such as text-to-speech converters and dialog systems. The technical literature reports several improvements to existing state-of-the-art techniques, as well as new ideas for altering voice characteristics and applying them to different languages. Nevertheless, despite the attention the speech synthesis field has been receiving, unit selection and concatenation of waveform segments remains the most popular approach available today. In this paper, we report how a synthesizer for Brazilian Portuguese was built using a technique in which the speech waveform is generated from parameters determined directly from Hidden Markov Models. Compared with systems based on unit selection and concatenation, the proposed synthesizer has the advantage of being trainable, and it uses contextual factors that include information at different levels of the following units: phones, syllables, words, phrases and utterances. This information is brought into effect through a set of questions for context clustering. Thus, both the spectral and the prosodic characteristics of the system are controlled by decision trees generated for each of the following parameters: mel-cepstral coefficients, fundamental frequency and state durations. As is typical of the HMM-based technique, synthesized speech with quality comparable to commercial applications built on unit selection and concatenation can be obtained even from a database as small as eighteen minutes of speech. This was tested by a subjective comparison of samples from the proposed synthesizer and from other systems currently available for Brazilian Portuguese.
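Decision-tree context clustering of the kind mentioned in the abstract is commonly driven by a likelihood criterion: at each node, the yes/no question about the contextual factors that most increases the likelihood of the training data is chosen to split it. The sketch below illustrates that single step for one scalar parameter (for example, log F0) modeled by one Gaussian per node. It is a simplified illustration under stated assumptions, not the authors' implementation: the question names, the context keys `stress` and `pos`, and the sample values are hypothetical, and real HMM-based systems use multivariate state-occupancy statistics plus a stopping rule (such as a likelihood threshold or an MDL criterion).

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Context = Dict[str, str]                          # contextual factors of one unit (hypothetical keys)
Question = Tuple[str, Callable[[Context], bool]]  # (question name, yes/no test)


def gaussian_loglik(values: List[float], var_floor: float = 1e-6) -> float:
    """Log-likelihood of `values` under a 1-D Gaussian fitted to them by ML."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    var = max(sum((v - mean) ** 2 for v in values) / n, var_floor)
    # At the ML estimate the quadratic term sums to n/2, hence the "+ 1".
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def best_question(
    data: List[Tuple[Context, float]], questions: List[Question]
) -> Tuple[Optional[str], float]:
    """Return the question whose yes/no split most increases the log-likelihood."""
    parent = gaussian_loglik([v for _, v in data])
    best_name: Optional[str] = None
    best_gain = 0.0
    for name, ask in questions:
        yes = [v for c, v in data if ask(c)]
        no = [v for c, v in data if not ask(c)]
        if not yes or not no:
            continue  # a question that does not split the node is useless
        gain = gaussian_loglik(yes) + gaussian_loglik(no) - parent
        if gain > best_gain:
            best_name, best_gain = name, gain
    return best_name, best_gain


# Hypothetical training samples: (contextual factors, log F0 of a state).
samples = [
    ({"stress": "1", "pos": "noun"}, 5.30),
    ({"stress": "1", "pos": "verb"}, 5.32),
    ({"stress": "0", "pos": "noun"}, 4.90),
    ({"stress": "0", "pos": "verb"}, 4.92),
]
candidate_questions: List[Question] = [
    ("C-Syl_Stressed", lambda c: c["stress"] == "1"),
    ("C-Word_Noun", lambda c: c["pos"] == "noun"),
]
```

With these made-up samples, stressed syllables have clearly higher log F0, so `best_question(samples, candidate_questions)` selects the stress question; growing a full tree would repeat this step recursively on each child node.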

Highlights

Abstract (Portuguese version): Research in the field of speech synthesis has made great progress recently, probably motivated by its numerous applications, among which text-to-speech converters and dialog systems can be cited.
It should be noted that the utterance information produced by the natural language processing (NLP) modules connected to the Hidden Markov Model (HMM)-based and MBROLA synthesizers was manually corrected to avoid transcription- and/or stress-related errors in the synthesized speech.
This paper described a Brazilian Portuguese speech synthesizer and its characteristics.

Summary

INTRODUCTION

One of the main advantages of HMM-based synthesis over the unit selection and concatenation method is that voice characteristics can be altered without the need for large databases [9,10,11]. Another advantage is that usable synthesized speech can be obtained by training the system with a database as small as eighty sentences, as reported in [8]. One of the main disadvantages of the approach is the buzzy quality of the synthesized speech. This drawback is caused by the source-filter model used during the waveform generation stage, which is essentially a linear predictive vocoder, although [14] reports that the buzz can be removed by using a mixed excitation scheme.
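The buzzy quality can be traced to the simplest excitation used in such source-filter vocoders: a periodic pulse train during voiced frames and white noise during unvoiced frames, passed through a vocal-tract filter. The minimal sketch below assumes that excitation model with a 16 kHz sampling rate, a 5 ms (80-sample) frame shift, F0 = 0 marking unvoiced frames, and a single fixed all-pole (LPC) filter. These values and the function names are hypothetical; an actual HMM-based synthesizer updates the filter every frame from the generated spectral parameters (mel-cepstral coefficients in this system).

```python
import math
import random
from typing import List


def pulse_noise_excitation(
    f0_per_frame: List[float], fs: int = 16000, frame_shift: int = 80, seed: int = 0
) -> List[float]:
    """Simple vocoder excitation: pulse train if F0 > 0, white Gaussian noise otherwise.

    Hypothetical settings: 16 kHz sampling, 80-sample (5 ms) frame shift,
    F0 given in Hz, with 0 marking an unvoiced frame.
    """
    rng = random.Random(seed)
    excitation: List[float] = []
    since_pulse = 0.0  # samples elapsed since the last pitch pulse
    for f0 in f0_per_frame:
        for _ in range(frame_shift):
            if f0 > 0:
                period = fs / f0  # pitch period in samples
                since_pulse += 1.0
                if since_pulse >= period:
                    since_pulse -= period
                    # Amplitude sqrt(period) gives the pulse train unit average power.
                    excitation.append(math.sqrt(period))
                else:
                    excitation.append(0.0)
            else:
                since_pulse = 0.0  # restart pulse timing after an unvoiced frame
                excitation.append(rng.gauss(0.0, 1.0))
    return excitation


def all_pole_filter(x: List[float], a: List[float], gain: float = 1.0) -> List[float]:
    """Filter x through gain / A(z), with A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p."""
    y: List[float] = []
    for n, xn in enumerate(x):
        acc = gain * xn
        for k, ak in enumerate(a, start=1):
            if n >= k:
                acc -= ak * y[n - k]  # feedback from past outputs
        y.append(acc)
    return y
```

Chaining the two, e.g. `all_pole_filter(pulse_noise_excitation(f0_track), lpc_coeffs)` for a hypothetical F0 track and coefficient list, yields a crude waveform. Because every voiced period is excited by an identical impulse, the excitation contains all harmonics with equal strength up to the Nyquist frequency, with no aperiodic component in voiced regions; mixed excitation schemes such as the one in [14] instead blend periodic and noise components per frequency band.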

ENGINE DESCRIPTION

SPEECH PARAMETER EXTRACTION

HMM TRAINING

SYNTHESIS PART

PARAMETER DETERMINATION

EXCITATION CONSTRUCTION AND FILTERING

ASPECTS OF BRAZILIAN PORTUGUESE SPEECH SYNTHESIS BASED ON HMM

THE PHONE SET

DEFINITION OF AN UTTERANCE INFORMATION

TEXT PROCESSING

THE CONTEXTUAL FACTORS

CONTEXT CLUSTERING

THE CORPUS

PARAMETER EXTRACTION

GENERATED DECISION-TREES

EXAMPLE OF SYNTHESIS

INFLUENCE OF SOME CONTEXTUAL FACTORS ON THE SYNTHESIZED SPEECH

INFLUENCE OF POS AND SYLLABLE

INFLUENCE OF SYLLABLE STRESS

THE SYNTHESIZERS

THE SENTENCES

CONCLUSION AND FUTURE WORK

THE SUBJECTS

THE RESULTS


Journal: Journal of Communication and Information Systems
Publication Date: Aug 30, 2006
Citations: 4
License type: CC BY
