Large Speech Corpus Research Articles

We propose a new approach to synthesizing emotional speech with a corpus-based concatenative speech synthesis system (ATR CHATR) that uses corpora of emotional speech. In this study, neither emotion-dependent prosody prediction nor signal processing per se is performed for emotional speech. Instead, a large speech corpus is created for each emotion, and speech with the appropriate emotion is synthesized simply by switching between the emotional corpora. This is made possible by the normalization procedure incorporated in CHATR, which transforms its standard predicted prosody range according to the source database in use. We evaluate our approach by creating three emotional speech corpora (anger, joy, and sadness) from recordings of a male and a female speaker of Japanese. The corpora differ in their acoustic characteristics, and their emotions are identifiable. The acoustic characteristics of the emotional utterances synthesized by our method correlate clearly with those of the corresponding corpus. Perceptual experiments using synthesized speech confirmed that our method produces recognizably emotional speech. We further evaluated the method's intelligibility and the overall impression it gives listeners. The results show that the proposed method synthesizes highly intelligible speech that gives a favorable impression. Based on these encouraging results, we have developed a workable text-to-speech system with emotion to support the immediate needs of nonspeaking individuals. This paper describes the proposed method, the design and acoustic characteristics of the corpora, and the results of the perceptual evaluations.
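
The abstract does not say how CHATR's normalization transforms the predicted prosody range. As a minimal sketch, assuming a linear z-score mapping of F0 values in Hz (a log-F0 variant would be equally plausible), the Python below rescales F0 targets predicted in a standard range to the mean and standard deviation of whichever emotional corpus is selected. `ProsodyStats`, `normalize_f0`, and all statistics are hypothetical illustrations, not part of CHATR.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class ProsodyStats:
    """Mean and standard deviation of F0 (Hz) measured over one corpus."""
    mean_hz: float
    std_hz: float

    @classmethod
    def from_samples(cls, f0_values):
        return cls(mean(f0_values), pstdev(f0_values))


def normalize_f0(predicted, standard, target):
    """Rescale F0 targets predicted in the standard range to the target corpus range.

    Assumption: a linear z-score mapping in Hz; CHATR's actual procedure may differ.
    """
    return [target.mean_hz + target.std_hz * (f0 - standard.mean_hz) / standard.std_hz
            for f0 in predicted]


# Made-up statistics: switching emotion only swaps the target statistics.
STANDARD = ProsodyStats(mean_hz=120.0, std_hz=20.0)
CORPORA = {
    "anger": ProsodyStats(mean_hz=150.0, std_hz=40.0),
    "sadness": ProsodyStats(mean_hz=110.0, std_hz=10.0),
}

predicted = [100.0, 120.0, 140.0]
print(normalize_f0(predicted, STANDARD, CORPORA["anger"]))    # [110.0, 150.0, 190.0]
print(normalize_f0(predicted, STANDARD, CORPORA["sadness"]))  # [100.0, 110.0, 120.0]
```

Under this assumption, changing the emotion amounts to swapping the target statistics, while the prosody predictor itself is left unchanged.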

This paper describes a new Korean text-to-speech (TTS) system based on a large speech corpus. Conventional concatenative TTS systems still produce machine-like synthetic speech. This poor naturalness is caused by excessive prosodic modification of units from a small speech database. To address this problem, we use a dynamic unit selection method based on a large speech database, without prosodic modification. The proposed TTS system adopts triphones as synthesis units. We designed a new sentence set that maximizes the phonetic or prosodic coverage of Korean triphones. All utterances were automatically segmented into phonemes using a speech recognizer. Using these segmentations, we set the cost between two synthesis units to zero when they occur consecutively in the same recorded utterance. This reduces the number of concatenation points, where concatenation mismatches can occur. We also present data on how major prosodic variations are realized, taking prosodic phrase break strength into account. Phrase breaks were classified into four strength levels based on pause length, and triphones were further classified by break strength to reflect major prosodic variations. To predict phrase break strength from text, we adopted an HMM-like part-of-speech (POS) sequence model, which achieved 73.5% accuracy on four-level break strength prediction. For unit selection, a Viterbi beam search was performed to find the most appropriate triphone sequence, namely the one with the minimum prosodic and spectral continuity cost at concatenation boundaries. An informal listening test showed that the proposed corpus-based Korean TTS system produced more natural speech than the conventional demisyllable-based system.
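
The abstract names the components of break prediction but not their details. The sketch below assumes hypothetical pause thresholds for labeling training junctures with four break strengths, and an HMM-like decoder in which each hidden state is the break strength at a word juncture and each observation is the pair of POS tags around that juncture; Viterbi decoding then picks the most likely strength sequence. `break_level`, `viterbi_breaks`, the tag set, and every threshold and probability are made up for illustration and are not the authors' model.

```python
import math

# Hypothetical pause-length thresholds (ms). The abstract says only that the
# four break strengths are derived from pause length; these values are assumed.
THRESHOLDS_MS = (0, 100, 300)
LEVELS = range(4)  # break strength 0 (none) .. 3 (strongest)
FLOOR = 1e-6       # assumed probability for POS pairs unseen under a level


def break_level(pause_ms):
    """Label a word juncture with a break strength 0..3 from its pause length."""
    return sum(pause_ms > t for t in THRESHOLDS_MS)


def viterbi_breaks(pos_pairs, init, trans, emit):
    """Most likely break strength at each juncture under an HMM-like model.

    pos_pairs: one (left POS, right POS) tuple per word juncture.
    init[b]: P(first juncture has level b); trans[a][b]: P(level b | previous level a);
    emit[b]: dict mapping a POS pair to P(pair | level b).
    """
    def log_emit(b, pair):
        return math.log(emit[b].get(pair, FLOOR))

    scores = [math.log(init[b]) + log_emit(b, pos_pairs[0]) for b in LEVELS]
    backptrs = []
    for pair in pos_pairs[1:]:
        prev = scores
        # Best predecessor level for each current level b.
        ptr = [max(LEVELS, key=lambda a: prev[a] + math.log(trans[a][b])) for b in LEVELS]
        scores = [prev[ptr[b]] + math.log(trans[ptr[b]][b]) + log_emit(b, pair) for b in LEVELS]
        backptrs.append(ptr)
    # Trace back from the best final level.
    path = [max(LEVELS, key=lambda b: scores[b])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]


# Toy model with made-up probabilities; tags: N = noun, P = particle, V = verb, "." = end.
INIT = [0.4, 0.3, 0.2, 0.1]
TRANS = [[0.1, 0.5, 0.3, 0.1],
         [0.6, 0.1, 0.1, 0.2],
         [0.6, 0.1, 0.1, 0.2],
         [0.7, 0.1, 0.1, 0.1]]
EMIT = [{("N", "P"): 0.9, ("P", "N"): 0.1},
        {("P", "N"): 0.6, ("P", "V"): 0.4},
        {("P", "N"): 0.3, ("P", "V"): 0.7},
        {("V", "."): 1.0}]

junctures = [("N", "P"), ("P", "N"), ("N", "P"), ("P", "V"), ("V", ".")]
print(viterbi_breaks(junctures, INIT, TRANS, EMIT))  # [0, 1, 0, 2, 3]
```

At the particle-verb juncture the emission favors level 2 while the transition from the preceding level 0 favors level 1; the decoder weighs both rather than classifying each juncture in isolation.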

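The abstract describes unit selection only as a Viterbi beam search that minimizes prosodic and spectral continuity cost at concatenation boundaries, with zero cost for units that were consecutive in a recorded utterance. A minimal sketch under stated assumptions: the join cost is a weighted sum of the F0 jump and the Euclidean distance between boundary spectral vectors (both weights hypothetical), any target cost is omitted, and `Unit`, `concat_cost`, `select_units`, and the toy units are illustrative names and values, not the authors' implementation.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One candidate triphone instance in the recorded corpus (fields are illustrative)."""
    triphone: str
    utt_id: int        # recorded utterance the unit was cut from
    pos: int           # index of the unit within that utterance
    f0_start: float    # F0 (Hz) at the left boundary
    f0_end: float      # F0 (Hz) at the right boundary
    spec_start: tuple  # spectral feature vector at the left boundary
    spec_end: tuple    # spectral feature vector at the right boundary


def concat_cost(left, right, w_f0=1.0, w_spec=1.0):
    """Prosodic plus spectral mismatch at a join; zero for units adjacent in the corpus."""
    if left.utt_id == right.utt_id and right.pos == left.pos + 1:
        return 0.0  # no real join: the two units were recorded back to back
    f0_gap = abs(left.f0_end - right.f0_start)
    spec_gap = math.dist(left.spec_end, right.spec_start)
    return w_f0 * f0_gap + w_spec * spec_gap


def select_units(candidates, beam_width=10):
    """Viterbi beam search: one candidate list per target triphone.

    Returns (total join cost, chosen units) for the cheapest surviving path.
    """
    beam = [(0.0, [unit]) for unit in candidates[0]]
    for options in candidates[1:]:
        expanded = []
        for unit in options:
            # Viterbi step: keep only the cheapest predecessor path for this unit.
            cost, path = min(((c + concat_cost(p[-1], unit), p) for c, p in beam),
                             key=lambda entry: entry[0])
            expanded.append((cost, path + [unit]))
        # Beam pruning: keep the beam_width cheapest partial paths.
        beam = heapq.nsmallest(beam_width, expanded, key=lambda entry: entry[0])
    return min(beam, key=lambda entry: entry[0])


# Toy corpus: two candidates per target triphone, with made-up feature values.
a1 = Unit("sil-h+a", 1, 4, 100.0, 110.0, (0.0, 0.0), (1.0, 0.0))
a2 = Unit("sil-h+a", 2, 7, 100.0, 130.0, (0.0, 0.0), (0.0, 1.0))
b1 = Unit("h-a+n", 1, 5, 110.0, 115.0, (1.0, 0.0), (1.0, 1.0))  # follows a1 in utterance 1
b2 = Unit("h-a+n", 3, 2, 125.0, 120.0, (0.0, 1.0), (2.0, 2.0))
c1 = Unit("a-n+sil", 4, 0, 116.0, 100.0, (1.0, 1.0), (0.0, 0.0))
c2 = Unit("a-n+sil", 3, 3, 120.0, 100.0, (2.0, 2.0), (0.0, 0.0))  # follows b2 in utterance 3

cost, path = select_units([[a1, a2], [b1, b2], [c1, c2]])
print(cost, [(u.utt_id, u.pos) for u in path])  # 1.0 [(1, 4), (1, 5), (4, 0)]
```

In the toy run, the search picks the pair from utterance 1 because their join is free, then accepts a 1 Hz F0 jump into `c1` rather than the costlier route through `b2`. The `beam_width` parameter bounds how many partial paths survive each step, trading exactness for speed.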

Articles published on Large Speech Corpus

Modeling Phone Duration of Lithuanian by Classification and Regression Trees, using Very Large Speech Corpus

Automatic phonetic transcription of large speech corpora

Methods and apparatus for rapid acoustic unit selection from a large speech corpus

Incorporating Phonetic Knowledge Into an Evolutionary Subspace Approach for Robust Speech Recognition

MLP-based phone boundary refining for a TTS database

A novel visualization tool for manual annotation when building large speech corpora

On temporal aspects of turn taking in conversational dialogues

Statistical modeling of phonological rules through linguistic hierarchies

A set of corpus-based text-to-speech synthesis technologies for Mandarin Chinese

A corpus-based speech synthesis system with emotion

A New Korean Corpus-Based Text-to-Speech System

Representational issues in annotation: Using the Australian map task corpus to relate prosody and discourse structure

Automatic ToBI prediction and alignment to speed manual labeling of prosody

AHUMADA: A large speech corpus in Spanish for speaker characterization and identification

The mu+ system for corpus based speech research