Prosodically Rich Speech Synthesis Interface Using Limited Data of Celebrity Voice

Takashi Nose, Taiki Kamei

doi:10.4236/jcc.2016.416006

Abstract

To enhance communication between humans and robots at home in the future, speech synthesis interfaces that can generate expressive speech are indispensable. In addition, synthesizing celebrity voices is commercially important. To address these issues, this paper proposes techniques for synthesizing natural-sounding speech with a rich prosodic personality using a limited amount of data in a text-to-speech (TTS) system. As the target speaker, we chose a well-known prime minister of Japan, Shinzo Abe, who has a distinctive prosodic personality in his speeches. To synthesize natural-sounding and prosodically rich speech, accurate phrasing, robust duration prediction, and rich intonation modeling are important. For these purposes, we propose pause position prediction based on conditional random fields (CRFs), phone-duration prediction using random forests, and mora-based emphasis context labeling. We examine the effectiveness of these techniques through objective and subjective evaluations.

Highlights

In the near future, people will have their own personal robots that support their daily lives by communicating with them
We propose pause position prediction based on conditional random fields (CRFs), phone-duration prediction using random forests, and mora-based emphasis context labeling
We focused on synthesizing the voice of the prime minister of Japan, Shinzo Abe, as the target speaker
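
Pause position prediction with CRFs can be pictured as a sequence-labeling task: each word receives a feature dictionary and a label saying whether a pause follows it. The sketch below only illustrates that framing. The feature names, the toy romanized sentence, the part-of-speech tags, the labels, and the helpers `word_features` and `sentence_to_features` are all assumptions for illustration and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# A token is (surface form, part-of-speech tag). Both are hypothetical inputs.
Token = Tuple[str, str]


def word_features(sentence: List[Token], i: int) -> Dict[str, object]:
    """Build a feature dictionary for the i-th word of a sentence.

    The features (word, POS, neighbouring POS, position) are illustrative
    choices, not the feature set used in the paper.
    """
    word, pos = sentence[i]
    feats: Dict[str, object] = {
        "word": word,
        "pos": pos,
        "position": i,
    }
    if i > 0:
        feats["prev_pos"] = sentence[i - 1][1]
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(sentence) - 1:
        feats["next_pos"] = sentence[i + 1][1]
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sentence_to_features(sentence: List[Token]) -> List[Dict[str, object]]:
    """Convert a whole sentence into one feature dictionary per word."""
    return [word_features(sentence, i) for i in range(len(sentence))]


# Toy example: "P" = pause after this word, "NP" = no pause (assumed labels).
sentence = [
    ("watashi", "PRON"),
    ("wa", "PART"),
    ("kyou", "NOUN"),
    ("iki", "VERB"),
    ("masu", "AUX"),
]
labels = ["NP", "P", "NP", "NP", "P"]
features = sentence_to_features(sentence)

for feats, label in zip(features, labels):
    print(label, feats)
```

Per-word feature dictionaries paired with a label sequence are the kind of input that common CRF toolkits accept, so a real system would train a CRF on many such sentences and then decode the most likely pause/no-pause sequence for new text.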

Summary

Introduction

People will have their own personal robots that support their daily lives by communicating with them. To realize such robots, speech recognition and synthesis interfaces are indispensable for bringing human-machine communication close to human-human communication. Most studies focus on synthesizing reading-style speech, such as news articles, where the speaking style is stable and lacks prosodically rich expressions such as emphasis and emotion. Prosody generally comprises accent, intonation, rhythm, power, and phrasing (pause insertion), and carries a rich personality. As a step toward generating more human-like speech for various applications, including humanoid robots, synthesizing speech with a rich prosodic personality is an essential issue.

Journal: Journal of Computer and Communications
Publication Date: Jan 1, 2016
License type: CC BY 4.0
