Automatic Classification of Synthetic Voices for Voice Banking Using Objective Measures

Agustin Alonso, Inma Hernaez, Víctor García, Jon Sanchez, Eva Navas

doi:10.3390/app12052473

Abstract

Speech is the most common way of communication among humans. People who cannot communicate through speech due to partial or total loss of the voice can benefit from Alternative and Augmentative Communication devices and Text to Speech technology. One problem with these technologies is that the included synthetic voices might be impersonal and badly adapted to the user in terms of age, accent or even gender. In this context, the use of synthetic voices from voice banking systems is an attractive alternative. New voices can be obtained by applying adaptation techniques to recordings from people with healthy voices (donors) or from the users themselves before they lose their own voice. In this way, the goal is to offer a wide voice catalog to potential users. However, as there is no control over the recording or the adaptation processes, some method to control the final quality of the voice is needed. We present the work developed to automatically select the best synthetic voices using a set of objective measures and a subjective Mean Opinion Score (MOS) evaluation. A prediction algorithm for the MOS has been built, which correlates similarly to the most correlated individual measure.

Highlights

Speech is the most natural method that humans use to communicate with each other
We extend the initial work described in [29] by evaluating four objective measures: short time objective intelligibility (STOI), enhanced short time objective intelligibility (ESTOI), non-intrusive speech quality assessment (NISQA) and speech intelligibility in bits (SIIB)
We briefly describe the selected objective measures: two intrusive objective measures typically used in speech enhancement, STOI [42] and ESTOI [43]; one intrusive intelligibility measure based on information theory, SIIB [44]; and one measure based on NISQA that estimates the mean opinion score (MOS) of the naturalness of synthetic speech [45]
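
As a rough illustration of how an objective measure can be compared against subjective MOS ratings, the sketch below computes a Pearson correlation coefficient between two lists of per-voice scores. This is not the authors' method: the page does not say which correlation statistic they used, the function name `pearson` is a hypothetical helper, and the score values are invented purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalized covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-voice scores, invented for illustration only:
# one objective score (e.g. a STOI-like value) and one MOS per synthetic voice.
objective_scores = [0.62, 0.71, 0.55, 0.80, 0.68]
mos_scores = [2.9, 3.4, 2.5, 4.1, 3.2]

r = pearson(objective_scores, mos_scores)
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 would suggest that the objective measure ranks the voices in much the same way as the listeners do, which is the kind of agreement the abstract refers to when it compares the MOS predictor with the individual measures.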

Summary

Introduction

Speech is the most natural method that humans use to communicate with each other. When a person loses the ability to speak due to an accident or illness, technology can provide solutions to mitigate the impact of the disability. Text-to-speech (TTS) systems are a fundamental component of alternative and augmentative communication (AAC) devices: they provide a synthetic voice that speaks aloud the text entered through some kind of input device, such as a keyboard or an eye-gaze-controlled device. Synthetic voice customization tries to preserve the hints of personality that are absent from a generic or commercial synthetic voice. Studies such as [1] show our tendency to form an impression of other people's personality from their voice (as happens with other features, such as the face or the color of the skin). It is our belief that the use of personalized speech can help reduce the social impact of using an electronic device for everyday communication.

Journal: Applied Sciences
Publication Date: Feb 27, 2022
License type: CC BY 4.0

Similar Papers

Voice banking for people living with motor neurone disease: Views and expectations
Richard Cave ... Steven Bloch
International Journal of Language & Communication Disorders | VOL. 56
22 Dec 2020

Voice Banking to Support People Who Use Speech-Generating Devices: New Zealand Voice Donors' Perspectives
Michelle Westley ... H Timothy Bunnell
Perspectives of the ASHA Special Interest Groups | VOL. 4
15 Aug 2019

Reimbursement for AAC Devices
Steven C White ... Mccarty Janet
The ASHA Leader | VOL. 16
01 Oct 2011

An expert system for use in the prescription of electronic augmentative and alternative communication devices
Stanley Napper ... Patricia Mcafee
Augmentative and Alternative Communication | VOL. 5
01 Jan 1989
