Speaker Adaptation on Articulation and Acoustics for Articulation-to-Speech Synthesis.

Beiming Cao, Alan Wisler, Jun Wang

doi:10.3390/s22166056

Open Access

https://doi.org/10.3390/s22166056

Abstract

Silent speech interfaces (SSIs) convert non-audio bio-signals, such as articulatory movement, to speech. This technology has the potential to restore speech for individuals who have lost their voice but can still articulate (e.g., laryngectomees). Articulation-to-speech (ATS) synthesis is an SSI design that offers easy implementation and low latency, and it is therefore becoming more popular. Current ATS studies focus on speaker-dependent (SD) models to avoid the large variation in articulatory patterns and acoustic features across speakers. However, these designs are limited by the small amount of data available from any individual speaker. Speaker adaptation designs that include data from multiple speakers have the potential to address this limitation, but few prior studies have investigated their performance in ATS. In this paper, we investigated speaker adaptation on both the input articulation and the output acoustic signals (with or without direct inclusion of data from the test speakers) using a publicly available electromagnetic articulography (EMA) dataset. We used Procrustes matching for articulation adaptation and voice conversion for voice adaptation. The performance of the ATS models was measured objectively by mel-cepstral distortion (MCD). Synthetic speech samples were generated and are provided in the supplementary material. The results demonstrated that both Procrustes matching and voice conversion improved speaker-independent ATS. With the direct inclusion of target speaker data in the training process, the speaker-adaptive ATS achieved performance comparable to that of speaker-dependent ATS. To our knowledge, this is the first study to demonstrate that speaker-adaptive ATS can achieve performance that is not statistically different from that of speaker-dependent ATS.
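
The abstract names two computations that a short sketch may make more concrete: Procrustes matching of articulatory shapes and the mel-cepstral distortion metric. The Python below is an illustrative sketch only, not the authors' implementation. It assumes 2D sensor coordinates, an ordinary Procrustes fit to a reference shape (translation, scaling to unit size, rotation), and the commonly used MCD definition in dB over time-aligned frames with coefficient 0 excluded. The function names procrustes_align and mel_cepstral_distortion are hypothetical, and the paper's exact procedure and settings may differ.

```python
import math


def procrustes_align(shape, reference):
    """Align 2D points to a reference shape (ordinary Procrustes fit).

    Assumption: both inputs list the same sensors in the same order.
    Returns the shape centered, scaled to unit size, and rotated onto
    the centered, unit-size reference.
    """
    def normalize(points):
        n = len(points)
        cx = sum(x for x, _ in points) / n
        cy = sum(y for _, y in points) / n
        centered = [(x - cx, y - cy) for x, y in points]
        size = math.sqrt(sum(x * x + y * y for x, y in centered))
        return [(x / size, y / size) for x, y in centered]

    src = normalize(shape)
    ref = normalize(reference)
    # Closed-form least-squares rotation angle for paired 2D points.
    num = sum(x * v - y * u for (x, y), (u, v) in zip(src, ref))
    den = sum(x * u + y * v for (x, y), (u, v) in zip(src, ref))
    theta = math.atan2(num, den)
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in src]


def mel_cepstral_distortion(ref_frames, syn_frames):
    """Mean MCD in dB over time-aligned mel-cepstral frames.

    Assumption: coefficient 0 (energy) is skipped, as is common practice.
    """
    if not ref_frames or len(ref_frames) != len(syn_frames):
        raise ValueError("need the same non-zero number of frames")
    scale = 10.0 / math.log(10.0)
    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        squared = sum((r - s) ** 2 for r, s in zip(ref[1:], syn[1:]))
        total += scale * math.sqrt(2.0 * squared)
    return total / len(ref_frames)
```

In this sketch a shape that differs from the reference only by position, size, and orientation maps back onto the normalized reference, and identical frame sequences give an MCD of 0 dB; lower MCD indicates synthetic speech closer to the reference.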

Journal: Sensors (Basel, Switzerland)	Publication Date: Aug 13, 2022
Citations: 7	License type: CC BY 4.0

Similar Papers

Speaker adaptation using maximum likelihood model interpolation
Zuoying Wang ... Feng Liu
01 Jan 1998

Speaker normalization and adaptation based on linear transformation
J Ishii ... M Tonomura
21 Apr 1997

A Speaker-Adaptive HMM-based Vietnamese Text-to-Speech System
Duy Khanh Ninh
01 Oct 2019

Linear Networks Based Speaker Adaptation for Speech Synthesis
Zhiying Huang ... Heng Lu
01 Apr 2018
