Abstract

Speech inversion is a well-known ill-posed problem, and the addition of speaker differences typically makes it even harder. Normalizing speaker differences is essential for effectively using multi-speaker articulatory data to train a speaker-independent speech inversion system. This paper explores a vocal tract length normalization (VTLN) technique that transforms the acoustic features of different speakers into a target speaker's acoustic space so that speaker-specific details are minimized. The speaker-normalized features are then used to train a deep feed-forward neural network based speech inversion system. The acoustic features are parameterized as time-contextualized mel-frequency cepstral coefficients (MFCCs). The articulatory features are represented by six tract-variable (TV) trajectories, which are relatively speaker invariant compared to flesh-point data. Experiments are performed with ten speakers from the University of Wisconsin X-ray Microbeam database. Results show that the proposed speaker normalization approach yields an 8.15% relative improvement in the correlation between actual and estimated TVs compared to a system trained without speaker normalization. To determine the efficacy of the method across datasets, cross-speaker evaluations were performed with speakers from the Multichannel Articulatory-TIMIT and EMA-IEEE datasets. Results show that the VTLN approach improves performance even across datasets.
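
Since the abstract compresses the whole pipeline into a few sentences, a minimal sketch may help make the steps concrete: a VTLN warp applied to each speaker's spectra before mel filtering, context stacking of MFCC frames, and a feed-forward network that outputs six TV trajectories. The choice of a piecewise-linear warp, the warp factor alpha, the context width k, the filterbank sizes, and the hidden-layer dimensions below are illustrative assumptions, not the paper's reported configuration; per-speaker estimation of alpha is only indicated in comments.

```python
import numpy as np
import librosa
import torch
import torch.nn as nn
from scipy.stats import pearsonr


def piecewise_linear_warp(freqs, alpha, f_max, break_ratio=0.85):
    """Piecewise-linear VTLN warp: scale by alpha below a break
    frequency, then interpolate linearly so f_max maps to f_max.
    Monotonic for warp factors near 1.0 (typically ~0.88-1.12)."""
    f_break = break_ratio * f_max
    upper_slope = (f_max - alpha * f_break) / (f_max - f_break)
    return np.where(freqs <= f_break,
                    alpha * freqs,
                    alpha * f_break + upper_slope * (freqs - f_break))


def vtln_mfcc(y, sr, alpha, n_mfcc=13, n_fft=512, hop=160, n_mels=26):
    """MFCCs computed from a VTLN-warped power spectrum.

    alpha is the speaker's warp factor (alpha = 1.0 leaves the
    spectrum unchanged); in practice it would be estimated per
    speaker, e.g. by maximizing similarity to the target speaker.
    """
    spec = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop)) ** 2
    freqs = librosa.fft_frequencies(sr=sr, n_fft=n_fft)
    warped = piecewise_linear_warp(freqs, alpha, f_max=sr / 2)
    # Resample each frame's spectrum at the warped frequencies,
    # which is equivalent to warping the frequency axis.
    spec_w = np.stack([np.interp(warped, freqs, frame) for frame in spec.T],
                      axis=1)
    mel = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels) @ spec_w
    return librosa.feature.mfcc(S=librosa.power_to_db(mel), n_mfcc=n_mfcc)


def add_context(feats, k=8):
    """Stack +/- k neighboring frames to time-contextualize features."""
    T, _ = feats.shape
    padded = np.pad(feats, ((k, k), (0, 0)), mode="edge")
    return np.stack([padded[t:t + 2 * k + 1].ravel() for t in range(T)])


class InversionNet(nn.Module):
    """Feed-forward network mapping contextualized MFCCs to six TVs."""

    def __init__(self, in_dim, n_tvs=6, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_tvs),
        )

    def forward(self, x):
        return self.net(x)


def mean_tv_correlation(est, ref):
    """Mean Pearson correlation between estimated and actual TVs."""
    return np.mean([pearsonr(est[:, i], ref[:, i])[0]
                    for i in range(ref.shape[1])])


# Example: normalize one utterance and run it through an (untrained) net.
y = np.random.randn(16000).astype(np.float32)  # stand-in for real audio
feats = vtln_mfcc(y, sr=16000, alpha=0.94).T   # (frames, n_mfcc)
x = add_context(feats, k=8)                    # (frames, n_mfcc * 17)
model = InversionNet(in_dim=x.shape[1])
tvs = model(torch.from_numpy(x).float())       # (frames, 6)
```

In a full system, alpha would be searched per speaker against the target speaker's acoustics, and the network would be trained with a regression loss (e.g. mean squared error) against the measured TVs before being scored with the correlation metric above.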
