Abstract

Predicting articulatory movements from audio or text has diverse applications, such as speech visualization. Various approaches have been proposed to solve the acoustic-to-articulatory mapping problem, but their accuracy remains limited when only acoustic features are available. Recently, deep neural networks (DNNs) have brought tremendous success in various fields, such as speech recognition and image processing. To improve prediction accuracy, we propose a new network architecture for articulatory movement prediction that takes both text and audio inputs, called the bottleneck long-term recurrent convolutional neural network (BLTRCNN). To the best of our knowledge, this is the first DNN-based method that predicts articulatory movements by fusing text and audio inputs. The BLTRCNN consists of two networks. The first is a bottleneck network, which generates compact bottleneck features from text information for each frame independently. The second, which combines a convolutional neural network, long short-term memory (LSTM) units, and skip connections, is called the long-term recurrent convolutional neural network (LTRCNN). The LTRCNN predicts articulatory movements from the integrated bottleneck, acoustic, and text features. Experiments show that the proposed BLTRCNN achieves a state-of-the-art root-mean-square error (RMSE) of 0.528 mm and a correlation coefficient of 0.961. Moreover, we demonstrate how text information complements acoustic features in this prediction task.
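
To make the fusion concrete, below is a minimal PyTorch sketch of the two-stage design the abstract describes: a frame-wise bottleneck network over text features, and an LTRCNN (temporal convolution, bidirectional LSTM, and a skip connection) over the concatenated acoustic, text, and bottleneck features. All layer sizes, kernel widths, and feature dimensions here are illustrative assumptions, not the configuration reported in the paper.

```python
import torch
import torch.nn as nn

class BottleneckNet(nn.Module):
    """Frame-wise MLP that compresses text features into compact bottleneck features.
    Hidden and bottleneck sizes are assumptions, not the paper's settings."""
    def __init__(self, text_dim, bottleneck_dim=32, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(text_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, bottleneck_dim), nn.ReLU(),
        )

    def forward(self, text_feats):              # (batch, frames, text_dim)
        return self.net(text_feats)             # (batch, frames, bottleneck_dim)

class LTRCNN(nn.Module):
    """Temporal convolution followed by a bidirectional LSTM, with a skip
    connection from the convolution output to the output regressor."""
    def __init__(self, in_dim, out_dim, conv_channels=128, lstm_dim=256):
        super().__init__()
        self.conv = nn.Conv1d(in_dim, conv_channels, kernel_size=5, padding=2)
        self.lstm = nn.LSTM(conv_channels, lstm_dim,
                            batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * lstm_dim + conv_channels, out_dim)

    def forward(self, x):                       # (batch, frames, in_dim)
        c = torch.relu(self.conv(x.transpose(1, 2)).transpose(1, 2))
        h, _ = self.lstm(c)                     # (batch, frames, 2 * lstm_dim)
        return self.out(torch.cat([h, c], dim=-1))  # skip connection around the LSTM

class BLTRCNN(nn.Module):
    """Fuses acoustic, text, and bottleneck features frame by frame."""
    def __init__(self, acoustic_dim, text_dim, articulatory_dim, bottleneck_dim=32):
        super().__init__()
        self.bottleneck = BottleneckNet(text_dim, bottleneck_dim)
        self.ltrcnn = LTRCNN(acoustic_dim + text_dim + bottleneck_dim, articulatory_dim)

    def forward(self, acoustic, text):          # both (batch, frames, feature_dim)
        b = self.bottleneck(text)
        fused = torch.cat([acoustic, text, b], dim=-1)
        return self.ltrcnn(fused)               # predicted articulator positions per frame

# Example with assumed dimensions: 40-dim acoustic features, 60-dim text features,
# 12 articulatory coordinates, 2 utterances of 200 frames each.
model = BLTRCNN(acoustic_dim=40, text_dim=60, articulatory_dim=12)
pred = model(torch.randn(2, 200, 40), torch.randn(2, 200, 60))  # -> (2, 200, 12)
```

The exact placement of the skip connection and the choice of layers in the paper may differ; the sketch only illustrates how the text and audio streams are fused before the recurrent prediction stage.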
