Synthesizing Audio from Tongue Motion During Speech Using Tagged MRI Via Transformer.

Xiaofeng Liu, Jonghye Woo, Fangxu Xing, Jerry L Prince, Georges El Fakhri, Maureen Stone

doi:10.1117/12.2653345

Abstract

Investigating the relationship between intelligible speech and the internal tissue-point motion of the tongue and deformation of the oropharyngeal muscles, as measured from tagged MRI, can aid in advancing speech motor control theories and in developing novel treatment methods for speech-related disorders. However, elucidating the relationship between these two sources of information is challenging, due in part to the disparity in data structure between spatiotemporal motion fields (i.e., 4D motion fields) and one-dimensional audio waveforms. In this work, we present an efficient encoder-decoder translation network that explores the predictive information inherent in 4D motion fields by using 2D spectrograms as a surrogate for the audio data. Specifically, our encoder combines 3D convolutional spatial modeling with transformer-based temporal modeling. The extracted features are then processed by an asymmetric 2D convolutional decoder to generate spectrograms that correspond to the input 4D motion fields. Furthermore, we incorporate generative adversarial training into our framework to further improve the synthesis quality of the generated spectrograms. Experiments on 63 paired motion field sequences and speech waveforms demonstrate that our framework can generate clear audio waveforms from a sequence of motion fields. Our framework thus has the potential to improve our understanding of the relationship between these two modalities and to inform the development of treatments for speech disorders.
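The central idea above is to sidestep the mismatch between 4D motion fields and 1D waveforms by predicting a 2D spectrogram instead. The following is a minimal sketch of that 1D-to-2D step. It assumes a plain magnitude short-time Fourier transform with a Hann window. The abstract does not state the window, frame length, hop size, or whether a mel or log scale is used, so the function names and default values here are hypothetical and are not the authors' settings. The code shows how a waveform becomes a frequency-by-time array of the kind a 2D convolutional decoder could be trained to output.

```python
import cmath
import math


def hann_window(n):
    """Periodic Hann window of length n (an assumed window choice)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def magnitude_spectrogram(signal, frame_len=256, hop=128):
    """Turn a 1D waveform into a 2D magnitude spectrogram.

    Returns rows indexed [frequency_bin][frame], with frame_len // 2 + 1
    frequency bins (the non-negative frequencies of a real-valued signal)
    and 1 + (len(signal) - frame_len) // hop frames.
    frame_len and hop are hypothetical defaults, not the paper's settings.
    """
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one analysis frame")
    window = hann_window(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    n_bins = frame_len // 2 + 1
    # Precompute DFT twiddle factors e^{-2*pi*i*m/N}. The exponent k*n is
    # reduced modulo N because the complex exponential has period N.
    twiddle = [cmath.exp(-2j * math.pi * m / frame_len) for m in range(frame_len)]
    spec = [[0.0] * n_frames for _ in range(n_bins)]
    for t in range(n_frames):
        start = t * hop
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        for k in range(n_bins):
            # Naive DFT coefficient X[k] = sum_n x[n] * e^{-2*pi*i*k*n/N}
            coeff = sum(frame[n] * twiddle[(k * n) % frame_len]
                        for n in range(frame_len))
            spec[k][t] = abs(coeff)
    return spec


if __name__ == "__main__":
    fs = 8000  # hypothetical sample rate in Hz
    # A 1000 Hz tone lands exactly on bin 1000 / (fs / 256) = 32.
    tone = [math.sin(2.0 * math.pi * 1000.0 * n / fs) for n in range(2048)]
    spec = magnitude_spectrogram(tone)
    peak_bins = [max(range(len(spec)), key=lambda k: spec[k][t])
                 for t in range(len(spec[0]))]
    print(len(spec), "bins x", len(spec[0]), "frames; peak bins:", set(peak_bins))
```

For a 1 kHz tone sampled at 8 kHz with 256-sample frames, the demo yields 129 frequency bins by 15 frames, and every frame peaks at bin 32. A magnitude spectrogram discards phase. Turning a predicted spectrogram back into an audible waveform therefore needs a separate phase-reconstruction or vocoder step, and the abstract does not specify which one the authors use.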

More From: Proceedings of SPIE--the International Society for Optical Engineering

Journal: Proceedings of SPIE--the International Society for Optical Engineering | Publication Date: Apr 3, 2023
Citations: 2

Similar Papers

A four-dimensional motion field atlas of the tongue from tagged and cine magnetic resonance imaging
Jonghye Woo ... Georges El Fakhri
24 Feb 2017

Activity of tongue muscles during respiration: it takes a village?
Alan J Sokoloff
Journal of Applied Physiology | VOL. 96
01 Feb 2004

Tagged-MRI Sequence to Audio Synthesis via Self Residual Attention Guided Heterogeneous Translator.
Xiaofeng Liu ... Fangxu Xing
Medical image computing and computer-assisted intervention : MICCAI ... International Conference on Medical Image Computing and Computer-Assisted Intervention | VOL. 13436
01 Jan 2021

Predicting Speech Intelligibility Based on Spatial Tongue-Jaw Coupling in Persons With Amyotrophic Lateral Sclerosis: The Impact of Tongue Weakness and Jaw Adaptation.
Panying Rong ... Jordan R Green
Journal of Speech, Language, and Hearing Research | VOL. 62
29 Aug 2019
