Abstract

Magnetic resonance imaging (MRI) of vocal tract shaping and surrounding articulators during speaking is a powerful tool in several application areas, such as understanding language disorders and informing treatment plans in oro-pharyngeal cancers. However, this is a challenging task due to fundamental tradeoffs between spatio-temporal resolution, organ coverage, and signal-to-noise ratio. Current volumetric vocal tract MR methods are either restricted to imaging during sustained sounds or perform dynamic imaging at highly compromised spatio-temporal resolutions for slowly moving articulators. In this work, we propose a novel unsupervised deep variational manifold learning approach to recover a “pseudo-3D” dynamic speech dataset from sequential acquisition of multiple 2D slices during speaking. We demonstrate “pseudo-3D” (or time-aligned multi-slice 2D) dynamic imaging at a high temporal resolution of 18 ms, capable of resolving vocal tract motion for arbitrary speech tasks. This approach jointly learns low-dimensional latent vectors corresponding to the image time frames and the parameters of a 3D convolutional neural network based generator that generates volumes of the deforming vocal tract. It does so by minimizing a cost function that enforces: a) temporal smoothness on the latent vectors; b) \(l_1\) norm based regularization on the generator weights; c) a zero-mean, unit-variance Gaussian distribution on the latent vectors of all slices; and d) data consistency with the measured k-space versus time data. We evaluate our proposed method using in-vivo vocal tract airway datasets from two normal volunteers producing repeated speech tasks, and compare it against state-of-the-art 2D and 3D dynamic compressed sensing (CS) schemes in speech MRI. Finally, we demonstrate (for the first time) extraction of quantitative 3D vocal tract area functions from under-sampled 2D multi-slice datasets to characterize vocal tract shape changes in 3D during speech production.
Code: https://github.com/rushdi-rusho/varMRI

Keywords: Manifold learning; Dynamic 3D speech MRI; Unsupervised learning; Variational autoencoder; Accelerated MRI
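
To make the four-term cost function described in the abstract concrete, the following is a minimal NumPy sketch of how such a cost might be assembled. It is an illustration under stated assumptions, not the authors' implementation: the function name `toy_cost`, the array shapes, the weights `lam_smooth`, `lam_l1`, and `lam_kl`, and the use of per-slice empirical moments for the Gaussian term are all hypothetical choices made here. The generator and the MRI forward model are left out; the caller is assumed to supply the predicted k-space samples.

```python
import numpy as np

def toy_cost(latents, weights, predicted_kspace, measured_kspace,
             lam_smooth=1.0, lam_l1=1.0, lam_kl=1.0):
    """Illustrative four-term cost (hypothetical names and shapes).

    latents: real array, shape (n_slices, n_frames, latent_dim)
    weights: list of real arrays standing in for generator weights
    predicted_kspace, measured_kspace: complex arrays of equal shape
    """
    # (d) data consistency: squared error against measured k-space samples
    data_term = np.sum(np.abs(predicted_kspace - measured_kspace) ** 2)

    # (a) temporal smoothness: squared finite differences along the time axis
    smooth_term = np.sum(np.diff(latents, axis=1) ** 2)

    # (b) l1 norm of all generator weights
    l1_term = sum(np.sum(np.abs(w)) for w in weights)

    # (c) Gaussian prior on each slice's latents. Assumption: we match the
    # per-slice empirical mean and variance to N(0, 1) using the closed-form
    # KL divergence between two univariate Gaussians.
    mu = latents.mean(axis=1)
    var = latents.var(axis=1) + 1e-8  # small offset keeps log() finite
    kl_term = 0.5 * np.sum(var + mu ** 2 - 1.0 - np.log(var))

    terms = {"data": data_term, "smooth": smooth_term,
             "l1": l1_term, "kl": kl_term}
    total = (data_term + lam_smooth * smooth_term
             + lam_l1 * l1_term + lam_kl * kl_term)
    return total, terms
```

In a real training loop, the latent vectors and generator weights would be optimized jointly with automatic differentiation; the sketch only shows how the four penalties combine into one scalar.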
