Abstract
We propose a method for co-registering speech articulatory/acoustic data from two modalities that provide complementary advantages. Electromagnetic Articulography (EMA) provides high temporal resolution (100 samples/second in the WAVE system) and flesh-point tracking, while real-time Magnetic Resonance Imaging (rtMRI; 23 frames/second) offers a complete midsagittal view of the vocal tract, including articulated structures and the articulatory environment. Co-registration was achieved through iterative alignment in the acoustic and articulatory domains. Acoustic signals were aligned temporally using Dynamic Time Warping, while articulatory signals were aligned variously by minimizing the mean total error between articulometry data and estimated corresponding flesh points, and by using mutual information derived from articulatory parameters for each sentence. We demonstrate our method on a subset of the TIMIT corpus elicited from a male and a female speaker of American English, and illustrate the benefits of co-registered multimodal data in the study of liquid and fricative consonant production in rapid speech. [Supported by NIH and NSF grants.]
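As an illustration only (not the authors' implementation), the Dynamic Time Warping step used for temporal alignment of the acoustic signals can be sketched as follows. The function name `dtw_align` and the simple absolute-difference local cost are assumptions for this sketch; in practice the local cost would be computed between acoustic feature vectors (e.g., MFCC frames) rather than scalars.

```python
import numpy as np

def dtw_align(x, y):
    """Dynamic Time Warping (illustrative sketch): return the minimal
    cumulative alignment cost and the warping path between two 1-D
    sequences x and y, using |x[i] - y[j]| as the local distance."""
    n, m = len(x), len(y)
    # cost[i, j] = minimal cumulative cost aligning x[:i] with y[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])          # local distance
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    # Backtrack from (n, m) to recover the warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1],  # diagonal
                          cost[i - 1, j],      # up
                          cost[i, j - 1]])     # left
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return cost[n, m], path[::-1]
```

For example, aligning `[1, 2, 3]` against `[1, 2, 2, 3]` yields zero total cost, with the repeated `2` in the second sequence warped onto the single `2` in the first.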