Abstract

We present a method of using generalized additive mixed models (GAMMs) to analyze midsagittal vocal tract data obtained from real-time magnetic resonance imaging (rt-MRI) video of speech production. Applied to rt-MRI data, GAMMs allow for observation of factor effects on vocal tract shape throughout two key dimensions: time (vocal tract change over the temporal course of a speech segment) and space (location of change within the vocal tract). Examples of this method are provided for rt-MRI data collected at a temporal resolution of 20 ms and a spatial resolution of 1.41 mm, for 36 native speakers of German. The rt-MRI data were quantified as 28-point semi-polar-grid aperture functions. Three test cases are provided as a way of observing vocal tract differences between: (1) /aː/ and /iː/, (2) /aː/ and /aɪ/, and (3) accentuated and unstressed /aː/. The results for each GAMM are independently validated using functional linear mixed models (FLMMs) constructed from data obtained at 20% and 80% of the vowel interval. In each case, the two methods yield similar results. In light of the method similarities, we propose that GAMMs are a robust, powerful, and interpretable method of simultaneously analyzing both temporal and spatial effects in rt-MRI video of speech.

Highlights

  • One of the primary challenges facing speech articulation researchers is obtaining, quantifying, and interpreting data that capture both the spatial and temporal complexity of speech production

  • In each of the three test cases observed in this study, the context of interest was found to condition changes in real-time magnetic resonance imaging (rt-MRI) vocal tract aperture functions associated with German vowel productions

  • These changes were observed in the generalized additive mixed models and cross-verified at both the beginning of the vowel (20% of the vowel interval) and the end of the vowel (80% of the vowel interval), using independently constructed functional linear mixed models created with data from these time points

Summary

Introduction

One of the primary challenges facing speech articulation researchers is obtaining, quantifying, and interpreting data that capture both the spatial and temporal complexity of speech production. We have chosen to quantify the vocal tract using semi-polar grid functions that represent the aperture (i.e., distance, diameter) of the vocal tract within the midsagittal plane. This particular quantification method was chosen for two reasons, both of which are important for maintaining interpretability in the specific use of GAMMs that we propose in this paper. With a relatively large number of grid lines (in our case, 28), the resulting function is a gradient, fine-grained spatial representation of the vocal tract that can be modeled as a continuous variable. In this way, we can subject the dynamic evolution of vocal tract aperture over time to statistical modeling.
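To make the data layout concrete, the following is a minimal Python sketch of how a vowel token might be stored as a sequence of 28-point aperture functions, sampled at 20% and 80% of the vowel interval, and reshaped into rows of (normalized time, grid line, aperture) with time and space as separate continuous predictors. The function names, the linear interpolation between frames, and the synthetic values are assumptions made for illustration; they are not taken from the authors' pipeline.

```python
import numpy as np

N_GRIDLINES = 28      # grid lines per aperture function (from the text)
FRAME_STEP_MS = 20.0  # temporal resolution of the rt-MRI data (from the text)


def aperture_at_proportion(apertures, proportion):
    """Aperture function at a proportion (0 to 1) of the vowel interval.

    `apertures` has shape (n_frames, 28): one aperture function per video frame.
    Linear interpolation between neighboring frames is an assumption here.
    """
    apertures = np.asarray(apertures, dtype=float)
    if apertures.ndim != 2 or apertures.shape[1] != N_GRIDLINES:
        raise ValueError("expected an array of shape (n_frames, 28)")
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    n_frames = apertures.shape[0]
    frame_pos = proportion * (n_frames - 1)   # fractional frame index
    lo = int(np.floor(frame_pos))
    hi = min(lo + 1, n_frames - 1)
    w = frame_pos - lo                        # weight of the later frame
    return (1.0 - w) * apertures[lo] + w * apertures[hi]


def to_long_format(apertures):
    """Rows of (normalized time, grid line number, aperture) for one token."""
    apertures = np.asarray(apertures, dtype=float)
    n_frames, n_lines = apertures.shape
    time = np.linspace(0.0, 1.0, n_frames)    # normalized time within the vowel
    t, g = np.meshgrid(time, np.arange(1, n_lines + 1), indexing="ij")
    return np.column_stack([t.ravel(), g.ravel(), apertures.ravel()])


# Synthetic token: 11 frames (about 200 ms at 20 ms per frame), made-up values.
demo = np.arange(11)[:, None] + 0.1 * np.arange(N_GRIDLINES)[None, :]
early = aperture_at_proportion(demo, 0.2)    # aperture function at 20%
late = aperture_at_proportion(demo, 0.8)     # aperture function at 80%
long_rows = to_long_format(demo)             # one row per (frame, grid line)
print(early.shape, late.shape, long_rows.shape)
```

In this layout, each row carries one aperture value together with its position in time and in the vocal tract, which is the kind of table a model with smooth terms over both dimensions would take as input.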

Generalized additive mixed models and functional linear mixed models
Normalization procedures
GAMM construction
Monophthongs
Diphthongs
Stress
Conclusion
