Abstract

Center for Corpus Development, National Institute for Japanese Language and Linguistics, 10–2 Midoricho, Tachikawa, Tokyo, 190–0014 Japan
(Received 28 May 2012, Accepted for publication 6 July 2012)

Keywords: Magnetic resonance imaging, Cine-MRI, Voiced stop consonant, Voiceless stop consonant, Tongue movement
PACS number: 43.70.Aj [doi:10.1250/ast.33.391]

1. Introduction

The present study describes the transient profiles of articulatory movement during the pronunciation of stop consonants, using fast MRI (magnetic resonance imaging) and image processing to analyze tongue movements. Recently developed fast MRI scanning techniques have made it possible to observe the shape of the vocal tract at every moment during speech production [1], and have thus allowed speech sounds to be reproduced by precise physical acoustic modeling of the articulatory organs [2]. This constructive approach based on physical modeling is expected to complement conventional descriptive studies in acoustic phonetics and to reveal causal relationships between articulatory dynamics and actual speech sounds. In contrast to the long history of MRI studies on vowel production, however, the kinematics of the oral cavity during consonant articulation have not been well studied by MRI because consonantal articulation is temporally unstable, and this gap makes it difficult to synthesize natural-sounding speech by physical modeling.

Here, we measured the transient properties of vocal tract shapes by MRI during the production of rapidly changing consonants, i.e., stop consonants, and compared them between voiced and voiceless conditions. For natural-sounding synthesis of voiced and voiceless stops, a physical acoustic model must reflect not only the interval between the plosive release and the onset of vocal fold vibration (i.e., the voice-onset time, VOT) but also, perhaps, the secondary kinematic changes in the vocal tract that accompany changes in VOT.
For example, the air pressure in the oral cavity during the closure period is higher for voiceless stop consonants than for voiced stops [3,4]. This pressure difference is expected to be accompanied by a difference in pharyngeal cavity volume during stop production [5], implying supplementary changes in some acoustic features. In the present paper, we measured the vocal tract by fast MRI while participants pronounced voiced and voiceless velar stops and, in particular, analyzed the movement trajectories of the tongue surface.

2. Materials and methods

2.1. Subjects and speech task

Four adult Japanese speakers (three males and one female) repeatedly uttered two nonsense words, /agise/ and /akise/; the former contains the voiced stop consonant /g/ and the latter the voiceless stop consonant /k/. Note that all the speakers spoke the Kansai dialect and hence did not exhibit deletion of the vowel /i/ following the voiceless stop consonant.

2.2. Cine-MRI data acquisition

Cine-MRI data of the mid-sagittal plane were acquired by a synchronized sampling method with an external trigger [1]. The speakers lay supine in the MRI gantry, wearing headphones to listen to a guide sound. The guide sound was a triplet-beat sequence consisting of one tone and two noise bursts. Each burst lasted 100 ms, and the onset-to-onset interval was 400 ms. Speakers were asked to pronounce each nonsense word in synchrony with the guide sound, matching the first and third morae to the second and third beats, respectively. Each speaker uttered each word a total of 128 times during scanning. Prior to the MRI acquisition, the speakers practiced uttering the words to minimize the variation between repeated utterances and thus ensure sufficient quality of the cine-MRI data.

The MRI acquisition was carried out using a 3 T MRI scanner (MAGNETOM Verio, Siemens) at the Brain Activity Imaging Center of ATR-Promotions. The scanning parameters were as follows.
Scan sequence: FLASH; frame rate: 100 frames/s; repetition time (TR): 10.0 ms; echo time: 1.62 ms; flip angle (FA): 15°; field of view: 256 × 256 mm; matrix size: 256 × 256; slice thickness: 4 mm; without averaging.

2.3. Image processing method

The velocity of the highest point of the tongue during the production of the voiced and voiceless stop consonants was measured from the cine-MRI data. The tongue surface was first extracted from each frame of the cine-MRI data by Canny's edge detection method [6] and was tracked using a fifth-order polynomial approximation, as illustrated in Fig. 1. The vertical movement and velocity of the highest point of the tongue were then calculated from the surface trajectory. The trajectory of the highest point of the tongue was smoothed by a sixth-order moving average and resampled at a rate of 1,000 frames/s by spline interpolation. The vertical velocity
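The tongue-tracking steps of Section 2.3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the tongue-edge pixels of each frame have already been extracted (for example with an off-the-shelf Canny detector such as scikit-image's `skimage.feature.canny`) and converted to coordinates whose vertical axis points upward; it reads "sixth-order moving average" as a 6-frame window; and it assumes cubic splines for resampling and a finite-difference derivative for the velocity, because the text does not specify these details. The function names `highest_tongue_point` and `smooth_and_resample` and the synthetic demo data are hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def highest_tongue_point(edge_x, edge_y, order=5, n_grid=1000):
    """Fit y = p(x) to tongue-edge points and return the apex (x, y).

    Assumption: y increases upward (flip image row indices first) and the
    tongue dorsum is single-valued in x over the fitted span.
    """
    coeffs = np.polyfit(edge_x, edge_y, order)            # fifth-order fit by default
    xs = np.linspace(np.min(edge_x), np.max(edge_x), n_grid)
    ys = np.polyval(coeffs, xs)                            # evaluate the fit on a dense grid
    i = int(np.argmax(ys))                                 # highest point of the fitted curve
    return xs[i], ys[i]


def smooth_and_resample(t, y, window=6, fs_out=1000.0):
    """Moving-average smoothing, spline resampling and vertical velocity.

    t: frame times in seconds (e.g. 100 frames/s); y: apex height per frame.
    Returns (t_new, y_new, v_new) sampled at fs_out (assumed cubic spline).
    """
    kernel = np.ones(window) / window
    y_s = np.convolve(y, kernel, mode="valid")             # drop edge samples instead of zero-padding
    t_s = np.convolve(t, kernel, mode="valid")             # centre time of each averaging window
    spline = CubicSpline(t_s, y_s)                         # spline type is an assumption
    t_new = np.arange(t_s[0], t_s[-1], 1.0 / fs_out)       # 1,000 samples/s by default
    y_new = spline(t_new)
    v_new = np.gradient(y_new, t_new)                      # finite-difference vertical velocity
    return t_new, y_new, v_new


if __name__ == "__main__":
    fps = 100.0                                            # cine-MRI frame rate from the text
    t = np.arange(40) / fps                                # hypothetical 0.4 s clip
    x = np.linspace(-20.0, 20.0, 81)                       # hypothetical mm coordinates
    true_apex = 10.0 + 5.0 * np.sin(2 * np.pi * 2.5 * t)  # synthetic apex height (mm)
    apex = []
    for h in true_apex:
        edge_y = h - 0.02 * x**2                           # synthetic dome-shaped tongue edge
        apex.append(highest_tongue_point(x, edge_y)[1])
    t_new, y_new, v_new = smooth_and_resample(t, np.array(apex))
    print(f"{len(t_new)} samples at 1 kHz, peak upward velocity {v_new.max():.1f} mm/s")
```

Convolving the frame times with the same kernel in `mode="valid"` keeps each averaged height aligned with the centre of its window, which avoids the half-frame shift an even-length window would otherwise introduce, at the cost of dropping five frames at the ends of each trajectory.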
