Abstract

This paper explores how changes in velar coupling area and oral cavity configuration affect the poles and zeros that the sphenoidal and maxillary sinuses introduce into the spectra of nasalized vowels and nasal consonants. MRI data for the vocal tract and nasal tract of one speaker were used to simulate the spectra of nasalized vowels and nasal consonants with different coupling areas. The simulations show that, during nasalized vowels, the frequencies of both the poles and the zeros due to the sinuses change with the velar coupling area and with the vowel. During nasal consonants, the zero frequencies are constant and the pole frequencies are more stable than in nasalized vowels. This study therefore corroborates the use of nasal consonant spectra for speaker recognition and raises doubts about the potential benefits of using nasalization during vowels for that purpose.

Index Terms: speaker recognition, nasal, sinus, MRI.

The nasal cavity is probably the most complicated structure involved in the production of speech. Unlike the oral cavity, the nasal cavity is divided into two parallel passages that end at the two nostrils. The nasal cavity also has several paranasal cavities called sinuses. Humans have four kinds of sinuses: Maxillary Sinus (MS), Frontal Sinus (FS), Sphenoidal Sinus (SS) and Ethmoidal Sinus (ES). These sinuses are connected to the main nasal passages through small openings called ostia. Coupling between the nasal tract and the vocal tract (oral cavity and pharyngeal cavity) is controlled by a movable fold called the velum. It has been shown that the asymmetry between the two nasal passages can introduce extra poles and zeros in the acoustic spectrum [1].
It has also been shown that the maxillary sinuses account for the lowest pole-zero pair seen in the acoustic spectrum (especially for low vowels) when nasalization is introduced [2, 3], and that they are also very important in making speech sound nasal [4]. Despite several studies, the exact dynamics of the poles and zeros due to the sinuses remain unclear. In this study, MRI data for the vocal tract and nasal tract of one speaker, recorded by Story et al. [5, 6], were used to simulate the spectral effects of SS and MS (the only two sinuses for which data were recorded). The study focuses on how the poles and zeros due to the sinuses move with a change in the velar coupling area and in the oral cavity configuration. Four vowels and two nasal consonants were considered. Analysis of the MRI data shows that, during nasalized vowel regions, not only the frequencies of the poles but also the frequencies of the zeros due to the sinuses change with the velar coupling area and with the vowel. During nasal consonant regions, however, the frequencies of the zeros due to the sinuses stay at the same location.

Several researchers have shown the effectiveness of nasal consonant regions for speaker recognition. The power spectrum during nasal consonant regions was used for speaker recognition in [7]. Features extracted from nasal consonant spectra were also used for speaker recognition in [8]. In another paper [9], coarticulation between the nasal and the following vowel was used as a cue for speaker recognition; the authors showed that their coarticulation measure worked better than the nasal spectrum alone. Other studies on the relative speaker-discriminating properties of phonemes [10, 11, 12, 13] have shown that nasals and vowels perform the best.
Although several researchers have shown that nasal consonant regions give reliable cues for speaker recognition, no one has used nasality during the vowel regions as a cue. In light of the analysis in this paper, a question arises: Does nasalization during vowels provide a good cue for speaker recognition?
