Abstract

Although acoustical methods have been widely used in the nasality literature, a direct link between acoustical measurements and velum movement during speech production has yet to be established. In this study, we propose a model that infers the vertical movements of the velum from an acoustic feature set. An X-ray Microbeam data set collected at the University of Tokyo is used for the modeling. The data record the vertical movements of the velum of 11 American English speakers producing both isolated words and sentences. Velum positions are recorded by tracking a metal pellet placed on top of the velum. Forty MFCC (Mel-frequency cepstral coefficient) features are extracted from the accompanying acoustic signal at each time frame. The MFCCs of the ten frames preceding the current frame, together with those of the current frame, constitute the feature vector for predicting the velum movement at the current frame. Elastic net regression is used to reduce the dimensionality of the feature vector. In general, MFCCs from higher frequencies are penalized during model selection. The selected features are then fitted to a stepwise logistic model. For each individual speaker, the inferred velum movements in the validation set fit the observed movements well, as judged by high accuracy in locating peaks and valleys and by small deviance from the response. However, there is large inter-speaker variation in both movement pattern and model performance.
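The feature construction described above (the current frame plus its ten preceding frames, 40 MFCCs each) can be illustrated with a minimal sketch. This is not the authors' code: the function name `stack_context`, the oldest-first ordering of frames within a row, the choice to drop the first ten frames for lack of history, and the random stand-in data are all assumptions made for illustration.

```python
import numpy as np


def stack_context(mfcc: np.ndarray, n_prev: int = 10) -> np.ndarray:
    """Stack each frame with its n_prev preceding frames (hypothetical helper).

    mfcc has shape (n_frames, n_coeffs). The result has shape
    (n_frames - n_prev, (n_prev + 1) * n_coeffs). The first n_prev frames
    are dropped because they lack a full history (an assumption; the
    abstract does not say how the start of a recording is handled).
    """
    n_frames, _ = mfcc.shape
    if n_frames <= n_prev:
        raise ValueError("need more frames than the context length")
    # Row i covers frames i .. i + n_prev, oldest first, flattened to 1-D.
    windows = [mfcc[i : i + n_prev + 1].ravel() for i in range(n_frames - n_prev)]
    return np.stack(windows)


# Random stand-in for real MFCCs: 100 frames, 40 coefficients per frame.
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(100, 40))
X = stack_context(mfcc)  # one row per predictable frame, 11 * 40 columns
```

Under these assumptions, row `i` of `X` would be paired with the velum position at frame `i + 10`. The resulting matrix could then be passed to an elastic net implementation, such as scikit-learn's `ElasticNet`, for the feature selection step; the abstract does not state which implementation was used.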
