Supervised learning in voice type discrimination using neck-skin vibration signals: Preliminary results on single vowels

Zhengdong Lei, Nicole Y Li-Jessen, Luc Mongeau

doi:10.1121/1.4988844

Abstract

Discrimination between normal and pathological voice is a critical component of laryngeal pathology diagnosis and vocal rehabilitation. In the present study, a portable miniature glottal notch accelerometer (GNA) device, combined with supervised machine learning techniques, was proposed to discriminate among three human voice types: normal, breathy, and pressed voice. Fourteen native speakers of American English, each wearing a GNA device, produced five different English single vowels in each of the three voice types. Acoustic features of the GNA signals were extracted using spectral analysis. Differences in these features among the voice types were assessed in a preliminary analysis to identify physical cues for discrimination. Linear discriminant analysis was applied to reduce the dimensionality of the raw feature vector of the GNA signals, simultaneously maximizing the between-class distance and minimizing the within-class distance. The voice types were then classified using several supervised learning techniques, including Linear Discriminant, Decision Tree, Support Vector Machine, and K-Nearest Neighbors classifiers. A classification accuracy of up to 91.0% was achieved. A mapping model from voice input to voice-type output was obtained from the training set, to be used for predicting voice types from new data in future work.
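The abstract names a processing chain (spectral features, then an LDA projection, then a supervised classifier) but gives no code, feature definitions, or parameter values. The block below is a minimal sketch of that chain, assuming NumPy and entirely synthetic signals; it is not the authors' implementation and uses no GNA data. Everything in it is hypothetical: the helper names (`spectral_features`, `fit_lda`, `knn_predict`, `synth_signal`, `make_dataset`), the 8 equal-width log-energy bands, the sampling rate, and the per-class spectral-tilt and noise settings used to fake "normal", "breathy", and "pressed" signals. Fisher LDA is used as the projection step and k-nearest neighbors as the classifier; the decision tree and SVM classifiers named in the abstract are not shown.

```python
import numpy as np

FS = 8000  # hypothetical sampling rate in Hz for the synthetic signals


def spectral_features(signal, fs, n_bands=8, fmax=4000.0):
    """Log10 energy in equal-width frequency bands (a hypothetical feature set)."""
    windowed = signal * np.hanning(len(signal))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    edges = np.linspace(0.0, fmax, n_bands + 1)
    return np.array([
        np.log10(power[(freqs >= lo) & (freqs < hi)].sum() + 1e-12)
        for lo, hi in zip(edges[:-1], edges[1:])
    ])


def fit_lda(X, y, n_components):
    """Fisher LDA: columns of W maximize between-class vs. within-class scatter."""
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    s_w = np.zeros((d, d))  # within-class scatter
    s_b = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        s_w += centered.T @ centered
        diff = (mean_c - mean_all)[:, None]
        s_b += len(Xc) * (diff @ diff.T)
    s_w += 1e-6 * np.eye(d)  # small ridge keeps S_w invertible
    # Eigenvectors of S_w^{-1} S_b with the largest eigenvalues
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(s_w, s_b))
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs[:, order[:n_components]].real


def knn_predict(Z_train, y_train, Z_test, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    preds = []
    for z in Z_test:
        nearest = y_train[np.argsort(np.linalg.norm(Z_train - z, axis=1))[:k]]
        labels, counts = np.unique(nearest, return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)


def synth_signal(rng, tilt_db, noise_std, fs=FS, n=2048):
    """Synthetic stand-in signal: harmonics with a spectral tilt plus white noise."""
    f0 = rng.uniform(100.0, 220.0)
    t = np.arange(n) / fs
    x = np.zeros(n)
    for h in range(1, 20):
        if h * f0 >= fs / 2:
            break
        x += 10 ** (-tilt_db * (h - 1) / 20) * np.sin(2 * np.pi * h * f0 * t)
    return x + noise_std * rng.standard_normal(n)


# Hypothetical (tilt in dB per harmonic, noise std) per class: 0 normal, 1 breathy, 2 pressed
CLASS_PARAMS = {0: (6.0, 0.05), 1: (12.0, 0.3), 2: (3.0, 0.02)}


def make_dataset(rng, n_per_class):
    X, y = [], []
    for label, (tilt, noise) in CLASS_PARAMS.items():
        for _ in range(n_per_class):
            X.append(spectral_features(synth_signal(rng, tilt, noise), FS))
            y.append(label)
    return np.array(X), np.array(y)


rng = np.random.default_rng(0)
X_train, y_train = make_dataset(rng, 30)
X_test, y_test = make_dataset(rng, 15)

W = fit_lda(X_train, y_train, n_components=2)  # at most (number of classes - 1) axes
Z_train, Z_test = X_train @ W, X_test @ W
y_pred = knn_predict(Z_train, y_train, Z_test, k=3)
accuracy = float(np.mean(y_pred == y_test))
print(f"test accuracy on synthetic data: {accuracy:.3f}")
```

With three classes, Fisher LDA yields at most two discriminant axes because the between-class scatter matrix has rank at most two, which is why `n_components=2` is used. The accuracy printed here reflects only the toy synthetic classes and says nothing about the 91.0% reported for the GNA recordings.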
