Relating Cepstral Peak Prominence to Cyclical Parameters of Vocal Fold Vibration from High-Speed Videoendoscopy Using Machine Learning: A Pilot Study

Peter S Popolo, Aaron M Johnson

doi:10.1016/j.jvoice.2020.01.026

Abstract

Smoothed cepstral peak prominence (CPPs) has been shown to be an effective indicator of breathiness (Hillenbrand and Houde, 1996). High-speed videoendoscopy (HSV) is frequently used as a complement to stroboscopy, especially when asymmetric or aperiodic vocal fold vibration is present in dysphonic voices. In an HSV image data set obtained from subjects with normal (nondisordered) voices, we observed some degree of asymmetry in many of the vocal fold displacement curves extracted from the HSV exam videos. We therefore used this data set for a pilot study to investigate the relationship of CPPs to cyclical vocal fold vibration parameters, including left-right vocal fold (LRVF) phase asymmetry. Twenty subjects with normal (nondisordered) voices produced sustained vowel phonations while undergoing a transoral HSV examination of the vocal folds with synchronized recording of the voice signal. The glottal area waveform (GAW) and the cyclical parameters open quotient (OQ), closed quotient (CQ), speed quotient (SQ), and LRVF skew were extracted from the HSV exam videos, and CPPs measures were obtained from acoustic analysis of the audio recordings. Correlations among the cyclical parameters and CPPs values were investigated using machine learning with the Regression Learner application in the MATLAB© Statistics and Machine Learning Toolbox (version 9.5.0.944444, R2018b, August 28, 2018, (c) 1984-2018, The MathWorks, Inc., Natick, MA). Because the sample size was small, and because there was possibly multicollinearity among the predictor variables, the only meaningful result obtained with the data set of 20 normal subjects and four predictor variables, when the app's model validation feature was turned on to protect against overfitting, was the constant model (ie, the best prediction of CPPs was just the average value of the 20 observations).
To fully investigate the usefulness of the Regression Learner app, however, the validation feature was turned off and 48 more model types were investigated. While these were not necessarily indicative of the best regression model for the current data set, the results obtained in this manner nevertheless demonstrated the utility of the automated approach for finding a regression model for a larger data set to be collected in the future. Further work is warranted to collect a larger data set from patients with disordered voices that are breathy and/or rough. It is speculated that a correlation between CPPs and cyclical parameters of vocal fold vibration may be more evident with disordered voices, because greater asymmetry in LRVF displacement is expected, with a corresponding effect on the acoustic voice signal.
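
The role of validation described above can be illustrated with a minimal Python sketch. It is not the study's MATLAB workflow. The data are synthetic random numbers, not the study's measurements; the four predictor columns are only stand-ins for OQ, CQ, SQ, and LRVF skew; the response is generated independently of the predictors; and leave-one-out cross-validation is assumed here for simplicity, since the abstract does not state which validation scheme was used. The sketch compares a constant (mean-only) model with an ordinary least-squares model, scoring each both on the data it was fit to and on held-out observations.

```python
import numpy as np


def predict_constant(X_train, y_train, X_test):
    """Constant model: always predict the mean of the training responses."""
    return np.full(len(X_test), y_train.mean())


def predict_linear(X_train, y_train, X_test):
    """Ordinary least-squares linear model with an intercept term."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ coef


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def resubstitution_rmse(model, X, y):
    """Error with validation off: fit and score on the same observations."""
    return rmse(y, model(X, y, X))


def leave_one_out_rmse(model, X, y):
    """Error with validation on: each observation is predicted by a model
    fit to the remaining n - 1 observations."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        preds[i] = model(X[keep], y[keep], X[i:i + 1])[0]
    return rmse(y, preds)


# Synthetic illustration only: 20 "subjects", 4 placeholder predictors,
# and a response that is unrelated to the predictors by construction.
rng = np.random.default_rng(0)
n_subjects = 20
X = rng.normal(size=(n_subjects, 4))
y = 12.0 + rng.normal(size=n_subjects)  # arbitrary offset, not a measured CPPs value

models = {"constant": predict_constant, "linear": predict_linear}
results = {
    name: (resubstitution_rmse(model, X, y), leave_one_out_rmse(model, X, y))
    for name, model in models.items()
}
for name, (fit_err, loo_err) in results.items():
    print(f"{name:>8}: resubstitution RMSE = {fit_err:.3f}, "
          f"leave-one-out RMSE = {loo_err:.3f}")
```

Without validation, the linear model can never score worse than the constant model on the data it was fit to, because the constant model is a special case of it. The held-out error does not share that guarantee, which is why a validated comparison on a small sample can favor the constant model.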
