Abstract
A two-stage classifier is used to improve classification performance between normal and pathological voices. A primary classification between normal and pathological voices is made using Gaussian mixture model (GMM) log-likelihood scores. For samples that do not meet the GMM thresholds for normal or disordered voice, the final decision is made by a higher-order statistics (HOS)-based parameter. The normalized skewness and kurtosis, and the means of the normalized skewness and kurtosis, were estimated using a sustained vowel /a/ from 53 normal and 173 pathological voices taken from the Disordered Voice Database. The Mel-frequency cepstral coefficients (MFCC)-based GMM, the HOS methods, and a two-stage classifier based on the GMM-HOS were applied to each voice signal. A Mann–Whitney rank sum test was used to detect differences in the means of the HOS-based parameters. A fivefold cross-validation scheme was used to test the classification method. When 16 Gaussian mixtures were used, the MFCC-based GMM algorithm achieved 92.0% accuracy. When the means of the normalized skewness and kurtosis were used, performances of 82.31% and 83.67% were obtained, respectively. The two-stage classifier with 16 Gaussian mixtures and the mean of the normalized kurtosis classified samples with 96.96% accuracy. The proposed two-stage classifier is more accurate than the MFCC-based GMM and HOS methods alone and shows potential for the classification of voices in the clinic.
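The two-stage decision described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names, the non-overlapping framing, the use of kurtosis without subtracting 3, the threshold values, and the directions of the comparisons (a higher GMM score and a higher HOS value are both treated as pointing toward "normal"). The GMM score itself is taken as an input rather than computed.

```python
def _central_moment(frame, order):
    """Central moment of the given order for a non-empty list of samples."""
    mean = sum(frame) / len(frame)
    return sum((x - mean) ** order for x in frame) / len(frame)


def normalized_skewness(frame):
    """Third central moment divided by the variance raised to 3/2."""
    var = _central_moment(frame, 2)
    return _central_moment(frame, 3) / var ** 1.5 if var > 0 else 0.0


def normalized_kurtosis(frame):
    """Fourth central moment divided by the squared variance (no -3 offset)."""
    var = _central_moment(frame, 2)
    return _central_moment(frame, 4) / var ** 2 if var > 0 else 0.0


def mean_over_frames(signal, frame_len, feature):
    """Average a per-frame feature over non-overlapping frames (assumed framing).

    Assumes len(signal) >= frame_len so that at least one frame exists.
    """
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    values = [feature(f) for f in frames]
    return sum(values) / len(values)


def two_stage_decision(gmm_score, hos_value, normal_thr, pathological_thr, hos_thr):
    """Stage 1 uses the GMM score; stage 2 falls back to the HOS parameter.

    Comparison directions and thresholds are illustrative assumptions.
    """
    if gmm_score >= normal_thr:
        return "normal"
    if gmm_score <= pathological_thr:
        return "pathological"
    # Score falls between the two thresholds: defer to the HOS-based parameter.
    return "normal" if hos_value >= hos_thr else "pathological"
```

Only samples whose GMM score lands between the two thresholds reach the second stage, which is why the combined scheme can correct some of the uncertain first-stage decisions without touching the confident ones.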
Highlights
A large amount of research has focused on the automatic detection of voice pathologies by means of acoustic analysis, parametric and non-parametric feature extraction, pattern recognition algorithms, and statistical methods [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15].
Sáenz-Lechón et al. [5] described some methodological concerns to be considered when designing automatic systems for pathological voice detection. They recommended the use of a commercially available, well-known database, a cross-validation strategy based on several partitions to obtain averaged classification performances with confidence intervals, reporting results by means of a detection error trade-off (DET) curve, and an investigation of the area under the receiver operating characteristic (ROC) curve.
Mel-frequency cepstral coefficients (MFCC)-based Gaussian mixture model (GMM) method: the performance was assessed by averaging the results obtained from a fivefold cross-validation scheme [10,12].
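Averaging results over a fivefold cross-validation can be sketched as follows. This is a generic illustration, not the authors' procedure: the helper names, the contiguous unshuffled folds, and the `train_fn` / `predict_fn` callables are hypothetical, and a real experiment would normally shuffle or stratify the samples before splitting.

```python
def kfold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous folds of near-equal size.

    Assumes n_samples >= k so that no fold is empty.
    """
    folds, start = [], 0
    for i in range(k):
        # The first (n_samples % k) folds receive one extra sample.
        size = n_samples // k + (1 if i < n_samples % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(samples, labels, train_fn, predict_fn, k=5):
    """Return (mean accuracy, per-fold accuracies) over k folds."""
    accuracies = []
    for test_idx in kfold_indices(len(samples), k):
        held_out = set(test_idx)
        train_x = [s for i, s in enumerate(samples) if i not in held_out]
        train_y = [y for i, y in enumerate(labels) if i not in held_out]
        model = train_fn(train_x, train_y)
        correct = sum(predict_fn(model, samples[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k, accuracies
```

With the 226 voices mentioned in the abstract (53 normal and 173 pathological), this split would give one fold of 46 samples and four folds of 45.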
Summary
Sáenz-Lechón et al. [5] presented an overview of previous classification schemes applied to the Massachusetts Eye & Ear Infirmary (MEEI) Voice Disorders Database [16]. They described some methodological concerns to be considered when designing automatic systems for pathological voice detection. They recommended the use of a commercially available, well-known database, a cross-validation strategy based on several partitions to obtain averaged classification performances with confidence intervals, reporting results by means of a detection error trade-off (DET) curve, and an investigation of the area under the receiver operating characteristic (ROC) curve.
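To make the DET and ROC recommendations concrete, the sketch below computes the area under the ROC curve and the operating points of a DET curve from two lists of detector scores. It is an illustrative assumption throughout: the function names are hypothetical, pathological voices are treated as the target class, and a higher score is assumed to mean "more likely pathological".

```python
def roc_auc(target_scores, nontarget_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a random target sample scores higher than a
    random non-target sample, with ties counted as one half.
    """
    wins = 0.0
    for t in target_scores:
        for n in nontarget_scores:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5
    return wins / (len(target_scores) * len(nontarget_scores))


def det_points(target_scores, nontarget_scores):
    """List of (threshold, false_alarm_rate, miss_rate) operating points.

    A sample is flagged as target when its score is >= the threshold.
    """
    points = []
    for thr in sorted(set(target_scores) | set(nontarget_scores)):
        miss = sum(s < thr for s in target_scores) / len(target_scores)
        false_alarm = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        points.append((thr, false_alarm, miss))
    return points
```

Plotting the miss rate against the false-alarm rate from `det_points` gives the DET curve; computing `roc_auc` on each cross-validation partition is one way to obtain the averaged figures with confidence intervals that the recommendations call for.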