Investigation and Evaluation of Glottal Flow Waveform for Voice Pathology Detection

Yuanbo Wu,Xiaojun Zhang,Ziqi Fan,Zhi Tao,Changwei Zhou,Di Wu

doi:10.1109/access.2020.3046767

Abstract

Automatic voice pathology detection can provide objective estimation and prevention in the early stages of voice diseases. The glottal flow waveform directly reflects the state of glottal excitation, so extracting acoustic features from glottal source signals may contribute to the detection of pathological voice. To improve the performance of voice pathology detection, this article investigates the contribution of the glottal flow waveform by comparing classification results obtained with features extracted from raw speech utterances and from the corresponding glottal flow waveforms. The individual feature sets are extracted from raw or glottal voice utterances with identical parameter settings: openSMILE acoustic features, audio features computed according to the Moving Picture Experts Group-7 (MPEG-7) standard, and classical glottal source features. In addition, a wrapper-based feature selection method is used to combine the single features, which are ranked by the Fisher discrimination ratio. Voice pathology detection experiments were carried out using Random Forest. The best accuracies of 88.52% for the Saarbrucken Voice database and 100.00% for the Massachusetts Eye and Ear Infirmary database are achieved using the combined feature set extracted from the glottal source signal, an improvement of 0.44-3.13% over the accuracies obtained using raw speech utterances. Compared to state-of-the-art methods, the proposed method achieves the highest accuracy for the Massachusetts Eye and Ear Infirmary database and an increase of 2.75-17.16% in detection accuracy compared to other conventional pipeline systems for the Saarbrucken Voice database. The experimental results demonstrate that using the glottal flow waveform as the source signal can improve the performance of pathological voice detection.
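The feature selection step mentioned in the abstract (features ranked by the Fisher discrimination ratio, then combined with a wrapper approach) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the ratio is written in its common two-class form, (mean_a - mean_b)^2 / (var_a + var_b), which may differ from the exact formulation in the paper; the greedy keep-if-better loop is one simple wrapper strategy; a nearest-centroid classifier scored on the training data stands in for Random Forest so the example needs only the standard library; and the function names and toy data are hypothetical.

```python
from statistics import mean, pvariance


def fisher_ratio(values, labels):
    """Two-class Fisher discrimination ratio of one feature (assumed form)."""
    a = [v for v, lab in zip(values, labels) if lab == 0]
    b = [v for v, lab in zip(values, labels) if lab == 1]
    gap = (mean(a) - mean(b)) ** 2
    spread = pvariance(a) + pvariance(b)
    if spread == 0:
        # Convention chosen for this sketch: zero within-class spread.
        return float("inf") if gap > 0 else 0.0
    return gap / spread


def rank_features(X, y):
    """Feature indices sorted by descending Fisher ratio."""
    n_features = len(X[0])
    scores = [fisher_ratio([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def wrapper_select(X, y, evaluate):
    """Greedy wrapper: try ranked features in order, keep those that raise the score."""
    selected, best = [], float("-inf")
    for j in rank_features(X, y):
        candidate = selected + [j]
        score = evaluate([[row[k] for k in candidate] for row in X], y)
        if score > best:
            selected, best = candidate, score
    return selected, best


def centroid_accuracy(X, y):
    """Stand-in evaluator (not Random Forest): training accuracy of nearest centroid."""
    centroids = {}
    for c in (0, 1):
        rows = [row for row, lab in zip(X, y) if lab == c]
        centroids[c] = [mean(col) for col in zip(*rows)]

    def predict(row):
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
        )

    return mean(1.0 if predict(row) == lab else 0.0 for row, lab in zip(X, y))


# Made-up toy data: 6 utterances x 3 features; label 0 = healthy, 1 = pathological.
X = [
    [1.0, 5.0, 0.2],
    [1.2, 3.0, 0.4],
    [0.8, 4.0, 0.1],
    [3.0, 4.5, 0.5],
    [3.2, 3.5, 0.3],
    [2.8, 4.0, 0.6],
]
y = [0, 0, 0, 1, 1, 1]

ranking = rank_features(X, y)
selected, score = wrapper_select(X, y, centroid_accuracy)
print("ranking:", ranking)
print("selected:", selected, "score:", score)
```

In a realistic setting the `evaluate` callable would wrap cross-validated accuracy of the actual classifier rather than training accuracy, since a wrapper scored on its own training data tends to overfit.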

Highlights

Speech is a signal produced by the human vocal organs and is of practical significance, especially for social communication.
For raw speech utterances, the classification results obtained with the Moving Picture Experts Group-7 (MPEG-7) feature set (MPEG-7-R) are higher than those obtained with the openSMILE feature set for both databases.

Summary

Introduction

Speech is a signal produced by the human vocal organs (such as the lungs, vocal folds, nasal cavity and lips) and is of practical significance, especially for social communication.

Objectives

Methods

Results

Discussion

Conclusion