The electroglottogram (EGG) is a signal that measures the change in the relative contact area of the vocal folds during voice production. In recent years, low-cost and non-invasive applications of the EGG have been developed. As a result, the EGG has been applied in various science, engineering and medical fields: in basic voice science, including phonetics, singing and hearing, as well as in speech and language therapy and related clinical work, including voice production physiology, swallowing and psychology. However, pathological classification using EGGs usually performs poorly. This is because an EGG must be decomposed into components before features can be extracted for classification, yet the total number of components produced by a time-frequency representation such as the empirical mode decomposition (EMD) differs from one EGG to another. Hence, the feature vectors extracted from different EGGs have different dimensions, which makes it difficult to build a machine learning model for classification. This paper addresses this issue. It proposes a method for grouping the intrinsic mode functions (IMFs) and the residue obtained by applying the EMD to the EGGs in order to classify healthy and pathological subjects. More precisely, it proposes a clustering-based method that groups the IMFs and the residue so that every EGG yields the same total number of grouped IMFs. First, the IMFs and the residue of each EGG are categorized into a desired number of groups based on their correlation coefficients. Second, the IMFs or the residue in each group are summed to obtain a grouped IMF. Third, the mean frequency and the first formant of each grouped IMF are computed. Finally, a random forest is employed to perform the classification.
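The grouping and feature steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not name the clustering algorithm, so the sketch greedily merges the two groups whose sums have the highest absolute correlation coefficient until the desired number of groups remains. The function names `group_components` and `mean_frequency` are hypothetical, and the mean frequency is assumed to be the power-weighted spectral centroid. The components are assumed to come from an existing EMD routine. The first formant and the random forest are omitted.

```python
import numpy as np


def group_components(components, n_groups):
    """Greedily merge EMD components until n_groups grouped IMFs remain.

    components: array of shape (n_components, n_samples) holding the IMFs
    and the residue of one EGG (assumed to be computed elsewhere).
    Returns an array of shape (n_groups, n_samples).
    """
    groups = [np.asarray(c, dtype=float) for c in components]
    if len(groups) < n_groups:
        raise ValueError("fewer components than requested groups")
    while len(groups) > n_groups:
        # Absolute correlation coefficients between the current group sums.
        corr = np.abs(np.corrcoef(np.vstack(groups)))
        corr = np.nan_to_num(corr, nan=0.0)  # constant components give NaN
        np.fill_diagonal(corr, -1.0)         # never merge a group with itself
        i, j = np.unravel_index(np.argmax(corr), corr.shape)
        i, j = min(i, j), max(i, j)
        groups[i] = groups[i] + groups[j]    # sum the members of the merged group
        del groups[j]
    return np.vstack(groups)


def mean_frequency(signal, fs):
    """Power-weighted mean frequency (spectral centroid) in Hz.

    Assumption: the abstract does not define the mean frequency; the
    spectral centroid is one common choice.
    """
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return float(np.sum(freqs * power) / np.sum(power))
```

Because every EGG then yields the same number of grouped IMFs, the per-group features can be concatenated into a fixed-length vector and passed to a classifier.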
To the best of our knowledge, this is the first time a joint EMD and clustering-based method has been proposed for pathological voice detection. Computer numerical simulations are conducted using the publicly available online Saarbrücken voice database. Five cross validations are performed. The mean accuracy, mean specificity and mean sensitivity over these five validations are 86.98%, 79.92% and 91.57%, respectively, with standard deviations of ±2.00%, ±3.71% and ±2.13%, respectively. The simulation results show that the proposed method outperforms common EGG-based or speech-processing-based methods. This paper proposes a clustering-based method for grouping the IMFs and the residue to perform pathological classification via EGGs. The grouping criterion is based on the correlation coefficients. The proposed method achieves the highest classification performance at the majority of signal-to-noise ratios compared with state-of-the-art methods.
• This paper proposes a clustering-based method for grouping the IMFs and the residue.
• The method is applied to classify healthy and pathological subjects.
• The total numbers of grouped IMFs of different EGGs are the same.
• The mean frequency and the first formant of each grouped IMF are computed.
• A random forest is employed to perform the classification.
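The accuracy, specificity and sensitivity figures reported above are summarized as a mean and standard deviation over the five validations. A minimal sketch of that bookkeeping is given below, under explicit assumptions: the pathological class is treated as the positive class (label 1), healthy as the negative class (label 0), and the standard deviation is the population form. The abstract specifies none of these, and the function names `fold_metrics` and `summarize` are hypothetical.

```python
import numpy as np


def fold_metrics(y_true, y_pred):
    """Return (accuracy, specificity, sensitivity) in percent for one fold.

    Assumption: label 1 = pathological (positive), label 0 = healthy
    (negative). Each fold must contain at least one sample of each class.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    accuracy = 100.0 * (tp + tn) / len(y_true)
    specificity = 100.0 * tn / (tn + fp)  # healthy subjects correctly identified
    sensitivity = 100.0 * tp / (tp + fn)  # pathological subjects correctly identified
    return accuracy, specificity, sensitivity


def summarize(folds):
    """Mean and standard deviation of the metrics across validation folds.

    folds: list of (y_true, y_pred) pairs, one per fold.
    Returns two arrays ordered as (accuracy, specificity, sensitivity).
    Assumption: population standard deviation (ddof=0).
    """
    metrics = np.array([fold_metrics(t, p) for t, p in folds])
    return metrics.mean(axis=0), metrics.std(axis=0)
```

Reporting specificity and sensitivity alongside accuracy matters when the two classes are unequal in size, since accuracy alone can hide poor performance on the smaller class.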