Pathological voice detection based on gammatone short time spectral self-similarity

Denghuang Zhao, Xincheng Zhu, Xiaojun Zhang, Changwei Zhou, Zhi Tao

doi:10.7507/1001-5515.202107037

Abstract

Acoustic analysis based on machine learning and signal processing is an important approach to pathological voice detection, and voice feature extraction is one of its most important steps. The features in wide use today have three drawbacks: they depend on fundamental frequency extraction, they are easily affected by noise, and they are computationally expensive. To address these shortcomings, we propose a new method of pathological voice detection based on multi-band analysis and chaotic analysis. A gammatone filter bank was used to simulate the auditory characteristics of the human ear and to decompose the voice signal into different frequency bands. Because turbulent noise caused by chaos in the voice degrades spectral convergence, we applied the short-time Fourier transform to each frequency band of the voice signal, extracted the feature gammatone short time spectral self-similarity (GSTS), and analyzed the degree of chaos in each band to distinguish normal from pathological voices. The experimental results showed that, combined with traditional machine learning methods, GSTS reached an accuracy of 99.50% on the Massachusetts Eye and Ear Infirmary (MEEI) pathological voice database, an improvement of 3.46% over the best existing features. In addition, extracting GSTS took far less time than extracting traditional nonlinear features. These results show that GSTS offers higher extraction efficiency and better recognition performance than existing features.
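The abstract does not give the formula for GSTS, so the following is only a rough sketch of the pipeline it describes: gammatone band decomposition, a short-time Fourier transform per band, and one similarity score per band. The similarity measure used here, the mean cosine similarity between adjacent short-time magnitude spectra, is a stand-in assumption and not the authors' definition. The function name `band_self_similarity`, the choice of center frequencies, and the frame length are likewise hypothetical. The sketch relies on SciPy's `gammatone`, `lfilter`, and `stft`.

```python
import numpy as np
from scipy.signal import gammatone, lfilter, stft


def band_self_similarity(x, fs, center_freqs, nperseg=512):
    """Return one score per gammatone band.

    ASSUMPTION: the score is the mean cosine similarity between the
    magnitude spectra of adjacent STFT frames. This is an illustrative
    proxy, not the GSTS definition from the paper.
    """
    scores = []
    for fc in center_freqs:
        # 4th-order IIR gammatone filter centered at fc (must be < fs / 2).
        b, a = gammatone(fc, "iir", fs=fs)
        band = lfilter(b, a, x)

        # Short-time Fourier transform of the band signal.
        _, _, Z = stft(band, fs=fs, nperseg=nperseg)
        mag = np.abs(Z)  # shape: (frequency bins, frames)

        # Cosine similarity between each frame and the next one.
        prev, nxt = mag[:, :-1], mag[:, 1:]
        num = np.sum(prev * nxt, axis=0)
        den = np.linalg.norm(prev, axis=0) * np.linalg.norm(nxt, axis=0)
        scores.append(float(np.mean(num / (den + 1e-12))))
    return np.array(scores)
```

Under this proxy, a steady periodic signal yields scores close to 1 in every band, while a noise-like signal yields lower scores, which mirrors the abstract's idea that turbulent noise degrades spectral regularity. The resulting per-band vector could then be passed to any conventional classifier.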

Full Text