Abstract

Nonlinear dynamic features can effectively describe the acoustic characteristics of normal and pathological voice. In this paper, phase space reconstruction and a convolutional neural network are used to classify normal and pathological voice. The phase space of each voice signal is reconstructed using a delay time and an embedding dimension, which converts the one-dimensional signal into a two-dimensional matrix, and a reconstructed trajectory graph is generated from it as a sample. The trajectory graph samples are used as the input to a VGG-like convolutional neural network, which extracts graphical features to classify normal and pathological voice. To compensate for the scarcity of clinical data, a data augmentation scheme is used. Classification experiments on normal and pathological voice are carried out separately on three databases: the Massachusetts Eye and Ear Infirmary (MEEI) database, the Saarbrücken Voice Database (SVD), and a clinical database collected by the authors. With five-fold cross-validation, the average recognition rates on the three databases are 99.42%, 97.30% and 95.88%, respectively. For the three-class task (normal, vocal fold paralysis, and vocal fold non-paralysis voice), the average recognition rates are 96.04% on the MEEI database and 92.27% on the SVD. The experimental results show that the method achieves a high recognition rate and good robustness, and is applicable across databases for the recognition of normal and pathological voice.
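The reconstruction step described above can be illustrated with a minimal sketch of time-delay embedding followed by rasterization of the trajectory into an image. The abstract does not state the delay time, the embedding dimension, the image size, or how the trajectory graph is rendered, so the function names `delay_embed` and `trajectory_image`, the parameter values, and the histogram-based rendering are all assumptions made for illustration, and the input is a synthetic sine wave rather than voice data.

```python
import numpy as np

def delay_embed(signal, delay, dim):
    """Time-delay embedding: row i is (x[i], x[i+delay], ..., x[i+(dim-1)*delay])."""
    x = np.asarray(signal, dtype=float)
    n_points = x.size - (dim - 1) * delay
    if n_points <= 0:
        raise ValueError("signal too short for this delay and dimension")
    # Column k holds the signal shifted by k*delay samples.
    return np.column_stack(
        [x[k * delay : k * delay + n_points] for k in range(dim)]
    )

def trajectory_image(points_2d, bins=64):
    """Rasterize a 2-D trajectory into a bins x bins occupancy image in [0, 1].

    Assumption: a 2-D histogram stands in for the paper's trajectory graph.
    """
    hist, _, _ = np.histogram2d(points_2d[:, 0], points_2d[:, 1], bins=bins)
    peak = hist.max()
    return hist / peak if peak > 0 else hist

# Toy stand-in for a voiced frame: a 100 Hz sine sampled at 8 kHz (not real voice data).
t = np.arange(800) / 8000.0
frame = np.sin(2 * np.pi * 100 * t)

# Hypothetical parameter choices: delay of 20 samples, embedding dimension 2.
embedded = delay_embed(frame, delay=20, dim=2)  # shape (780, 2)
image = trajectory_image(embedded, bins=64)     # shape (64, 64)
```

An image like `image` is the kind of two-dimensional sample that could be passed to a convolutional classifier. In practice the delay and dimension would be chosen per signal, for example with mutual information and false nearest neighbours, though the abstract does not say which criteria the authors used.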
