Speaker recognition is performed in the space of functional parameters of the glottal cross-sectional area, which are found by solving an inverse problem. The problem is solved in two stages: first, the signal obtained by inverse filtering is approximated with a voice source model; then the parameters of the glottal area model that generate the computed voice source pulse are determined. Speaker recognition is evaluated on stressed-vowel segments from a database of Russian numerals from 0 to 9, separately for men (48 speakers) and women (37 speakers). Several recognition methods are studied: Gaussian mixture models (GMMs), support vector machines (SVMs), discriminant analysis, the naive Bayes (NB) classifier, classification trees (CTREE), and the Parzen window classifier. The best results were obtained with SVMs and the Parzen method. The average total identification error was 4.9% and 5.1%, respectively, for men, and 8.2% and 8.8%, respectively, for women.
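To illustrate how a Parzen window classifier can identify a speaker from such parameter vectors, here is a minimal sketch. It assumes an isotropic Gaussian kernel with a single bandwidth `h`, equal speaker priors, and plain numeric feature vectors standing in for the glottal-area model parameters. The function names and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict

def gaussian_kernel(x, y, h):
    # Isotropic Gaussian kernel with bandwidth h. The normalization constant is
    # omitted because it is the same for every speaker (same h, same dimension),
    # so it does not change which speaker gets the highest score.
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * h * h))

def parzen_identify(train, x, h=1.0):
    """Return the speaker label whose Parzen density estimate at x is largest.

    train: list of (speaker_label, feature_vector) pairs (hypothetical format).
    Equal prior probabilities for all speakers are assumed.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for label, v in train:
        sums[label] += gaussian_kernel(v, x, h)
        counts[label] += 1
    # Class-conditional density estimate = mean kernel value over that speaker's vectors.
    return max(sums, key=lambda lab: sums[lab] / counts[lab])

# Toy example with made-up 2-D parameter vectors for two speakers.
train = [
    ("spk1", [0.10, 0.20]), ("spk1", [0.15, 0.25]),
    ("spk2", [1.00, 1.10]), ("spk2", [0.90, 1.00]),
]
print(parzen_identify(train, [0.12, 0.22], h=0.3))  # spk1
```

In practice the bandwidth `h` would be tuned on held-out data, and the identification error would be measured as the fraction of test vowel segments assigned to the wrong speaker.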