Abstract

This paper considers one extension of the feature vector used for automatic speaker recognition. The starting feature vector consisted of 18 mel-frequency cepstral coefficients (MFCCs). It was extended with two additional features derived from the spectrum of the speech signal. The idea motivating this research is that the efficiency of automatic speaker recognition can be increased by constructing a feature vector that tracks the real perceived spectrum of the observed speech. The additional features are based on the energy maximums in appropriate frequency ranges of the observed speech frames. The experiments compare accuracy and equal error rate (EER) for feature vectors that contain only the 18 MFCCs and for feature vectors that also contain the additional features. Recognition accuracy increased by around 3%. The EER values differ less, but the results show that adding the proposed features produced a lower decision threshold. These results indicate that tracking real occurrences in the spectrum of the speech signal leads to a more efficient automatic speaker recognizer. Determining features that track real occurrences in the speech spectrum will improve automatic speaker recognition and make it possible to avoid complex models.
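
The feature extension described above can be illustrated with a minimal sketch. The abstract does not state the frequency ranges or the exact definition of the two additional features, so everything specific here is an assumption made for illustration: the two bands (300 to 1000 Hz and 1000 to 3400 Hz), the Hamming window, the choice of the peak frequency as the feature value, and the hypothetical function names `band_peak_frequencies` and `extend_feature_vector`. The 18 MFCCs are taken as already computed.

```python
import numpy as np


def band_peak_frequencies(frame, sample_rate, bands):
    """For each (low_hz, high_hz) band, return the frequency (Hz) of the
    spectral bin with maximum energy in that band of one speech frame."""
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2          # energy spectrum of the frame
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    peaks = []
    for low_hz, high_hz in bands:
        idx = np.flatnonzero((freqs >= low_hz) & (freqs < high_hz))
        peaks.append(freqs[idx[np.argmax(power[idx])]])
    return np.array(peaks)


def extend_feature_vector(mfcc, frame, sample_rate,
                          bands=((300.0, 1000.0), (1000.0, 3400.0))):
    """Append one energy-maximum feature per band to an MFCC vector.

    The band limits are illustrative assumptions, not values from the paper.
    """
    extra = band_peak_frequencies(frame, sample_rate, bands)
    return np.concatenate([np.asarray(mfcc, dtype=float), extra])


# Usage: a synthetic 50 ms frame at 8 kHz with components at 500 Hz and 2000 Hz.
sample_rate = 8000
t = np.arange(400) / sample_rate
frame = np.sin(2 * np.pi * 500 * t) + 0.5 * np.sin(2 * np.pi * 2000 * t)
mfcc = np.zeros(18)                                     # placeholder for 18 real MFCCs
extended = extend_feature_vector(mfcc, frame, sample_rate)
print(extended.shape)
```

With two bands, an 18-element MFCC vector becomes a 20-element vector, matching the "18 MFCCs plus two additional features" layout. Whether the paper uses the position of the maximum, its energy value, or another derived quantity is not stated in this excerpt.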

Highlights

  • Mel-frequency cepstral coefficients (MFCCs) are introduced as features that can track the spectral envelope of the speech signal

  • The purpose of the experiments in this paper is to examine whether a speaker recognizer can achieve better performance by using additional features derived from the signal energy

  • It follows that the compactness of the models was increased by adding the additional features. These results show that the efficiency of an automatic speaker recognizer that uses MFCCs as features can be increased solely by enhancing the feature set, namely by adding features derived from the energy spectrum of speech
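
The two performance measures compared in the abstract, the EER and the decision threshold at which it occurs, can be computed from verification scores as in the sketch below. This is a generic illustration, not the paper's evaluation procedure: the function name `equal_error_rate`, the convention that higher scores mean a better match, and the example scores are all assumptions.

```python
import numpy as np


def equal_error_rate(genuine_scores, impostor_scores):
    """Return (eer, threshold) where the false acceptance and false
    rejection rates are closest. Assumes higher score = better match."""
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    # Candidate thresholds: every observed score, in ascending order.
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    # A trial is accepted when its score is >= the threshold.
    frr = np.array([(genuine < th).mean() for th in thresholds])    # genuine rejected
    far = np.array([(impostor >= th).mean() for th in thresholds])  # impostor accepted
    best = np.argmin(np.abs(far - frr))
    return (far[best] + frr[best]) / 2.0, thresholds[best]


# Usage with made-up scores (not data from the paper).
eer, threshold = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.6])
print(eer, threshold)
```

Running the same function on scores from the MFCC-only system and from the extended-feature system would give the pair of EER values and thresholds that the abstract compares.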

Summary

Introduction

Mel-frequency cepstral coefficients (MFCCs) are introduced as features that can track the spectral envelope of the speech signal. One related approach is speaker verification that combines MFCCs with Spectral Dimension (SD) features. The determination of MFCCs in Equation (1) does not take into account the real perceived spectrum of the signal [19], since the maximums of the frequency-selective filters applied in Equation (1) are not strictly positioned.
