Abstract

Automatic Speaker Recognition (ASR) aims to determine a speaker's identity from their speech signal. Instead of using traditional mel filter banks to obtain subbands, we transform the signal from the time domain into the wavelet domain by Wavelet Packet Decomposition (WPD). The WPD is designed so that the frequency axis is partitioned to match auditory scales that mimic the response of the human cochlea. Cepstral coefficients are calculated for the resulting subbands and then classified with the k-nearest neighbor algorithm to obtain the speaker recognition rate (SRR). In this paper, three WP decompositions (Optimal Wavelet Packet Tree, Equivalent Rectangular Bandwidth (ERB) scale based Signal Decomposition, and Bark scale based WP Decomposition) are compared and the results are tabulated. Experimental results on the FSDD database show that the ERB scale based Signal Decomposition gives a higher SRR than the other two wavelet packet trees.
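
The pipeline described above (wavelet packet subbands, cepstral coefficients, nearest-neighbor classification) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a Haar wavelet, a full uniform packet tree of depth 3 rather than an ERB, Bark, or optimal tree, and synthetic sine tones in place of speech. All function names (`haar_step`, `wavelet_packet_leaves`, `cepstral_features`, `knn_predict`, `tone`) and parameter values are hypothetical and chosen only for illustration.

```python
import math

def haar_step(x):
    """Split x into Haar low-pass and high-pass halves (orthonormal scaling)."""
    s = math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    high = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return low, high

def wavelet_packet_leaves(x, depth):
    """Full wavelet packet tree: split every node at every level, return leaves.

    Assumption: a uniform tree. An auditory-scale tree would split only
    selected nodes so that subband widths follow the ERB or Bark scale.
    """
    nodes = [list(x)]
    for _ in range(depth):
        next_nodes = []
        for node in nodes:
            low, high = haar_step(node)
            next_nodes.extend([low, high])
        nodes = next_nodes
    return nodes

def cepstral_features(x, depth=3, n_coeffs=6):
    """Log subband energies followed by an unnormalized DCT-II."""
    leaves = wavelet_packet_leaves(x, depth)
    # Small constant avoids log(0) for empty subbands.
    log_e = [math.log(sum(v * v for v in leaf) + 1e-12) for leaf in leaves]
    n = len(log_e)
    return [
        sum(log_e[k] * math.cos(math.pi * c * (k + 0.5) / n) for k in range(n))
        for c in range(n_coeffs)
    ]

def knn_predict(train, query, k=1):
    """train is a list of (feature_vector, label); majority vote of k nearest."""
    ranked = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = {}
    for _, label in ranked[:k]:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

def tone(freq, n=256, rate=8000.0, phase=0.0):
    """Synthetic stand-in for a speech frame: a single sine tone."""
    return [math.sin(2 * math.pi * freq * t / rate + phase) for t in range(n)]

if __name__ == "__main__":
    # Two toy "speakers", each represented by one training tone.
    train = [
        (cepstral_features(tone(200)), "A"),
        (cepstral_features(tone(2900)), "B"),
    ]
    print(knn_predict(train, cepstral_features(tone(230, phase=0.3))))
```

Replacing the uniform tree with one whose leaves follow an auditory scale would change only `wavelet_packet_leaves`; the feature and classification steps would stay the same in this sketch.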
