Abstract

The main goal of the present thesis is the exploitation of wavelets for the optimization of speaker and speech recognition systems performance. In this context, four new speech parameterization methods are introduced: (1) The first method adapts the frequency resolution of wavelet packet transform to the critical bandwidth of auditory filters incorporating the recent advances for their estimation. (2) The second method introduces a generalization of wavelet packet transform, named overlapping wavelet packet transform, which emphasizes those frequency sub-bands that critical bandwidth changes from a finer to a coarser value. (3) The third method evaluates the contribution of each one of eight non-overlapping frequency sub-bands, that the Nyquist interval is divided, to the speaker recognition task and a wavelet packet transform is constructed which adapts its frequency resolution according to the performance of each sub-band. (4) The fourth method introduces a new technique for seeking and selecting the best basis among all wavelet packet transforms available in the speaker recognition task taking as criterion the EER. The aforementioned four speech signal parameterizations were evaluated on the speaker verification system WCL-1 of Wire Communications Laboratory, University of Patras, utilizing the speaker recognition corpora POLYCOST and NIST and their superiority was proven over previous wavelet-based parameterizations as well as the widely used Mel Frequency Cepstral Coefficients (MFCC). Among the four proposed methods, it was proven that the second parameterization technique exhibited the best performance. Furthermore, the most important wavelet properties are thoroughly analyzed, the optimal is selected for the representation of the speech signal and this choice is experimentally verified. Finally, the first two parameterization methods were further modified and extended appropriately for application on the speech recognition task where their superiority was proven over traditionally and widely used speech parameterization techniques based on Fourier transform. The main conclusion that resulted in the present doctoral thesis is that wavelets and specifically wavelet packet transforms can be used successfully for the tasks of speaker and speech recognition.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call