Abstract

In this article, we investigate a specific long-term speech spectrum with respect to its use for speaker recognition. The long-term character was obtained by averaging short-term autocorrelation coefficients over the whole utterance. The long-term spectrum was then calculated by second-order linear prediction from the averaged autocorrelation coefficients. First, the speaker discriminability of 32 individual parameters was evaluated; the parameters combine spectral energy and spectral slope in eight frequency bands covering the range 0−4 kHz (seven narrow nonoverlapping subbands and one band spanning the full range). Then, the four most discriminative subbands were selected for speaker recognition. Together, these subbands cover the frequencies 0−1.2 kHz. In the main experiments, text-independent speaker recognition based on relative Euclidean distance was performed in each single subband as well as in combinations of two to four subbands, using two types of speech data: complete continuous speech and the voiced part of the same speech. Voiced speech appears to be generally more effective for speaker recognition with the long-term speech spectrum. The best recognition rates, 91.7% on complete speech and 100% on voiced speech, were achieved with optimal pairs of subbands. The long-term speech spectrum can complement traditional voice features.
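The estimation step described in the abstract (average the short-term autocorrelation over the utterance, then fit a low-order linear predictor) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the 8 kHz sampling rate (suggested by the 0−4 kHz range), the Hamming window, and the frame and hop lengths are all assumptions.

```python
import numpy as np

def long_term_lp_spectrum(signal, fs=8000, frame_len=240, hop=80,
                          order=2, n_freq=256):
    """Long-term LP spectrum from autocorrelation averaged over all frames.

    fs, frame_len, hop and the Hamming window are illustrative assumptions.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hamming(frame_len)
    acc = np.zeros(order + 1)
    n_frames = 0
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        # Short-term autocorrelation at lags 0..order
        acc += [np.dot(frame[:frame_len - k], frame[k:])
                for k in range(order + 1)]
        n_frames += 1
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")
    r = acc / n_frames  # averaged (long-term) autocorrelation

    # Normal equations R a = [r1, ..., rp] with Toeplitz matrix R
    R = np.array([[r[abs(i - j)] for j in range(order)]
                  for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    err = r[0] - np.dot(a, r[1:])  # prediction error power

    # All-pole spectrum: err / |1 - sum_k a_k exp(-j*omega*k)|^2
    freqs = np.linspace(0.0, fs / 2.0, n_freq)
    omega = 2.0 * np.pi * freqs / fs
    k = np.arange(1, order + 1)
    A = 1.0 - np.exp(-1j * np.outer(omega, k)) @ a
    power_db = 10.0 * np.log10(err / np.abs(A) ** 2)
    return freqs, power_db, a
```

With order 2 the resulting spectrum is very smooth (at most one resonance), which is consistent with the "smoothed long-term spectrum" the article refers to.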

Highlights

  • One of the open questions in speech signal processing, as well as in biometric data mining, is how a person’s individuality is reflected in the voice. There is no standard set of speech signal features commonly adopted for speaker recognition

  • The presented experiments aimed to compare speaker-specific long-term spectra and to evaluate their use for speaker recognition

  • The results suggest a new finding: a single efficient parameter derived from a suitable subband of the smoothed long-term spectrum is sufficient to discriminate between speakers successfully
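To make the idea of a subband parameter and distance-based matching concrete, here is a small hedged sketch. Several things are assumptions not taken from the article: "spectral energy" is approximated by the mean level in dB, the slope is a least-squares line fit in dB/kHz, and "relative Euclidean distance" is taken to be the Euclidean distance divided by the norm of the reference vector. All function names and the subband edges used when calling them are hypothetical.

```python
import numpy as np

def subband_parameters(freqs, power_db, f_lo, f_hi):
    """Return (mean level in dB, slope in dB/kHz) within [f_lo, f_hi) Hz."""
    freqs = np.asarray(freqs, dtype=float)
    power_db = np.asarray(power_db, dtype=float)
    mask = (freqs >= f_lo) & (freqs < f_hi)
    if mask.sum() < 2:
        raise ValueError("subband contains fewer than two spectral points")
    level = float(np.mean(power_db[mask]))
    # First-degree least-squares fit; the first coefficient is the slope
    slope, _ = np.polyfit(freqs[mask] / 1000.0, power_db[mask], 1)
    return level, float(slope)

def relative_distance(x, ref):
    """Assumed definition: ||x - ref|| / ||ref||."""
    x = np.asarray(x, dtype=float)
    ref = np.asarray(ref, dtype=float)
    return float(np.linalg.norm(x - ref) / np.linalg.norm(ref))

def recognize(test_vec, references):
    """Return the speaker whose reference vector is closest to test_vec.

    references: dict mapping speaker label -> parameter vector.
    """
    return min(references,
               key=lambda s: relative_distance(test_vec, references[s]))
```

A feature vector for one utterance would then hold the parameters of one or more subbands, and `recognize` would compare it against one reference vector per enrolled speaker.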

Summary

Introduction

One of the open questions in speech signal processing, as well as in biometric data mining, is how a person’s individuality is reflected in the voice. There is no standard set of speech signal features commonly adopted for speaker recognition. The long-term spectrum provides information on the spectral energy distribution of a speech signal over a relatively long utterance. Although some statistically significant differences have been found among the long-term spectra of different languages, researchers concluded that all spectra are similar enough, and they suggested a “universal” average spectrum of speech across all investigated languages. This normative spectrum can be used for many clinical purposes, such as the prescription and evaluation of hearing aids. The aim of this article is to investigate the discrimination power of the long-term spectrum for speaker recognition in the case where the spectrum is estimated using low-order linear prediction.

Estimation of the Long-Term Spectrum
Analysed Parameters
Speaker Recognition
Findings
Conclusion and Future Work