Abstract

In this paper, the task of identifying the speaker using limited training and testing data is addressed. Speaker identification system is viewed as four stages namely, analysis, feature extraction, modelling and testing. The speaker identification performance depends on the techniques employed in these stages. As demonstrated by different experiments, in case of limited training and testing data condition, owing to less data, existing techniques in each stage will not provide good performance. This work demonstrates the following: multiple frame size and rate (MFSR) analysis provides improvement in the analysis stage, combination of mel frequency cepstral coefficients (MFCC), its temporal derivatives (Δ, ΔΔ), linear prediction residual (LPR) and linear prediction residual phase (LPRP) features provides improvement in the feature extraction stage and combination of learning vector quantization (LVQ) and gaussian mixture model — universal background model (GMM-UBM) provides improvement in the modelling stage. The performance is further improved by integrating the proposed techniques at the respective stages and combining the evidences from them at the testing stage. To achieve this, we propose strength voting (SV), weighted borda count (WBC) and supporting systems (SS) as combining methods at the abstract, rank and measurement levels, respectively. Finally, the proposed hierarchical combination (HC) method integrating these three methods provides significant improvement in the performance. Based on these explorations, thiswork proposes a scheme for speaker identification under limited training and testing data.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call