Abstract
This work demonstrates the usefulness of multiple frame size and rate (MFSR) analysis for speaker recognition under limited data conditions. Present-day speaker recognition systems assume that sufficient data is available for modelling and testing. Speech signals are therefore analysed with a fixed frame size and rate, which may be termed single frame size and rate (SFSR) analysis. In the limited data condition, the amount of available training and testing data is small, and SFSR analysis may not provide sufficient feature vectors to train and test the speaker. Insufficient feature vectors lead to poor speaker modelling during training and may not yield a reliable decision during testing. We demonstrate the use of multiple frame size (MFS), multiple frame rate (MFR) and MFSR analysis techniques for speaker recognition under limited data conditions. These techniques mitigate the sparseness of feature vectors during training and testing by producing a relatively larger number of feature vectors from the same data, which helps in better modelling and testing. The experimental results show that MFS, MFR and MFSR analysis improve performance significantly compared to SFSR analysis. MFSR analysis even outperforms the Gaussian mixture model-universal background model (GMM-UBM), the most widely used modelling technique.
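The difference between SFSR and MFSR framing can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the frame sizes (10, 15 and 20 ms), the frame shifts (4, 6 and 10 ms) and the removal of coinciding frames are all assumptions chosen for illustration, since the abstract does not state the values used. The sketch only covers the framing step; feature extraction from each frame is omitted.

```python
def frame_bounds(num_samples, frame_size, frame_shift):
    """Return (start, end) sample indices of all complete frames."""
    if frame_size <= 0 or frame_shift <= 0:
        raise ValueError("frame_size and frame_shift must be positive")
    last_start = num_samples - frame_size
    return [(s, s + frame_size) for s in range(0, last_start + 1, frame_shift)]


def ms_to_samples(duration_ms, sample_rate):
    """Convert a duration in milliseconds to a whole number of samples."""
    return sample_rate * duration_ms // 1000


def sfsr_frames(signal, sample_rate, size_ms=20, shift_ms=10):
    """Single frame size and rate: one fixed size and one fixed shift."""
    bounds = frame_bounds(len(signal),
                          ms_to_samples(size_ms, sample_rate),
                          ms_to_samples(shift_ms, sample_rate))
    return [signal[a:b] for a, b in bounds]


def mfsr_frames(signal, sample_rate,
                sizes_ms=(10, 15, 20), shifts_ms=(4, 6, 10)):
    """Multiple frame size and rate: pool frames over several sizes and shifts.

    The sizes and shifts are hypothetical. Frames that coincide exactly
    (same start and same size) are kept only once, which is an assumption
    of this sketch rather than something stated in the source.
    """
    seen = set()
    for size_ms in sizes_ms:
        for shift_ms in shifts_ms:
            seen.update(frame_bounds(len(signal),
                                     ms_to_samples(size_ms, sample_rate),
                                     ms_to_samples(shift_ms, sample_rate)))
    return [signal[a:b] for a, b in sorted(seen)]


if __name__ == "__main__":
    sample_rate = 8000
    signal = list(range(sample_rate))  # stand-in for 1 s of speech samples
    print(len(sfsr_frames(signal, sample_rate)),
          len(mfsr_frames(signal, sample_rate)))
```

Each frame would normally yield one feature vector, so pooling frames over several sizes and shifts gives the model more vectors from the same short utterance than a single size and shift does.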