Speaker Recognition Based on Long-Term Acoustic Features With Analysis Sparse Representation

Ting Lin, Ye Zhang

doi:10.1109/access.2019.2925839

Open Access

https://doi.org/10.1109/access.2019.2925839

Abstract

The performance of a speaker recognition system depends highly on which acoustic features are used. Most speaker recognition systems use short-term acoustic features extracted from a single speech frame, and the most popular short-term acoustic features are the Mel-frequency cepstral coefficients (MFCCs). Short-term features are generally static: neither the cepstral coefficients nor an MFCCs frame includes dynamic information about the speech signal. In this paper, using an analysis sparse representation model, we introduce the long-term acoustic (LTA) feature for text-independent speaker recognition, which is a sparse representation of the static features and dynamic information of the speaker's speech. First, the speech signal is segmented into overlapping frames, and MFCCs features are extracted from each frame. Super MFCCs frames are then constructed by stacking each current frame with some of its following frames, which captures the dynamic information of the speech signal. The super MFCCs frames are combined into a 2-D MFCCs feature map (MFCCsmap). Finally, the speaker model is built on the analysis sparse model, and the sparse representations of the MFCCsmap are used as the LTA features. A state-of-the-art deep neural network (DNN) is employed as the classifier for speaker recognition. The experimental results illustrate the effectiveness and robustness of the proposed system.
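The stacking step described in the abstract can be illustrated with a minimal sketch. It assumes the MFCCs are already available as a NumPy array with one row per frame; the function name `stack_super_frames`, the `context` parameter, and the row-wise layout of the resulting map are hypothetical choices made for illustration, not details confirmed by the paper.

```python
import numpy as np

def stack_super_frames(mfcc, context):
    """Stack each frame with its `context` following frames.

    mfcc: array of shape (n_frames, n_coeffs), one MFCC vector per frame.
    Returns an array of shape (n_frames - context, n_coeffs * (context + 1)),
    where row t is [mfcc[t], mfcc[t+1], ..., mfcc[t+context]] concatenated.
    """
    n_frames, n_coeffs = mfcc.shape
    if context < 0 or context >= n_frames:
        raise ValueError("context must satisfy 0 <= context < n_frames")
    n_super = n_frames - context
    # Each shifted slice contributes one block of columns to the super frames.
    return np.hstack([mfcc[i:i + n_super] for i in range(context + 1)])

# Toy input: 6 frames of 3 coefficients (real MFCCs would come from a front end).
mfcc = np.arange(18, dtype=float).reshape(6, 3)
mfccs_map = stack_super_frames(mfcc, context=2)
print(mfccs_map.shape)
```

With 6 frames and a context of 2, four super frames of 9 values each are produced. How many following frames the authors stack, and whether super frames form the rows or the columns of the MFCCsmap, is not stated in the abstract.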

Highlights

Speaker recognition is the process of identifying a person based on the voice of the speaker [1].
We present long-term acoustic (LTA) features that include both the static and the dynamic information of the speech signal. They are obtained as the analysis sparse representations of the MFCCsmap under the speaker model and are used as the input to the deep neural network (DNN) classifier.
In the test phase, the MFCCsmap of the test speech is obtained in the same way as in the training phase, the speaker model generates the LTA features from it, and these features are fed to the trained DNN classifier to perform speaker recognition.

Summary

INTRODUCTION

Speaker recognition is the process of identifying a person based on the voice of the speaker [1]. Under the analysis sparse model, the sparse representations of the super MFCCs frames can serve as long-term acoustic (LTA) features that carry both the static and the dynamic information of the speech signal. We present LTA features obtained as the analysis sparse representations of the MFCCsmap under the speaker model, and these features are used as the input to the DNN classifier. In the test phase, the MFCCsmap of the test speech is obtained in the same way as in the training phase, the speaker model generates the LTA features from it, and these features are fed to the trained DNN classifier to perform speaker recognition.
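To make the idea of an analysis sparse representation concrete, the sketch below applies an analysis operator to one super frame and keeps only the largest-magnitude coefficients. This is an assumption-laden illustration: the operator `omega` here is random rather than learned from training speech, and hard thresholding is one common way to obtain a sparse analysis code, not necessarily the procedure the authors use. The names `analysis_sparse_code`, `omega`, and `keep` are hypothetical.

```python
import numpy as np

def analysis_sparse_code(omega, x, keep):
    """Return omega @ x with all but the `keep` largest-magnitude entries zeroed."""
    if keep <= 0:
        raise ValueError("keep must be positive")
    z = omega @ x
    if keep >= z.size:
        return z
    out = np.zeros_like(z)
    # Indices of the `keep` entries with the largest absolute value.
    idx = np.argsort(np.abs(z))[-keep:]
    out[idx] = z[idx]
    return out

rng = np.random.default_rng(0)
omega = rng.standard_normal((12, 9))  # placeholder operator, NOT a learned speaker model
x = rng.standard_normal(9)            # stand-in for one super MFCCs frame
lta = analysis_sparse_code(omega, x, keep=4)
print(np.count_nonzero(lta))
```

In the system described above, the operator would play the role of the speaker model learned in the training phase, and the resulting sparse vectors would be the LTA features passed to the DNN classifier.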

MFCCSMAP

SPEAKER MODEL AND LONG-TERM ACOUSTIC FEATURES

DNN CLASSIFIER

Findings

CONCLUSION

Journal: IEEE Access: Practical Innovations, Open Solutions
Publication Date: Jan 1, 2019
Citations: 39
License type: CC BY 4.0


