Abstract
This paper investigates the problem of speaker recognition in noisy conditions. A new approach, nonnegative tensor principal component analysis (NTPCA) with a sparsity constraint, is proposed for speech feature extraction. Speech is encoded as a general higher-order tensor so that discriminative features can be extracted in the spectro-temporal domain. First, speech signals are represented by cochlear features based on the frequency-selectivity characteristics of the basilar membrane and inner hair cells; then, low-dimensional sparse features are extracted by NTPCA for robust speaker modeling. The useful information of each subspace in the higher-order tensor is thereby preserved, and an alternating projection algorithm is used to obtain a stable solution. Experimental results demonstrate that the proposed method increases recognition accuracy, especially in noisy environments.
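To make the tensor-factorization idea concrete, the following is a minimal NumPy sketch of a sparse nonnegative factorization of a third-order feature tensor (e.g. frequency × time × utterance) using multiplicative alternating updates. It is only an illustrative stand-in for the NTPCA/alternating-projection algorithm described in the paper, not the authors' implementation; the function names (`ntf_als`, `khatri_rao`) and the `l1` sparsity weight are assumptions introduced here.

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Khatri-Rao product; rows index (j, k) pairs in C-order."""
    J, R = B.shape
    K, _ = C.shape
    return (B[:, None, :] * C[None, :, :]).reshape(J * K, R)

def ntf_als(X, rank, n_iter=200, l1=0.01, eps=1e-9, seed=0):
    """Sparse nonnegative CP factorization of a 3rd-order tensor X by
    multiplicative alternating updates (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.random((I, rank))
    B = rng.random((J, rank))
    C = rng.random((K, rank))
    X0 = X.reshape(I, J * K)                     # mode-0 unfolding
    X1 = np.moveaxis(X, 1, 0).reshape(J, I * K)  # mode-1 unfolding
    X2 = np.moveaxis(X, 2, 0).reshape(K, I * J)  # mode-2 unfolding
    for _ in range(n_iter):
        # Each update keeps the factor nonnegative; adding l1 to the
        # denominator acts as an L1 (sparsity) penalty on that factor.
        A *= (X0 @ khatri_rao(B, C)) / (A @ ((B.T @ B) * (C.T @ C)) + l1 + eps)
        B *= (X1 @ khatri_rao(A, C)) / (B @ ((A.T @ A) * (C.T @ C)) + l1 + eps)
        C *= (X2 @ khatri_rao(A, B)) / (C @ ((A.T @ A) * (B.T @ B)) + l1 + eps)
    return A, B, C
```

The per-mode factor matrices returned here play the role of the subspace projections mentioned in the abstract: each mode of the tensor keeps its own low-dimensional, nonnegative, sparse representation.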
Highlights
Automatic speaker recognition has developed into an important technology for various speech-based applications
We propose a new feature extraction method for robust speaker recognition based on an auditory periphery model and a tensor representation
The results show that the auditory-based nonnegative tensor cepstral coefficient (ANTCC) features perform well in the presence of four types of noise
Summary
Automatic speaker recognition has developed into an important technology for various speech-based applications. A traditional recognition system comprises two processes: feature extraction and speaker modeling. Conventional speaker modeling methods such as Gaussian mixture models (GMMs) [1] achieve very high performance on speaker identification and verification tasks with high-quality data when training and testing conditions are well controlled. In many practical applications, however, such systems generally cannot achieve satisfactory performance for the large variety of speech signals corrupted by adverse conditions such as environmental noise and channel distortion. A traditional GMM-based speaker recognition system degrades significantly under adverse noisy conditions and is therefore not applicable to most real-world problems. Capturing robust and discriminative features from acoustic data thus becomes important, and the main efforts are focused on reducing the effect of noise and distortion.
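For context, the GMM-based baseline referred to above can be sketched as follows with scikit-learn: one GMM is fitted per enrolled speaker on frame-level features, and a test utterance is assigned to the speaker whose model gives the highest average log-likelihood. This is a generic sketch of the conventional pipeline, not the paper's configuration; the number of mixture components and the use of diagonal covariances are illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_speaker_models(features_by_speaker, n_components=16):
    """Fit one diagonal-covariance GMM per enrolled speaker.

    features_by_speaker: dict mapping speaker id -> (n_frames, n_dims)
    array of frame-level features (e.g. cepstral coefficients).
    """
    models = {}
    for spk, feats in features_by_speaker.items():
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="diag", max_iter=200)
        models[spk] = gmm.fit(feats)
    return models

def identify(models, test_feats):
    """Return the speaker whose GMM gives the highest average log-likelihood."""
    return max(models, key=lambda spk: models[spk].score(test_feats))
```

Replacing the frame-level features in this pipeline with noise-robust ones (such as the ANTCC features proposed in the paper) is precisely where the claimed gains in noisy conditions arise.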