Abstract

A new speech feature extraction technique called moving average multi directional local features (MA-MDLF) is presented in this paper. This method is based on linear regression (LR) and moving average (MA) in the time–frequency plane. Three-point LR is taken along time axis and frequency axis, and 3 points MA is taken along 45° and 135° in the time–frequency plane. The LR captures the voice onset\offset, formant contour, while the moving average captures the dynamics on time–frequency axes which can be seen as voiceprints. The MA-MDLF performance is compared to commonly used speech features in speaker recognition. The comparison is performed in a speaker recognition system (SRS) for three different conditions, namely clean speech, mobile speech, and cross channel. MA-MDLF has shown better performance than the baseline MFCC, RASTA-PLP and LPCC. In clean and mobile speech, MA-MDLF feature performs the best and also in the cross channel task MA-MDLF performed excellent. We also evaluated the MA-MDLF using three speech databases, namely KSU, LDC Babylon and TIMITdatabases, and found that MA-MDLF outperformed the other commonly used features with speech from all the three databases. The first and second databases are for Arabic speech while third is for English speech.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.