Abstract

In this paper, we propose a novel pitch-synchronous auditory-based feature extraction method for robust automatic speech recognition (ASR). A pitch-synchronous zero-crossing peak-amplitude (PS-ZCPA)-based feature extraction method was proposed previously [1,2], and showed improved performance except while modulation enhancement was integrated together with Wiener filter (WF)-based noise reduction and auditory masking into it [3]. However, since zero-crossing is not an auditory event, we propose a new pitch-synchronous peak-amplitude (PS-PA)-based method to make a feature extractor of ASR more auditory-like. We also examine the effect of WF-based noise reduction, modulation enhancement, and auditory masking into the proposed PS-PA method using Aurora-2J database. The experimental results showed the superiority of the proposed method over the PS-ZCPA method, and eliminated the problem due to the reconstruction of zero-crossings from modulated envelope. The highest relative performance over MFCC was achieved as 67.33% using the PS-PA method together with WF-based noise reduction, modulation enhancement, and auditory masking.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call