Abstract
Speech Synthesis (SS) and Voice Conversion (VC) methods pose a serious threat to Automatic Speaker Verification (ASV) systems. In this paper, we investigate the differences between natural and spoofed speech signals using the Teager Energy Operator-based Energy Separation Algorithm (TEO-ESA). We exploit the contribution of the Amplitude Envelope (AE) and Instantaneous Frequency (IF) to the energy of each narrowband-filtered signal, obtained via ESA, to capture possible changes in the temporal and spectral envelopes of machine-generated synthetic speech relative to natural speech. Furthermore, the IF is used to classify natural vs. spoofed speech, with a Gaussian Mixture Model (GMM) as the classifier. These findings may help distinguish the two types of speech and help mitigate possible impostor attacks in voice biometrics. The experiments are performed on the ASVspoof 2015 Challenge database. We compare the proposed Energy Separation Algorithm-Instantaneous Frequency Cosine Coefficients (ESA-IFCC) with Mel Frequency Cepstral Coefficients (MFCC) features. On the development set, with 13-D static features, MFCC alone gave an Equal Error Rate (EER) of 6.98 % and ESA-IFCC gave 5.43 %. With score-level fusion of MFCC and ESA-IFCC, the EER reduced to 3.45 % for the static feature vector. The EER decreases further to 2.01 % and 1.89 % for Δ and ΔΔ features, respectively. On the evaluation set, the overall average error rate over known and unknown attacks was 6.79 % for ESA-IFCC, which was significantly better than that of MFCC (9.15 %) and of their score-level fusion (7.16 %).
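To illustrate how an energy separation step can recover the AE and IF of a narrowband signal, the following is a minimal Python sketch. It assumes the widely used DESA-2 variant of ESA; the abstract does not state which variant the paper uses, and the narrowband filterbank, cosine-coefficient computation, and GMM classifier are not shown. The function names `teager_energy` and `desa2`, the `eps` guard, and the test-tone parameters are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager Energy Operator: psi[x](n) = x(n)^2 - x(n-1) * x(n+1)."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def desa2(x, eps=1e-12):
    """DESA-2 energy separation (assumed variant, see lead-in).

    Returns (amplitude envelope, instantaneous frequency in rad/sample)
    for samples n = 2 .. N-3 of the input, so the output has N-4 samples.
    """
    x = np.asarray(x, dtype=float)
    y = x[2:] - x[:-2]               # symmetric difference y(n) = x(n+1) - x(n-1)
    psi_x = teager_energy(x)[1:-1]   # trimmed so it aligns with psi_y (n = 2 .. N-3)
    psi_y = teager_energy(y)
    # Clip keeps arccos inside its domain when noise pushes the ratio out of [-1, 1].
    ratio = np.clip(1.0 - psi_y / (2.0 * psi_x + eps), -1.0, 1.0)
    inst_freq = 0.5 * np.arccos(ratio)
    amp_env = 2.0 * psi_x / (np.sqrt(np.abs(psi_y)) + eps)
    return amp_env, inst_freq


# Illustrative input: a pure tone standing in for one narrowband-filtered subband.
fs = 16000.0          # assumed sampling rate in Hz
f0 = 1000.0           # assumed tone frequency in Hz
a0 = 0.7              # assumed tone amplitude
n = np.arange(400)
x = a0 * np.cos(2.0 * np.pi * f0 / fs * n + 0.3)

amp_env, inst_freq = desa2(x)
inst_freq_hz = inst_freq * fs / (2.0 * np.pi)
```

For a constant-amplitude, constant-frequency cosine below a quarter of the sampling rate, DESA-2 returns the tone's amplitude and frequency up to floating-point error, so `amp_env` and `inst_freq_hz` here stay flat at `a0` and `f0`. On real speech subbands the estimates vary over time, which is the kind of variation an IF-based feature could summarise.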