Abstract
Voice Assistants (VAs) nowadays play an important role in smart home applications. However, along with convenience, VAs also bring security issues, such as vulnerability to replay attacks and hidden voice commands. This paper presents a replay Spoof Speech Detection (SSD) system for VAs that uses Energy Separation Algorithm (ESA)-based features, namely ESA Instantaneous Amplitude Cepstral Coefficients (ESA-IACC) and ESA Instantaneous Frequency Cepstral Coefficients (ESA-IFCC), with a Gaussian Mixture Model (GMM) as the pattern classifier. The Teager Energy Operator (TEO), on which the ESA is built, has noise-suppression characteristics and is therefore robust to noise. In noisy acoustic environments, the ESA-based features that employ the TEO perform better than they do in the clean environment. We performed experiments on the ReMASC database, which contains four different acoustic environments. The proposed features performed better in both clean and noisy environments. In addition, to exploit possible complementary information, we performed score-level fusion of ESA-IACC and ESA-IFCC, which resulted in a low Equal Error Rate (EER) across the different environments. Furthermore, we compared the proposed feature sets with Constant-Q Cepstral Coefficients (CQCC) and Linear Frequency Cepstral Coefficients (LFCC), obtaining relative EER improvements of approximately 21.88% and 66.34% for the clean and noisy environments, respectively.
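As a rough illustration of the signal-processing core mentioned above, the sketch below implements the discrete TEO, Ψ[x(n)] = x²(n) − x(n−1)x(n+1), and the DESA-2 variant of the ESA (Maragos, Kaiser and Quatieri), which estimates the instantaneous amplitude and frequency of a single AM-FM component. This is a sketch under stated assumptions, not the authors' implementation: the abstract does not say which ESA variant is used, and the subband filterbank, framing, and cepstral (DCT) stages that turn these estimates into ESA-IACC and ESA-IFCC are omitted. The function names `teager_energy` and `desa2` are hypothetical.

```python
import math


def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1]*x[n+1].

    Returns one value per interior sample, i.e. len(x) - 2 values;
    output index k corresponds to input sample k + 1.
    """
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def desa2(x, eps=1e-12):
    """DESA-2 energy separation (assumed variant; the source does not specify one).

    Estimates instantaneous amplitude |a(n)| and frequency Omega(n) in
    radians/sample for an AM-FM signal x, using y[n] = x[n+1] - x[n-1]:
        Omega(n) = 0.5 * arccos(1 - psi[y(n)] / (2 * psi[x(n)]))
        |a(n)|   = 2 * psi[x(n)] / sqrt(psi[y(n)])
    Returns two lists of length len(x) - 4 (samples 2 .. len(x) - 3).
    """
    # y[j] is centred on x sample j + 1.
    y = [x[n + 1] - x[n - 1] for n in range(1, len(x) - 1)]
    psi_x = teager_energy(x)  # psi_x[k] is centred on x sample k + 1
    psi_y = teager_energy(y)  # psi_y[k] is centred on x sample k + 2
    amps, freqs = [], []
    for k, py in enumerate(psi_y):
        px = psi_x[k + 1]  # align both energies on x sample k + 2
        if px <= eps or py <= eps:
            # Degenerate (near-zero or negative) energy: no reliable estimate.
            amps.append(0.0)
            freqs.append(0.0)
            continue
        ratio = 1.0 - py / (2.0 * px)
        ratio = max(-1.0, min(1.0, ratio))  # clamp rounding error before arccos
        freqs.append(0.5 * math.acos(ratio))
        amps.append(2.0 * px / math.sqrt(py))
    return amps, freqs


if __name__ == "__main__":
    # A pure cosine A*cos(W*n + phi) should give amplitude ~A and frequency ~W.
    A, W = 0.8, 0.3
    signal = [A * math.cos(W * n + 0.5) for n in range(32)]
    amps, freqs = desa2(signal)
    print(amps[:3], freqs[:3])
```

For a pure cosine the estimates are exact up to rounding, because Ψ[A cos(Ωn + φ)] = A² sin²(Ω) and the differenced signal y(n) is again a sinusoid with amplitude 2A sin(Ω). Converting Ω to Hz requires the sampling rate, f = Ω·fs / (2π).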