Abstract
Automatic speaker verification systems are found to be vulnerable to spoof attacks such as voice conversion, text-to-speech, and replayed speech. As the security of biometric systems is vital, many countermeasures have been developed for spoofed speech detection. To satisfy the recent developments on speech synthesis, publicly available datasets became more and more challenging (e.g., ASVspoof 2019 and 2021 datasets). A variety of replay attack configurations were also considered in those datasets, as they do not require expertise, hence easily performed. This work utilizes regional energy features, which are experimentally proven to be more effective than the traditional frame-based energy features. The proposed energy features are independent from the utterance length and are extracted over nonoverlapping time-frequency regions of the magnitude spectrum. Different configurations are considered in the experiments to verify the regional energy features’ contribution to the performance. First, light convolutional neural network – long short-term memory (LCNN – LSTM) model with linear frequency cepstral coefficients is used to determine the optimal number of regional energy features. Then, SE-Res2Net model with log power spectrogram features is used, which achieved comparable results to the state-of-the-art for ASVspoof 2019 logical access condition. Physical access condition from ASVspoof 2019 dataset, logical access and deep fake conditions from ASVspoof 2021 dataset are also used in the experiments. The regional energy features achieved improvements for all conditions with almost no additional computational or memory loads (less than 1% increase in the model size for SE-Res2Net). The main advantages of the regional energy features can be summarized as i) capturing nonspeech segments, ii) extracting band-limited information. Both aspects are found to be discriminative for spoofed speech detection.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.