Abstract

To improve the performance of hand-crafted features for playback speech detection, this work proposes two discriminative features: constant-Q variance-based octave coefficients (CVOC) and constant-Q mean-based octave coefficients (CMOC). Both rely on our finding that the variance-based modified log magnitude spectrum and the mean-based modified log magnitude spectrum enhance the discrimination between genuine speech and playback speech. CVOC is obtained by combining the variance-based modified log magnitude spectrum with octave segmentation and the discrete cosine transform; CMOC is obtained in the same way from the mean-based modified log magnitude spectrum. Both features are evaluated on the ASVspoof 2017 corpus version 2.0 and on the ASVspoof 2019 physical access corpus. Experimental results show that the variance-based and mean-based modified log magnitude spectra produce discriminative features for playback speech. Further results on the two databases show that CVOC and CMOC perform better than some common features, such as mel frequency cepstral coefficients and constant-Q cepstral coefficients.
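
The pipeline named in the abstract (modified log magnitude spectrum, octave segmentation, discrete cosine transform) can be illustrated with a minimal sketch. The abstract does not define how the log magnitude spectrum is modified, so `modify_lms` below is a placeholder assumption (per-bin mean removal or scaling by the per-bin standard deviation over time), not the paper's definition. The function names, `bins_per_octave`, and `n_coeffs` are likewise hypothetical, and the input is assumed to be a constant-Q log magnitude spectrum already computed elsewhere, stored as a list of frames.

```python
import math


def dct_ii(x):
    """Unnormalized type-II discrete cosine transform of a 1-D sequence."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def modify_lms(lms, mode):
    """Placeholder modification (an assumption, not the paper's definition).

    lms is a list of frames, each a list of log magnitudes per frequency bin.
    mode "mean" subtracts each bin's mean over time;
    mode "variance" divides each bin by its standard deviation over time.
    """
    if mode not in ("mean", "variance"):
        raise ValueError("mode must be 'mean' or 'variance'")
    n_frames, n_bins = len(lms), len(lms[0])
    out = [[0.0] * n_bins for _ in range(n_frames)]
    for b in range(n_bins):
        column = [frame[b] for frame in lms]
        mean = sum(column) / n_frames
        var = sum((v - mean) ** 2 for v in column) / n_frames
        for t in range(n_frames):
            if mode == "mean":
                out[t][b] = lms[t][b] - mean
            else:
                # Small constant avoids division by zero for constant bins.
                out[t][b] = lms[t][b] / math.sqrt(var + 1e-12)
    return out


def octave_coefficients(lms, bins_per_octave, n_coeffs, mode):
    """Split each modified frame into octaves and keep the first DCT terms."""
    if len(lms[0]) % bins_per_octave != 0:
        raise ValueError("number of bins must be a multiple of bins_per_octave")
    features = []
    for frame in modify_lms(lms, mode):
        vector = []
        for start in range(0, len(frame), bins_per_octave):
            octave = frame[start:start + bins_per_octave]
            vector.extend(dct_ii(octave)[:n_coeffs])
        features.append(vector)
    return features


# Toy input: 3 frames, 4 bins (2 octaves of 2 bins each).
toy_lms = [[1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 0.0, 5.0], [3.0, 3.0, 3.0, 0.0]]
mean_feats = octave_coefficients(toy_lms, bins_per_octave=2, n_coeffs=1, mode="mean")
var_feats = octave_coefficients(toy_lms, bins_per_octave=2, n_coeffs=1, mode="variance")
```

Each frame yields one vector whose length is the number of octaves times `n_coeffs`. In practice the constant-Q spectrum would come from a signal processing library, and the choice of modification and the number of retained coefficients would follow the full paper rather than this placeholder.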

Highlights

  • Replay attacks present a serious threat to automatic speaker verification (ASV) systems

  • We found that constant-Q statistics-plus-principal information coefficients (CQSPIC) perform better than constant-Q variance-based octave coefficients (CVOC) and constant-Q mean-based octave coefficients (CMOC). The reason is that CQSPIC is a combined feature: it carries spectral principal information, subband information, and short-term spectral statistical information, whereas CVOC and CMOC carry only spectral principal information

  • Comparing Table 10 with Table 11 shows that CMOC-A and CVOC-A perform best on the ASVspoof 2019 physical access development and evaluation sets

Summary

Introduction

Replay attacks present a serious threat to automatic speaker verification (ASV) systems, which motivates our focus on playback speech detection. Since the ASVspoof 2017 challenge [1, 2], more and more researchers have begun to focus on playback speech detection [3,4,5,6,7,8,9,10]. Like many speech signal processing systems, most playback speech detection systems consist of a front-end feature and a back-end classifier [11,12,13,14,15,16,17,18]. We mainly focus on how to extract discriminative features for playback speech detection.
