Discriminative features based on modified log magnitude spectrum for playback speech detection

Jichen Yang, Bo Ren, Yunyun Ji, Longting Xu

doi:10.1186/s13636-020-00173-5

Abstract

To improve the performance of hand-crafted features for playback speech detection, this work proposes two discriminative features: constant-Q variance-based octave coefficients and constant-Q mean-based octave coefficients. They rely on our finding that the variance-based modified log magnitude spectrum and the mean-based modified log magnitude spectrum can enhance the discriminative power between genuine speech and playback speech. Each feature is obtained by combining the corresponding modified log magnitude spectrum with octave segmentation and the discrete cosine transform. Both features are evaluated on the ASVspoof 2017 corpus version 2.0 and on ASVspoof 2019 physical access. Experimental results show that the variance-based and mean-based modified log magnitude spectra can produce discriminative features for playback speech. Further results on the two databases show that constant-Q variance-based octave coefficients and constant-Q mean-based octave coefficients can perform better than some common features, such as mel frequency cepstral coefficients and constant-Q cepstral coefficients.
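
The three stages named in the abstract (a modified log magnitude spectrum, octave segmentation, and a discrete cosine transform) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the spectrum is modified or how octaves are segmented, so the modification rule (subtracting the per-bin mean, or dividing by the per-bin standard deviation), the fixed-width octave split, and the names `dct_ii`, `octave_coefficients`, `bins_per_octave`, and `n_coeffs` are all assumptions made for the example. The input is taken to be a precomputed constant-Q log magnitude spectrum.

```python
import numpy as np


def dct_ii(x, n_coeffs):
    """Orthonormal DCT-II along the last axis, keeping the first n_coeffs terms."""
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]   # index of the output coefficient
    m = np.arange(n)[None, :]          # index of the input sample
    basis = np.cos(np.pi * (m + 0.5) * k / n) * np.sqrt(2.0 / n)
    basis[0] *= np.sqrt(0.5)           # orthonormal scaling of the k = 0 row
    return x @ basis.T


def octave_coefficients(log_mag, bins_per_octave, n_coeffs, statistic="variance"):
    """Sketch of a modified-spectrum -> octave split -> DCT pipeline.

    log_mag: array of shape (frames, bins), a constant-Q log magnitude spectrum.
    """
    eps = 1e-8
    if statistic == "variance":
        # ASSUMPTION (placeholder rule): scale each bin by its standard deviation.
        stat = log_mag.var(axis=0, keepdims=True)
        modified = log_mag / np.sqrt(stat + eps)
    else:
        # ASSUMPTION (placeholder rule): centre each bin on its mean over frames.
        stat = log_mag.mean(axis=0, keepdims=True)
        modified = log_mag - stat

    # ASSUMPTION: octaves are consecutive groups of bins_per_octave bins;
    # any leftover bins at the top of the spectrum are dropped.
    frames, bins = modified.shape
    n_octaves = bins // bins_per_octave
    segments = modified[:, : n_octaves * bins_per_octave]
    segments = segments.reshape(frames, n_octaves, bins_per_octave)

    # DCT within each octave, then concatenate the per-octave coefficients.
    coeffs = dct_ii(segments, n_coeffs)
    return coeffs.reshape(frames, n_octaves * n_coeffs)


# Usage with synthetic data standing in for a real constant-Q spectrum.
rng = np.random.default_rng(0)
fake_log_mag = rng.normal(size=(5, 24))
features = octave_coefficients(fake_log_mag, bins_per_octave=12, n_coeffs=4)
print(features.shape)  # (5, 8)
```

Applying the transform per octave rather than across the whole spectrum keeps each coefficient tied to one frequency region, which is one plausible reading of "octave segmentation"; the paper itself should be consulted for the exact definitions.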

Highlights

Replay attacks present a serious threat to automatic speaker verification (ASV) systems.
We found that constant-Q statistics-plus-principal information coefficients (CQSPIC) perform better than constant-Q variance-based octave coefficients (CVOC) and constant-Q mean-based octave coefficients (CMOC). The reason is that CQSPIC is a combined feature: it carries spectral principal information, subband information, and short-term spectral statistical information, whereas CVOC and CMOC carry only spectral principal information.
Comparing Table 10 with Table 11, it can be seen that CMOC-A and CVOC-A perform the best on the ASVspoof 2019 physical access development and evaluation sets.

Summary

Introduction

Replay attacks pose a serious threat to automatic speaker verification (ASV) systems, which motivates our focus on playback speech detection. Since the ASVspoof 2017 challenge [1, 2], a growing number of researchers have focused on playback speech detection [3,4,5,6,7,8,9,10]. Like many speech signal processing systems, most playback speech detection systems consist of a front-end feature and a back-end classifier [11,12,13,14,15,16,17,18]. We mainly focus on how to extract discriminative features for playback speech detection.

Journal: EURASIP Journal on Audio, Speech, and Music Processing
Publication Date: Apr 7, 2020
Citations: 4
License type: open-access
