A replay attack is an unauthorized attempt to access an automatic speaker verification (ASV) system using pre-recorded speech samples of a target speaker. The attack is performed by presenting the pre-recorded speech of the target to the system. Replay attacks have recently been identified as the greatest threat to ASV systems, mainly because high-quality recording and playback devices are widely available. In this work, an excitation source feature, referred to as the glottal mel frequency cepstral coefficient (GMFCC), and the shifted constant Q cepstral coefficient (SCQCC) are proposed for the detection of replay signals. The GMFCC is derived by applying the conventional mel-cepstral technique to the glottal flow derivative signal. The SCQCC is computed using constant Q cepstral processing. The effectiveness of the proposed features is demonstrated through experiments on the ASVspoof 2017 version 2.0 database. The proposed GMFCC feature provides an equal error rate (EER) of 16.78%, which is 19.63% higher than the recently proposed residual mel frequency cepstral coefficient (RMFCC) feature. The conventional CQCC feature provides an EER of 12.32%. The proposed SCQCC feature provides an EER of 11.34%, a relative improvement of 7.94% over CQCC. Further, CQCC combined with the proposed GMFCC provides an EER of 8.82%, whereas the proposed SCQCC+GMFCC system provides an EER of 8.60%. These results indicate the usefulness of the proposed system for countering replay attacks.
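For readers unfamiliar with the metrics quoted above, the following is a minimal sketch of how an EER and a relative improvement between two EERs could be computed. It is not the authors' evaluation code: the function names, the threshold sweep, and the convention that higher scores mean "genuine" are all assumptions made for illustration. Applied to the reported CQCC and SCQCC EERs, the relative-improvement formula gives a value close to the 7.94% stated in the abstract (small differences would be expected from rounding of the published EERs).

```python
import numpy as np


def relative_improvement(baseline_eer, proposed_eer):
    """Relative EER reduction (in percent) of a proposed system over a baseline."""
    return 100.0 * (baseline_eer - proposed_eer) / baseline_eer


def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate EER (in percent) by sweeping thresholds over all scores.

    Assumption: a higher score means the trial is more likely genuine.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    thresholds = np.sort(np.concatenate([genuine, spoof]))

    # False rejection: genuine trial scored below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # False acceptance: replayed trial scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])

    # The EER lies where the two error rates are closest to each other.
    idx = np.argmin(np.abs(far - frr))
    return 100.0 * (far[idx] + frr[idx]) / 2.0


if __name__ == "__main__":
    # EERs taken from the abstract: CQCC 12.32%, SCQCC 11.34%.
    print(round(relative_improvement(12.32, 11.34), 2))
    # Toy scores (hypothetical, for illustration only).
    print(equal_error_rate([1.0, 3.0], [0.0, 2.0]))
```

The threshold sweep is a simple approximation; evaluation toolkits typically interpolate between operating points, so values computed this way may differ slightly from officially reported EERs.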