Abstract

Previous work on synthetic speech detection has focused on features derived from either the magnitude spectrum or the phase spectrum. To extract more discriminative information for synthetic speech detection, the authors propose a feature based on the magnitude–phase spectrum (MPS), which combines magnitude- and phase-spectrum information. The proposed feature, termed the constant-Q magnitude–phase coefficient (CMPC), is obtained by combining the constant-Q transform (CQT), the MPS, uniform resampling, and the discrete cosine transform. The CQT used in this study is a long-term window transform, which provides the basis for CMPC to capture important artefacts of synthetic speech. Such artefacts, which arise in speech generated by a unit selection algorithm, are difficult to capture with a short-term window transform. Uniform resampling converts the MPS from the octave domain into the linear domain. The discrete cosine transform is then used to extract the principal components and remove correlations among the feature dimensions. Experimental results on the AVspoof and ASVspoof 2015 corpora show that CMPC performs better than some commonly used features based on the magnitude or phase spectrum alone. The authors' system based on CMPC outperforms many known systems.
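
The four stages named above (CQT, MPS, uniform resampling, discrete cosine transform) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not define the MPS, so the sketch assumes it is the modulus of the complex logarithm of the CQT, sqrt((ln|X|)^2 + phase^2). The direct-evaluation CQT, the linear interpolation used for resampling, and all function names and parameter values (`fmin`, `bins_per_octave`, `hop`, `n_linear`, `n_coeffs`) are likewise illustrative assumptions.

```python
import numpy as np


def naive_cqt(x, fs, fmin, bins_per_octave, n_octaves, hop):
    """Direct-evaluation constant-Q transform (illustrative, not optimised)."""
    n_bins = bins_per_octave * n_octaves
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)  # constant Q factor
    freqs = fmin * 2.0 ** (np.arange(n_bins) / bins_per_octave)
    centers = np.arange(0, len(x), hop)
    out = np.zeros((n_bins, len(centers)), dtype=complex)
    for k, f_k in enumerate(freqs):
        n_k = int(np.ceil(q * fs / f_k))  # longer windows at lower frequencies
        n = np.arange(n_k)
        kernel = np.hanning(n_k) * np.exp(-2j * np.pi * f_k * n / fs) / n_k
        # Zero-pad so every frame has a full window centred on its position.
        padded = np.pad(x, (n_k // 2, n_k - n_k // 2))
        for t, c in enumerate(centers):
            out[k, t] = np.dot(padded[c:c + n_k], kernel)
    return out, freqs


def magnitude_phase_spectrum(X, eps=1e-10):
    """ASSUMED MPS definition: modulus of ln(X) = ln|X| + j*phase."""
    log_mag = np.log(np.abs(X) + eps)
    phase = np.angle(X)
    return np.sqrt(log_mag ** 2 + phase ** 2)


def resample_to_linear(S, freqs, n_out):
    """Map geometrically spaced (octave-domain) bins onto a linear grid."""
    linear_grid = np.linspace(freqs[0], freqs[-1], n_out)
    frames = [np.interp(linear_grid, freqs, S[:, t]) for t in range(S.shape[1])]
    return np.stack(frames, axis=1)


def dct2(S, n_coeffs):
    """Unnormalised DCT-II along the frequency axis, keeping n_coeffs rows."""
    n = S.shape[0]
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (m + 0.5) / n)
    return basis @ S


def cmpc(x, fs, fmin=62.5, bins_per_octave=12, n_octaves=5,
         hop=200, n_linear=64, n_coeffs=20):
    """CQT -> MPS -> uniform resampling -> DCT (hypothetical parameter values)."""
    X, freqs = naive_cqt(x, fs, fmin, bins_per_octave, n_octaves, hop)
    mps = magnitude_phase_spectrum(X)
    linear = resample_to_linear(mps, freqs, n_linear)
    return dct2(linear, n_coeffs)


# Usage on a synthetic 440 Hz tone (0.25 s at 8 kHz); rows are coefficients,
# columns are frames.
fs = 8000
t = np.arange(2000) / fs
signal = np.sin(2 * np.pi * 440 * t)
feats = cmpc(signal, fs)
print(feats.shape)
```

The long analysis windows at low frequencies (over 2000 samples at 62.5 Hz in this configuration) are what makes the CQT a long-term window transform in the sense used above. A production system would more likely rely on an optimised CQT implementation than on this direct evaluation.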
