Synthetic Speech Classification using Bidirectional LSTM Networks

Aswin Sankesh G S, Thiruvengadam S Jayaraman, Madhan Nanchan Suresh, Vetrivel Chelian Thirumavalavan, Velmurugan Pg Sivabalan, Ganeshkumar V

doi:10.1109/gcat55367.2022.9971887

Abstract

With the advent of generative adversarial networks, synthetic manipulation of speech has been increasing rapidly. Distinguishing synthetic from bona fide speech is a challenging problem because many algorithms are available for generating synthetic speech signals. The problem becomes harder still when the goal is to identify the algorithm used to generate the synthetic speech. In this work, a set of synthetic speech signals generated by five commonly known algorithms, together with another set belonging to an unknown-algorithm class, is used for classification. The objective is to classify the algorithms used to generate synthetic speech. A recurrent neural network based bidirectional long short-term memory (B-LSTM) sequence-to-label classifier is employed for training and evaluation. Eight different feature sets are used to analyze classification accuracy. It is observed that the feature set produced from dynamic time warping and spectral characteristics provides the best classification accuracy.

Full Text