Abstract

With the advent of generative adversarial networks, synthetic manipulation of speech has increased rapidly. Distinguishing synthetic from bona fide speech is challenging because many algorithms are available for generating synthetic speech signals. The problem becomes harder still when the goal is to identify the algorithm that generated the synthetic speech. In this work, a set of synthetic speech signals generated by five commonly known algorithms, together with another set belonging to an unknown-algorithm class, is used for classification. The objective is to classify the algorithm used to generate each synthetic speech signal. A recurrent neural network based bidirectional long short-term memory (B-LSTM) sequence-to-label classifier is employed for training and evaluation. Eight different feature sets are used to analyze classification accuracy. The feature set derived from dynamic time warping and spectral characteristics is observed to provide the best classification accuracy.
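
The abstract does not say how the dynamic time warping features are computed. As a rough illustration of the underlying technique only, the following is a minimal sketch of the textbook dynamic time warping (DTW) distance between two one-dimensional sequences. The function name `dtw_distance` and the absolute-difference local cost are assumptions made for this example and are not taken from the paper.

```python
def dtw_distance(a, b):
    """Textbook DTW distance between two 1-D numeric sequences.

    Illustrative sketch only: the local cost (absolute difference) is an
    assumption, not the feature pipeline described in the paper.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] also aligns with b[j-1]'s successor path (step in a)
                cost[i][j - 1],      # step in b only
                cost[i - 1][j - 1],  # step in both sequences
            )
    return cost[n][m]
```

Because DTW allows one sample to align with several samples of the other sequence, a time-stretched copy of a signal can have zero distance to the original, which is why this kind of measure is often considered for comparing speech of differing tempo.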
