Abstract
With the advent of generative adversarial networks, synthetic manipulation of speech has increased exponentially. Distinguishing synthetic from bona fide speech is challenging because many algorithms are available for generating synthetic speech signals. The problem becomes harder still if we also want to identify the algorithm that generated a given synthetic signal. In this work, synthetic speech signals generated by five commonly known algorithms, together with a further set belonging to an unknown-algorithm class, are used for classification. The objective is to identify the algorithm used to generate each synthetic speech signal. A recurrent neural network-based bidirectional long short-term memory (B-LSTM) sequence-to-label classifier is employed for training and evaluation. Eight different feature sets are compared in terms of classification accuracy. The feature set derived from dynamic time warping and spectral characteristics is observed to provide the best classification accuracy.
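The abstract does not describe how the dynamic time warping (DTW) features are constructed. The sketch below is only a generic illustration of classic DTW between two sequences of per-frame feature vectors (for example, spectral frames), using Euclidean distance as the local cost. The helper `dtw_features`, which turns an utterance into a vector of DTW distances to a set of reference templates, is hypothetical: it shows one possible way to derive DTW-based features and is not claimed to be the authors' method.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]


def dtw_distance(x: Sequence[Frame], y: Sequence[Frame]) -> float:
    """Classic DTW distance between two sequences of feature frames.

    Local cost is the Euclidean distance between frames; allowed steps are
    (i-1, j), (i, j-1) and (i-1, j-1).
    """
    if not x or not y:
        raise ValueError("both sequences must be non-empty")
    n, m = len(x), len(y)
    # cost[i][j] = minimal cumulative cost of aligning x[:i] with y[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(x[i - 1], y[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # repeat frame y[j-1]
                cost[i][j - 1],      # repeat frame x[i-1]
                cost[i - 1][j - 1],  # advance both sequences
            )
    return cost[n][m]


def dtw_features(utterance: Sequence[Frame],
                 templates: Sequence[Sequence[Frame]]) -> List[float]:
    """Hypothetical feature vector: DTW distance from an utterance to each
    reference template (e.g. one template per generation algorithm)."""
    return [dtw_distance(utterance, t) for t in templates]


if __name__ == "__main__":
    a = [[0.0], [1.0], [2.0]]
    b = [[0.0], [0.0], [1.0], [2.0]]  # same shape as a, stretched in time
    print(dtw_distance(a, b))         # 0.0: DTW absorbs the time stretch
    print(dtw_features(a, [b, [[5.0]]]))
```

Because DTW absorbs differences in timing, two utterances with similar spectral content but different speaking rates stay close. A fixed-length vector such as the one returned by `dtw_features` could then be combined with spectral features, although how the paper actually combines them is not stated here.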