Abstract
The essential steps of our study are to quantify and classify the differences between real and fake speech signals. In this scope, the main aim is to use the salient feature learning ability of deep learning in our study. With the use of ensemble classification pipeline, the interpretable logical rules were used for generalized reasoning with the class activation maps to discriminate the different speech classes as correctly. Fake audio samples were generated by using Deep Convolutional Generative Adversarial Neural Network. Our experiments were conducted on three different language dataset such as Turkish, English languages and Bilingual. As a result of higher classification and recognition accuracy with the use of classification pipeline as compiled into a majority voting-based ensemble classifier, the experimental results were obtained for each individual language performance approximately as 90% for training and as 80.33% for testing stages for pipeline, and it reached as 73% for majority voting results considered together with the appropriate test cases as well. To extract semantically rich rules, an interpretable logical rules infrastructure was used to infer the correct fake speech from class activations of deep learning’s generative model. Discussion and conclusion based on scientific findings are included in our study.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.