Abstract

Fake speech consists of voice recordings created either by artificial intelligence or by signal processing techniques. Methods for generating fake voice recordings include Deep Voice and Imitation. Deep Voice recordings sound slightly synthesized, whereas Imitation recordings sound natural. Detecting fake content is not trivial, given the large number of voice recordings transmitted over the Internet. To detect fake voice recordings produced by Deep Voice and Imitation, we propose a solution based on a Convolutional Neural Network (CNN) that uses image augmentation and dropout. The proposed architecture was trained with 2092 histograms of both original and fake voice recordings and cross-validated with 864 histograms. A further 476 new histograms were used for external validation, on which Precision (P) and Recall (R) were calculated. Detection of fake audio reached P = 0.997 and R = 0.997 for Imitation-based recordings, and P = 0.985 and R = 0.944 for Deep Voice-based recordings. The global accuracy was 0.985. These results indicate that the proposed system detects fake voice content successfully.
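
For readers less familiar with the reported metrics, the following is a minimal sketch of how Precision, Recall, and accuracy can be computed from predicted and true labels. It is not the authors' evaluation code: the function names `precision_recall` and `accuracy`, the label strings, and the toy label lists are all assumptions made for illustration, and the toy values are unrelated to the results reported above.

```python
def precision_recall(y_true, y_pred, positive):
    """Return (precision, recall), treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical toy labels, not data from the study.
y_true = ["fake", "fake", "fake", "original", "original", "original"]
y_pred = ["fake", "fake", "original", "original", "original", "fake"]

p, r = precision_recall(y_true, y_pred, positive="fake")
acc = accuracy(y_true, y_pred)
print(f"P={p:.3f}, R={r:.3f}, accuracy={acc:.3f}")
```

In a per-method evaluation such as the one described, the same computation would be repeated once with Imitation-based recordings and once with Deep Voice-based recordings as the positive class.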
