Abstract

This work investigates the use of convolutional neural networks to detect synthesized speech. The software application was built with the Python programming language, the TensorFlow library together with its high-level Keras API, and the ASVspoof 2019 audio database (FLAC format). Voice signals of both synthesized and natural speech were converted into mel-frequency spectrograms. A convolutional neural network architecture with high recognition accuracy is proposed. The training speed of the neural networks on GPU and CPU is compared using the CUDA library, and the influence of the batch size parameter on network accuracy is investigated. The TensorBoard tool was used to monitor and profile the training process.

Keywords: audio deepfake, mel-frequency sound spectrograms, convolutional neural networks, learning speed of neural networks.
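The first preprocessing step the abstract describes, converting a voice signal into a mel-frequency spectrogram, can be sketched with plain NumPy. The paper does not give its exact parameters, so the sample rate, FFT size, hop length, and number of mel bands below are illustrative assumptions, and a synthetic tone stands in for an ASVspoof FLAC clip:

```python
import numpy as np

def hz_to_mel(f):
    # Standard mel-scale mapping.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters spaced evenly on the mel scale, mapped to FFT bins.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            if c > l:
                fb[i - 1, k] = (k - l) / (c - l)
        for k in range(c, r):
            if r > c:
                fb[i - 1, k] = (r - k) / (r - c)
    return fb

def mel_spectrogram(y, sr=16000, n_fft=512, hop=256, n_mels=64):
    # Frame the signal, apply a Hann window, take the magnitude STFT power.
    n_frames = 1 + (len(y) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([y[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    # Log compression (dB), the usual form fed to a CNN as an "image".
    return 10.0 * np.log10(np.maximum(mel, 1e-10)).T

sr = 16000
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440.0 * t)  # 1 s synthetic tone, stand-in for a clip
S = mel_spectrogram(y, sr=sr)
print(S.shape)  # → (64, 61): n_mels x n_frames
```

In practice a library such as `librosa` would typically be used for this conversion; the hand-rolled filterbank above only illustrates what the transformation computes.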
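The classifier itself can then be a small Keras CNN over those spectrogram "images". The paper's exact topology is not given in the abstract, so the layer sizes below are assumptions; only the overall shape (stacked Conv2D/pooling blocks feeding a sigmoid output for bona fide vs. spoofed) reflects the described approach:

```python
import tensorflow as tf

def build_model(input_shape=(64, 61, 1)):
    # Input shape matches the (n_mels, n_frames, channels) of the spectrogram;
    # filter counts and dense width are illustrative, not the paper's values.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # bona fide vs. spoofed
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_model()
```

The batch-size and monitoring experiments the abstract mentions map onto standard Keras training options, e.g. `model.fit(x, y, batch_size=32, callbacks=[tf.keras.callbacks.TensorBoard(log_dir="logs")])`, and GPU availability under CUDA can be checked with `tf.config.list_physical_devices("GPU")`.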
