Mimicry Voice Detection using Convolutional Neural Networks

Medikonda Neelima, I Santiprabha

doi:10.1109/icosec49089.2020.9215407

Abstract

The primary purpose of an automatic speaker verification (ASV) system is to authenticate users based on speech samples. Four types of spoofing attack target ASV systems: replay, speech synthesis, voice conversion, and impersonation (mimicry). Several databases are available for detecting spoofing attacks and developing countermeasures, but they cover only the first three attack types; no standard database exists for impersonation attacks. The present work therefore concentrates on collecting samples for detecting mimicry attacks. Two celebrity speakers are selected, and direct speech samples from these speakers are collected as genuine samples. Imitations of the same speakers are collected from professional imitators and taken as spoofed samples. The spectral features of the speech samples are obtained with a Mel Frequency Cepstral Coefficients (MFCC) feature extraction procedure. A Convolutional Neural Network (CNN) classifier is used to label each sample as genuine or spoofed. The system is evaluated using the Equal Error Rate (EER) and the Receiver Operating Characteristic (ROC). On the collected genuine and spoofed samples for the mimicry attack, the EER is 0.207 on the development data and 0.3585 on the test data.
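The abstract reports EER values but does not describe how they were computed. As an illustration only, the following is a minimal sketch of one common way to estimate EER from classifier scores, assuming that higher scores mean "more likely genuine". The function name `equal_error_rate` and the example scores are hypothetical and are not taken from the paper.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate the EER by sweeping thresholds over the observed scores.

    Assumption (not from the paper): a sample is accepted as genuine when
    its score is greater than or equal to the threshold.
    """
    if not genuine_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    # Candidate thresholds: every observed score, plus one above the maximum
    # so that the "reject everything" operating point is also considered.
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    thresholds.append(thresholds[-1] + 1.0)

    best_gap, best_eer = None, None
    for t in thresholds:
        # False acceptance rate: spoofed samples accepted as genuine.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection rate: genuine samples rejected as spoofed.
        frr = sum(g < t for g in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            # Report the midpoint of FAR and FRR where they are closest.
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


# Hypothetical scores, for illustration only.
genuine = [0.9, 0.6, 0.4, 0.8]
spoof = [0.1, 0.5, 0.3, 0.7]
print(equal_error_rate(genuine, spoof))  # 0.25
```

With a finite set of scores, the false acceptance and false rejection rates rarely cross exactly, so this sketch returns their average at the threshold where they are closest. Interpolating along the ROC curve is an equally common alternative.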
