Speaker Identification in Crowd Speech Audio using Convolutional Neural Networks

Husam Husam, Husam Ali Abdulmohsin

doi:10.54216/fpa.160208

Abstract

Crowd speaker identification is an advanced area of audio identification and personalized user experience that researchers have studied extensively, yet high accuracy in crowd identification has remained out of reach. This work aims to design and implement a novel crowd speech identification method that can identify speakers in a multi-speaker environment (two, three, four, and five speakers). The work is implemented in two phases. The first phase is the Convolutional Neural Network (CNN) training and testing phase, in which the CNN is trained on data generated via the Combinatorial Cartesian Product approach. This approach uses two primary processes: computation of the Cartesian product and combinatorial selection. The second phase is the prediction phase, which evaluates the CNN trained in the first phase on crowd audio that it was not trained on. These new crowd recordings come from the Ghadeer-Speech-Crowd-Corpus (GSCC) dataset, a new database designed in this work. Compared with state-of-the-art approaches to speaker identification in multi-speaker environments, the results are strong, with a recognition rate of 99.5% for audio with three speakers, 98.5% for audio with four speakers, and 96.4% for audio with five speakers.
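The abstract names the two processes of the Combinatorial Cartesian Product approach but does not spell out their details, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that combinatorial selection picks every group of k speakers, that the Cartesian product pairs every clip of each selected speaker with every clip of the others, and that clips are overlaid by sample-wise averaging. The names crowd_mixtures, mix, and the toy speaker ids are hypothetical.

```python
from itertools import combinations, product


def mix(waveforms):
    """Overlay waveforms by sample-wise averaging (an assumed mixing rule)."""
    n = min(len(w) for w in waveforms)  # truncate to the shortest clip
    return [sum(w[i] for w in waveforms) / len(waveforms) for i in range(n)]


def crowd_mixtures(utterances, k):
    """Yield (speaker_ids, mixed_waveform) for every k-speaker group.

    utterances: dict mapping a speaker id to a list of waveforms,
    each waveform being a list of float samples.
    """
    # Combinatorial selection: every unordered group of k speakers.
    for speakers in combinations(sorted(utterances), k):
        # Cartesian product: one clip from each selected speaker, all pairings.
        for clips in product(*(utterances[s] for s in speakers)):
            yield speakers, mix(clips)


# Toy data (hypothetical): 3 speakers with 2 two-sample clips each.
toy = {
    "spk_a": [[1.0, 1.0], [0.0, 2.0]],
    "spk_b": [[3.0, 1.0], [2.0, 2.0]],
    "spk_c": [[5.0, 5.0], [1.0, 3.0]],
}
pairs = list(crowd_mixtures(toy, 2))
print(len(pairs))  # 3 speaker groups x (2 x 2) clip pairings
```

Under these assumptions the number of generated mixtures grows quickly with the number of clips per speaker, which is presumably what makes the approach useful for producing CNN training data from a small set of single-speaker recordings.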

Full Text