Abstract

Forensic speech science is the application of acoustics to legal investigations, and speaker comparison is the most common task in forensic speech analysis. A voiceprint is speech evidence that can be crucial when a witness has only heard a voice. The aim of the proposed work is to accurately identify a criminal from their voiceprint alone under a mismatch condition, that is, when the voiceprint is recorded in a noisy environment while the suspects' reference speech is recorded in a noise-free environment. Speaker identification is performed by a Gaussian Mixture Model (GMM)-based Speaker Identification (SID) system. Speech is taken from the Australian Forensic Voice Comparison database, which contains speech for forensic applications, and is corrupted with three types of noise: car, white, and babble. The data is corrupted at SNR levels from -5 dB to 5 dB in steps of 1 dB. Two speech enhancement techniques are applied at the front end of the SID system: spectral subtraction and the log minimum mean square error (log MMSE) technique. The GMM-based SID system then compares the enhanced speech with the clean recorded speech of the suspects to identify the culprit. The system is trained on Mel Frequency Cepstral Coefficient (MFCC) features.
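
As an illustration of the noise-corruption step, the following minimal sketch shows one common way to mix noise into a clean signal at a target SNR and to sweep the -5 dB to 5 dB range in 1 dB steps. The function names, the sine-wave stand-in for speech, the synthetic white noise, and the 16 kHz sampling rate are assumptions made for the example; they are not taken from the paper, and the authors' actual procedure may differ.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Return clean + scaled noise so the mixture has the requested SNR (dB)."""
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)
    # Repeat or trim the noise so it covers the whole utterance.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # SNR_dB = 10 * log10(P_clean / P_noise), solved for the noise power.
    target_noise_power = clean_power / (10 ** (snr_db / 10))
    gain = np.sqrt(target_noise_power / noise_power)
    return clean + gain * noise


def measured_snr_db(clean, noisy):
    """Compute the SNR (dB) of a mixture given the clean reference."""
    residual = np.asarray(noisy, dtype=float) - np.asarray(clean, dtype=float)
    return 10 * np.log10(np.mean(np.square(clean)) / np.mean(residual ** 2))


# SNR levels from -5 dB to 5 dB in steps of 1 dB (11 levels).
SNR_LEVELS_DB = list(range(-5, 6))

# Hypothetical inputs: a 1 s, 220 Hz tone at an assumed 16 kHz rate as a
# stand-in for speech, and synthetic white noise.
SAMPLE_RATE = 16000
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 220 * np.arange(SAMPLE_RATE) / SAMPLE_RATE)
white = rng.standard_normal(SAMPLE_RATE)

noisy_versions = {snr: mix_at_snr(clean, white, snr) for snr in SNR_LEVELS_DB}
```

In a real experiment, `clean` would be replaced by database utterances and `white` by recorded car or babble noise; the enhancement front end and the GMM back end would then operate on the entries of `noisy_versions`.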
