Abstract

So-called ‘mismatch’ is a factor that experts in forensic voice comparison encounter regularly. We therefore explored to what extent the ability of an automatic speaker recognition system and of earwitnesses to identify speakers is affected when recordings are acquired in different languages and at different times. One hundred voices from a database of 300 recordings (100 speakers, each recorded in three mutually mismatched sessions) were compared using the automatic speaker recognition software VOCALISE, based on i-vectors and x-vectors, and by 39 respondents in simulated voice parades. The automatic and perceptual approaches appear to have yielded similar results: the less complex the mismatch type, the more successful the identification. The results point to the superiority of the x-vector approach and to varying identification abilities among listeners.

