Abstract

In the last decade, forensic voice comparison has experienced a remarkable paradigm shift [Morrison, Sci. Justice 49, 298–308 (2009)]. Both automatic and traditional phonetic approaches have been developed within the new paradigm. The main difference is that traditional approaches are typically local in both the time and frequency domains, with features such as formant frequencies extracted from linguistically comparable items (e.g., words or phonemes), whereas automatic approaches are typically global, using long-term spectral properties and treating linguistic information as noise. Since neither approach makes use of all the information present, combining them could improve performance. A fully automatic system and a partially traditional system were compared. Data were pairs of non-contemporaneous landline-telephone recordings of 60 speakers from the Japanese National Research Institute of Police Science database (35–40 s of net speech per recording). In the fully automatic system, the whole speech-active portion of each recording was analyzed using 12th-order LPCCs, mean cepstral subtraction, GMM-UBM, and logistic-regression calibration. In the partially traditional system, the same procedures were applied only to tokens of [o:], [ɴ], and [ç] extracted from the recordings, with logistic-regression fusion of the results. The performance of each system and of the fusion of the two was compared using the log-likelihood-ratio cost (Cllr).
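
For readers unfamiliar with the metric, the following is a minimal sketch of how Cllr is commonly computed from a set of same-speaker and different-speaker comparison scores. It uses the standard definition of the log-likelihood-ratio cost and is not taken from the study itself. The function name `cllr` and its inputs (lists of natural-log likelihood ratios) are assumptions made for illustration.

```python
import math


def cllr(same_speaker_llrs, diff_speaker_llrs):
    """Log-likelihood-ratio cost (Cllr), standard definition.

    Both arguments are sequences of natural-log likelihood ratios
    (hypothetical inputs for illustration, not data from the study).
    Lower is better; a system that always outputs LR = 1 scores 1.0.
    """
    def log1p_exp(x):
        # Numerically stable log(1 + exp(x)).
        return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

    # Same-speaker pairs are penalised for low LRs: log2(1 + 1/LR).
    ss = sum(log1p_exp(-llr) for llr in same_speaker_llrs) / len(same_speaker_llrs)
    # Different-speaker pairs are penalised for high LRs: log2(1 + LR).
    ds = sum(log1p_exp(llr) for llr in diff_speaker_llrs) / len(diff_speaker_llrs)
    # Average the two penalties and convert from nats to bits.
    return 0.5 * (ss + ds) / math.log(2)
```

An uninformative system (every log LR equal to 0) yields a Cllr of 1, while well-calibrated, strongly discriminating log LRs drive the value toward 0.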
