Abstract

The performance of the automatic speaker recognition (ASR) system Batvox™ (Version 4.1) was tested with a male population of 24 monozygotic (MZ) twins, 10 dizygotic (DZ) twins, 8 non-twin siblings and 12 unrelated speakers (aged 18–52, with Standard Peninsular Spanish as their mother tongue). Since the cepstral features on which this ASR system is based depend largely on anatomical–physiological foundations, we hypothesized that such features ought to be gene-dependent. Therefore, higher similarity values should be found in MZ twins (100% shared genes) than in DZ twins, in brothers (B) or in a reference population of unrelated speakers (US). Results corroborated the expected decreasing scale MZ > DZ > B > US: the similarity coefficients yielded by the automatic system decreased in the same direction as the degree of kinship across the four speaker groups. This suggests that the system features are to a great extent genetically conditioned and that they are hence useful and robust for comparing speech samples of known and unknown origin, as found in legal cases. Furthermore, the 9.9% EER (Equal Error Rate) obtained when testing MZ pairs is close to the 11% EER found in Künzel (2010) with German twins.
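For readers unfamiliar with the Equal Error Rate quoted above, the following is a minimal sketch of how an EER can be estimated from two lists of similarity scores. It is not the procedure used by Batvox or by the authors. The function name `equal_error_rate` and the example scores are hypothetical, and the simple threshold sweep stands in for whatever calibration the real system applies.

```python
def equal_error_rate(same_speaker_scores, different_speaker_scores):
    """Estimate the EER by sweeping every observed score as a threshold.

    A comparison is accepted as "same speaker" when its score >= threshold.
    Returns (eer, threshold) at the point where the false acceptance rate
    and the false rejection rate are closest to each other.
    """
    thresholds = sorted(set(same_speaker_scores) | set(different_speaker_scores))
    best = None
    for t in thresholds:
        # False rejection: a same-speaker comparison scoring below the threshold.
        frr = sum(s < t for s in same_speaker_scores) / len(same_speaker_scores)
        # False acceptance: a different-speaker comparison scoring at or above it.
        far = sum(s >= t for s in different_speaker_scores) / len(different_speaker_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Hypothetical scores, invented purely for illustration.
same = [0.9, 0.8, 0.7, 0.4]
different = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(same, different)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With real data the two error rates rarely cross exactly at an observed score, so the sketch reports the midpoint of the two rates at the closest threshold.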

Highlights

  • Following the methodology described in Künzel (2010), in order to facilitate this task, one member of the twin pairs was labeled red and the other member was labeled blue

  • For the MZ twins participating in our study, we considered it useful to compare the coefficients obtained by each speaker in the intra-speaker (IS) comparisons with the coefficients obtained by these same speakers in the intrapair (IP) comparisons

  • We introduced the concept of intrapair (IP) comparison while taking into account the fact that out of the 54 speakers considered, 24 were MZ twins, 10 were DZ twins, eight were non-twin siblings and 12 were unrelated speakers


Summary

Introduction

It is widely acknowledged that distinguishing twins poses a major challenge in the field of forensics because these individuals are physically very similar. Biometrics such as fingerprints (Jain, Prabhakar & Pankanti, 2002) or palmprints (Kong, Zhang & Lu, 2006) have often been investigated in twins to study the subtle differences frequently observed between them. In the same way that handwriting depends on physiology as much as on behavioral factors like training and habits, the foundations of speaker recognition are largely grounded in the idea that a voice is determined by anatomical structure and by nonbiological or behavioral factors. These factors include mainly social or dialectal aspects, but other environmental influences are possible. This organic-learned dichotomy (Nolan, 1997; Nolan & Oh, 1996) may be a good translation in phonetic terms of the well-known nature–nurture dichotomy, first outlined by Sir Francis Galton in 1875 (Galton 1875, in Segal 1993, p. 45).

