Abstract
The deliberate attempt by speakers to conceal their identity (voice disguise) presents a challenge for forensics and for automated speaker identification systems. Using a database of natural and disguised voices of three professional voice impersonators, we build on earlier findings (Amin et al., 2012) by exploring how certain glottal and vocal tract measures, including fundamental frequency (f0), glottal timing (Open Quotient), and vowel formants, are exploited to create novel voice identities. Specifically, we explored whether the amount and type of variation exhibited by impersonators can be used to develop a metric for distinguishing natural from disguised voices. As expected, variation in f0 and Open Quotient was speaker-dependent, and corresponded closely to social attributes (i.e., gender/age) of the voice identities involved. In a novel finding, the effects of voice identity on vowel formants were highly dependent on vowel category, and could not be readily characterized as global modifications to the vowel space (Bradlow et al., 1996). We therefore developed a no-reference objective metric for voice disguise that treats formant variability on a vowel-by-vowel basis. This metric consistently assigned high rankings to natural voices (3.3/27 on average). The metric's rankings correlated closely with the subjective disguisedness ratings of 18 naïve listeners, and the metric even slightly outperformed those listeners.