Abstract
While many authorship analysis techniques could be applied to any type of linguistic data, very little research has examined their application to transcriptions of spontaneous speech for identifying the speaker. In authorship analysis, the frequency of function words has proven to be a successful feature, since writers tend to use different function words at different frequencies. This feature is therefore sensitive to an author's style while remaining relatively insensitive to the topic of the text. In this paper, we make the cross-over to speech samples and test whether frequency analysis of the most common words in speech, which are mostly function words, has potential as a speaker discriminant. We propose a method within the likelihood ratio framework so that it is applicable in court. Our approach accounts for both the similarity and the typicality of word frequencies by using percentile ranks for feature extraction. We apply our method to the forensically relevant dataset FRIDA and achieve good results even when only a limited amount of data is available.
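The abstract does not spell out how the percentile-rank features are computed. The sketch below shows one plausible reading, and it rests on assumptions. It computes each word's relative frequency in a speaker's transcript. It then expresses that frequency as a percentile rank within a hypothetical reference population of other speakers' transcripts. Under this reading, a word's percentile captures how typical the speaker's usage is relative to the population. The function names, the tie-handling convention (ties count as half), and the reference population are all illustrative assumptions. None of them are taken from the paper. The sketch also stops at feature extraction and does not compute likelihood ratios.

```python
from bisect import bisect_left, bisect_right
from collections import Counter


def relative_frequencies(tokens, vocabulary):
    """Relative frequency of each vocabulary word in a token list (hypothetical helper)."""
    counts = Counter(tokens)
    total = len(tokens)
    return {w: (counts[w] / total if total else 0.0) for w in vocabulary}


def percentile_rank(value, population):
    """Percentage of population values below `value`, counting ties as half (assumed convention)."""
    if not population:
        raise ValueError("reference population must not be empty")
    ordered = sorted(population)
    below = bisect_left(ordered, value)
    equal = bisect_right(ordered, value) - below
    return 100.0 * (below + 0.5 * equal) / len(ordered)


def percentile_features(tokens, vocabulary, reference_samples):
    """Map each word to the percentile rank of its frequency in `tokens`,
    relative to the same word's frequency across `reference_samples`."""
    freqs = relative_frequencies(tokens, vocabulary)
    ref = [relative_frequencies(sample, vocabulary) for sample in reference_samples]
    return {w: percentile_rank(freqs[w], [r[w] for r in ref]) for w in vocabulary}
```

For example, `percentile_features(["the", "uh"], ["the"], [["the", "a", "b", "c"], ["the", "the"]])` places the speaker's frequency of "the" (0.5) between the two reference frequencies (0.25 and 1.0) and returns `{"the": 50.0}`. Percentile ranks put every word on the same 0 to 100 scale, whatever its raw frequency. That makes frequent and rarer function words easier to compare.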