Abstract

Speech samples can reveal valuable information about a speaker's characteristics, including social background. Accent varies with speaker background, and this variation is reflected in the acoustic features of speech; analyzing these acoustic differences can assist in tracking down criminals when speech samples are available as forensic evidence. Accent identification has recently received significant attention in the speech forensics research community. However, most work has relied on long-term temporal modelling of acoustic features for accent classification and has disregarded the stationary acoustic characteristics of individual phoneme articulations. This paper analyzes short-term acoustic features extracted from a central time window of English vowel segments for accent discrimination. Several feature computation techniques are compared on the accent classification task. Spectral features used as input give better performance than cepstral features, with the lower filters contributing more to classification. A detailed analysis of the time window duration and frequency bin resolution used to compute the short-term spectral features is also presented: longer time windows generally require higher frequency resolution to optimize classification performance. These results are significant because they show the benefit of spectral features for speaker profiling, despite the popularity of cepstral features in other speech-related tasks.
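
As a rough illustration of the feature extraction described above, the following sketch takes a central time window from a vowel segment and computes both spectral (log filterbank energies) and cepstral (DCT of those energies) features. It is a minimal sketch under stated assumptions, not the paper's implementation: the mel-spaced triangular filterbank, the Hamming window, and the default values for `win_ms`, `n_fft`, `n_filters`, and `n_ceps` are all hypothetical choices, and every function name is invented for this example.

```python
import numpy as np


def central_window(segment, sample_rate, win_ms):
    """Return the win_ms-long slice centred on the middle of the segment."""
    win_len = min(int(round(sample_rate * win_ms / 1000.0)), len(segment))
    start = (len(segment) - win_len) // 2
    return segment[start:start + win_len]


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular mel-spaced filters, shape (n_filters, n_fft // 2 + 1).

    Assumption: the abstract does not say which filterbank was used.
    """
    edges_hz = mel_to_hz(
        np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    )
    freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_filters, len(freqs)))
    for i in range(n_filters):
        left, centre, right = edges_hz[i:i + 3]
        rising = (freqs - left) / (centre - left)
        falling = (right - freqs) / (right - centre)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def short_term_features(segment, sample_rate, win_ms=30.0, n_fft=1024,
                        n_filters=26, n_ceps=13):
    """Return (spectral, cepstral) features for one vowel segment.

    win_ms sets the time window duration and n_fft the frequency bin
    resolution; both defaults are illustrative, not values from the paper.
    """
    frame = central_window(np.asarray(segment, dtype=float), sample_rate, win_ms)
    if len(frame) > n_fft:
        raise ValueError("n_fft must be at least the window length in samples")
    frame = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(frame, n=n_fft)) ** 2  # zero-padded to n_fft
    spectral = np.log(mel_filterbank(n_filters, n_fft, sample_rate) @ power + 1e-10)
    # Unnormalised DCT-II of the log energies gives the cepstral coefficients.
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_filters)[None, :]
    cepstral = np.cos(np.pi * k * (n + 0.5) / n_filters) @ spectral
    return spectral, cepstral
```

Calling `short_term_features(vowel, 16000)` on a one-dimensional array of vowel samples returns a 26-element spectral vector and a 13-element cepstral vector under these defaults. Either vector could be passed to a classifier to compare the two representations, and `win_ms` and `n_fft` can be varied together to explore the trade-off between window duration and frequency resolution noted in the abstract.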
