Abstract
The purpose of this research was to examine the effects of speech sample duration on speaker identification accuracy when an FFT-based Long-Term Spectrum (LTS) analysis approach was utilized. A secondary goal was to assess the usefulness of FFT-LTS when non-contemporary speech samples were used. Two separate experiments were performed. Results for Experiment 1 revealed identification accuracy of 92–100% for text-independent, contemporary speech samples of 20 s and 10 s duration. Five-second contemporary samples resulted in accuracy rates of 28–96%. Statistical analysis revealed that the specific text of the test sample was significantly related to accuracy rate; thus, this procedure could not be considered “text-independent” at the 5 s duration. Preliminary phonemic analysis did not reveal a phonetically-based explanation for this text-related difference in accuracy results at 5 s. Results of Experiment 2 indicated that for non-contemporary samples with similar text, the identification accuracy of the LTS vector was significantly reduced (52–72% correct). It was concluded that the LTS approach may be most useful as one element in a multi-vector speaker recognition/identification profile.
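As an illustration of the kind of analysis described above, the following is a minimal sketch of an FFT-based long-term spectrum and a nearest-reference comparison. It is not the authors' procedure: the frame length, hop size, Hann window, log-power scaling, Euclidean distance, and the function names `long_term_spectrum` and `identify` are all assumptions made for this example.

```python
import numpy as np

def long_term_spectrum(signal, frame_len=512, hop=256):
    """Average the FFT power spectrum over all frames and return it in dB.

    frame_len and hop are illustrative defaults, not values from the study.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one analysis frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (signal.size - frame_len) // hop
    power = np.zeros(frame_len // 2 + 1)
    for i in range(n_frames):
        start = i * hop
        frame = signal[start:start + frame_len] * window
        power += np.abs(np.fft.rfft(frame)) ** 2
    power /= n_frames
    # Small offset avoids log(0) on silent bins.
    return 10.0 * np.log10(power + 1e-12)

def identify(test_lts, reference_lts):
    """Return the label of the reference LTS vector closest to test_lts.

    Euclidean distance is an assumed metric chosen for simplicity.
    """
    return min(
        reference_lts,
        key=lambda label: np.linalg.norm(test_lts - reference_lts[label]),
    )
```

In this sketch, a longer sample contributes more frames to the average, which is one plausible reason a long-term spectrum would be more stable at 10 s or 20 s than at 5 s; the abstract itself does not state the mechanism.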