Abstract

We applied machine learning to text-independent speaker identification using speech recorded during the day, evening, and night from subjects undergoing 25 hours of prolonged wakefulness. Subjects answered casual questions for approximately 3 minutes and described pictures presented to them for 0.5 minutes. We extracted 12,515 vocal features using the openSMILE software. To make the training scheme generalize, we split the 20 subjects into two sets of 10 and repeated testing four times with different subsets. Specifically, one set of 10 subjects was used to find the best feature sets and the optimal machine-learning method, and the other set of 10 subjects was used to test the trained model. With machine-learning models trained on three speech sessions recorded throughout the day, we obtained balanced accuracies of 95% and 98.8% for daytime and evening speech, respectively, but only 84.2% for nighttime test speech. With training data from all times of day (daytime, evening, and nighttime), we obtained balanced accuracies of 97.5%, 98.8%, and 98.1% for daytime, evening, and nighttime test speech, respectively; the overall accuracy was 98.1%. Prolonged wakefulness degrades the performance of machine-learning-based speaker identification. This work suggests that machine-learning-based speaker identification should be trained on speech from both daytime and nighttime sessions for better overall accuracy. Machine learning could potentially identify a speaker's voice even when it is affected by tiredness and fatigue, which are frequently encountered in settings such as emergency rooms and long-duration repetitive task operations.
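
The subject-level partitioning described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the subject IDs are hypothetical, and because the abstract does not say how the four subsets were chosen, each repeat here draws a random half of the subjects that has not been used before.

```python
import random


def subject_splits(subject_ids, n_repeats=4, seed=0):
    """Yield n_repeats distinct (tuning, testing) partitions of subject_ids into halves.

    Assumption: the splits are random; the paper does not state how its
    four subsets were chosen.
    """
    ids = sorted(subject_ids)
    half = len(ids) // 2
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    seen = set()
    while len(seen) < n_repeats:
        shuffled = ids[:]
        rng.shuffle(shuffled)
        tuning = frozenset(shuffled[:half])
        if tuning in seen:
            continue  # redraw so every repeat uses a different tuning subset
        seen.add(tuning)
        testing = [s for s in ids if s not in tuning]
        yield sorted(tuning), testing


# Hypothetical IDs for the 20 subjects.
subjects = [f"S{i:02d}" for i in range(1, 21)]
for tuning, testing in subject_splits(subjects):
    # tuning: select features and classifier; testing: evaluate the chosen model
    print(len(tuning), len(testing))
```

Each repeat prints `10 10`; the two groups are disjoint and together cover all 20 subjects, so feature and model selection never sees the subjects used for the final evaluation.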

Highlights

  • We evaluated the performance of the machine-learning methods by calculating the balanced accuracy.

  • When two sessions were used to train the machine-learning methods, all test sets showed balanced accuracies above 90%.
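
The balanced-accuracy formula did not survive extraction. A common multiclass definition, assumed here rather than taken from the paper, is the mean of the per-speaker recalls; this is also what scikit-learn's `balanced_accuracy_score` computes by default, and in the binary case it reduces to the mean of sensitivity and specificity. A minimal Python sketch with hypothetical labels:

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall (assumed multiclass definition of balanced accuracy)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    hits = defaultdict(int)    # correctly identified samples per true speaker
    totals = defaultdict(int)  # all samples per true speaker
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    # Every speaker's recall counts equally, regardless of how many samples it has.
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical predictions for three speakers; speaker "C" has fewer samples.
y_true = ["A", "A", "A", "A", "B", "B", "B", "B", "C", "C"]
y_pred = ["A", "A", "A", "B", "B", "B", "B", "B", "C", "A"]
print(balanced_accuracy(y_true, y_pred))
```

Here plain accuracy would be 0.8 (8 of 10 correct), while balanced accuracy is 0.75 (the mean of recalls 0.75, 1.0, and 0.5), because the under-represented speaker "C" is recognized only half the time and is weighted equally with the others.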

Introduction

Speaker identification is relevant for applications such as military operations, forensic speaker recognition, and phone customer service, among others [1], [2]. For these applications, speaker identification must be independent of the text being spoken, and there can be no reliance on emotional or situational context. This makes speaker identification challenging, because external factors such as stress, emotions, and fatigue can affect human speech [3]–[5].
