Abstract

This paper presents a Speaker Identification (SI) system for a personalized wearable device designed to combat gender-based violence. Speaker recognition systems perform worse when the user is under emotional or stressful conditions. The objective of this paper is therefore to measure the effects of stress on speech and, ultimately, to mitigate their consequences on a speaker identification task. Because data resources for this condition are scarce, we use data augmentation techniques specifically tailored for this purpose. An extensive set of experiments assesses the effectiveness of the proposed techniques. We conclude, first, that the best performance is always obtained when naturally stressed samples are included in the training set and, second, that when such samples are not available, substituting and augmenting them with synthetically generated stress-like samples improves the performance of the system.
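
To make the training configurations compared in the abstract concrete, the following Python sketch composes a training set under the preference order stated there: naturally stressed samples when available, otherwise synthetic stress-like samples as a substitute. The `Utterance` record, its field names, and the `build_training_set` helper are hypothetical illustrations, not the authors' code; how stress-like speech is synthesized and how features are extracted are outside the scope of this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Utterance:
    """Hypothetical record for one recording used by the SI system."""
    speaker: str
    condition: str  # e.g. "neutral", "stressed", "synthetic_stressed"
    path: str


def build_training_set(
    neutral: List[Utterance],
    natural_stressed: Optional[List[Utterance]] = None,
    synthetic_stressed: Optional[List[Utterance]] = None,
) -> List[Utterance]:
    """Compose a training set following the abstract's preference order.

    Neutral speech is always included. Naturally stressed samples are
    added when any are available; otherwise synthetic stress-like
    samples are used as a substitute.
    """
    training = list(neutral)
    if natural_stressed:
        training.extend(natural_stressed)
    elif synthetic_stressed:
        training.extend(synthetic_stressed)
    return training


if __name__ == "__main__":
    neutral = [
        Utterance("spk1", "neutral", "spk1_n1.wav"),
        Utterance("spk2", "neutral", "spk2_n1.wav"),
    ]
    synthetic = [Utterance("spk1", "synthetic_stressed", "spk1_s1.wav")]
    # No natural stressed data: the synthetic sample substitutes for it.
    train = build_training_set(neutral, synthetic_stressed=synthetic)
    print(len(train))  # 3
```

The file names above are placeholders; in practice each entry would point to real recordings or precomputed feature files.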

Highlights

  • In this paper, we analyze how stress affects speaker identification rates to determine whether they differ significantly from those of a speaker identification system operating in neutral conditions

  • We identify a problem: stressed speech in the testing stage degrades performance when Speaker Identification (SI) systems are trained only with neutral speech

  • Regarding matched and mismatched conditions, in the mixed setting (using both neutral and stressed original utterances) the SI system achieves 96.05% accuracy, a satisfactory rate for this type of task, demonstrating that the chosen feature set is adequate for the task
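
In a closed-set SI task, identification accuracy is the fraction of test utterances assigned to the correct speaker. The minimal Python sketch below reports this rate, in percent, separately for each test condition, which is where a gap between neutral and stressed test speech would show up. It assumes hypothetical (test condition, true speaker, predicted speaker) records; `accuracy_by_condition` is an illustrative helper, not the authors' evaluation pipeline.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_condition(
    records: Iterable[Tuple[str, str, str]],
) -> Dict[str, float]:
    """Closed-set SI accuracy (in percent) per test condition.

    Each record is (test_condition, true_speaker, predicted_speaker).
    """
    totals: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for condition, true_spk, predicted_spk in records:
        totals[condition] += 1
        correct[condition] += int(true_spk == predicted_spk)
    return {c: 100.0 * correct[c] / totals[c] for c in totals}


if __name__ == "__main__":
    toy_records = [
        ("neutral", "spk1", "spk1"),
        ("neutral", "spk2", "spk2"),
        ("stressed", "spk1", "spk2"),  # misidentified under stress
        ("stressed", "spk2", "spk2"),
    ]
    print(accuracy_by_condition(toy_records))
    # {'neutral': 100.0, 'stressed': 50.0}
```

The toy records are invented for illustration only and do not reproduce any result reported in the paper.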


Summary

Introduction

We analyze how stress affects speaker identification rates to determine whether they differ significantly from those of a speaker identification system operating in neutral conditions. We aim to find techniques that improve speaker identification systems when they face stressed speech, either by neutralizing the effects of stress or by training the system to cope with it. We propose data augmentation techniques, both statistical ones and ones based on synthetically generated speech under stressed conditions, together with an analysis of the best feature extraction methods for designing a stress-robust system [1].
