Abstract

In this study, we focus on the classification of neutral and stressed speech based on a physical model. To represent the characteristics of the vocal folds and vocal tract during speech production and to explore the physical parameters involved, we propose a method using the two-mass model. As feature parameters, we use the stiffness parameters of the vocal folds, the vocal tract length, and the cross-sectional areas of the vocal tract. The stiffness parameters and the area of the entrance to the vocal tract are extracted from the two-mass model after we fit the model to real data using our proposed algorithm. These parameters are related to the velocity of glottal airflow and to the acoustic interaction between the vocal folds and the vocal tract, and they can represent features of speech under stress because they are affected by the speaker’s psychological state during speech production. In our experiments, the physical features generated using the proposed approach are compared with traditionally used features, and the results demonstrate an improvement of 10% to 15% in average stress classification performance, which indicates that our proposed method is more effective than conventional methods.

Highlights

  • Varied vocal tract (VT) denotes that the parameters for cross-sectional area are estimated by fitting the two-mass model instead of being fixed as constants

  • In the second evaluation, the vocal tract length (VTL) was first estimated for each speaker, and further evaluations were based on the obtained vocal tract length

Summary

Introduction

Stress is a psycho-physiological state characterized by subjective strain, increased physiological activity, and deterioration of performance [1]. Factors inducing stress on speakers include workload, background noise, emotions, physical environmental factors (e.g., G-force), and fatigue. These factors are believed to affect voice quality and are detrimental to the performance of communication equipment, especially automated systems with speech interfaces. Both external factors (workload, background noise, etc.) and internal factors (emotional state, fatigue, etc.) may induce stress [2]. The first investigations of emotional speech were conducted in the mid-1980s, using [...]. These studies lack a physical basis, and their methods do not consider the whole process of speech production, which is believed to be essential for effective classification of speech under stress.
