Abstract

Life in modern societies is fast-paced and full of stress-inducing demands. The development of stress monitoring methods is a growing area of research due to the personal and economic advantages that timely detection provides. Studies have shown that speech-based features can be utilised to robustly predict several physiological markers of stress, including emotional state, continuous heart rate, and the stress hormone, cortisol. In this contribution, we extend previous works by the authors, utilising three German language corpora including more than 100 subjects undergoing a Trier Social Stress Test protocol. We present cross-corpus and transfer learning results that explore how well the speech signal predicts three physiological markers of stress: sequentially measured saliva-based cortisol, continuous heart rate as beats per minute (BPM), and continuous respiration. For this, we extract several features from audio as well as video and apply various machine learning architectures, including a temporal context-based Long Short-Term Memory Recurrent Neural Network (LSTM-RNN). For the task of predicting cortisol levels from speech, deep learning improves on results obtained by conventional support vector regression, yielding a Spearman correlation coefficient (ρ) of 0.770 and 0.698 for cortisol measurements taken 10 and 20 min after the stress period for the two applicable corpora. This shows that audio features alone are sufficient for predicting cortisol, with audiovisual fusion improving such results to an extent. We also obtain a Root Mean Square Error (RMSE) of 38 and 22 BPM for continuous heart rate prediction on the two corpora where this information is available, and a normalised RMSE (NRMSE) of 0.120 for respiration prediction (−10: 10). Both of these continuous physiological signals are shown to be highly effective markers of stress (based on cortisol grouping analysis), both when available as ground truth and when predicted using speech. This contribution opens up new avenues for future exploration of these signals as proxies for stress in naturalistic settings.
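
For readers unfamiliar with the three evaluation metrics quoted above (Spearman's ρ, RMSE, and NRMSE), the following is a minimal sketch of how they are commonly computed. It is not the authors' code: the function names and toy values are illustrative assumptions, and normalising RMSE by the target range (reading "(−10: 10)" as that range) is an assumption rather than something the abstract states.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def nrmse(y_true, y_pred, lo, hi):
    """RMSE divided by the target range (hi - lo).

    Assumption: range normalisation; other conventions (mean, std) exist.
    """
    return rmse(y_true, y_pred) / (hi - lo)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Toy values for illustration only; they are not data from the study.
    cortisol_true = [4.1, 6.3, 5.0, 9.8, 7.2]
    cortisol_pred = [4.5, 5.9, 5.6, 8.7, 7.9]
    print("Spearman rho:", spearman(cortisol_true, cortisol_pred))

    respiration_true = [-3.0, 0.5, 4.0, -1.5]
    respiration_pred = [-2.0, 1.0, 2.5, -0.5]
    print("NRMSE:", nrmse(respiration_true, respiration_pred, lo=-10, hi=10))
```

Because Spearman's ρ depends only on rank order, it rewards a model that orders subjects correctly by cortisol level even when the predicted magnitudes are off, whereas RMSE and NRMSE penalise errors in magnitude directly.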

Highlights

  • Understanding how stress manifests in the human body has several meaningful use-cases, from improving safety during driving (Bianco et al, 2019) to early intervention of neurodegeneration (Zafar, 2020)

  • Findings from this study showed that elevated cortisol levels, taken between 10 and 20 min after the Trier Social Stress Test (TSST), i.e., the time of speech under stress, correlate to a substantial level (Spearman’s correlation coefficient (ρ) of at best 0.421), with hand-crafted prosody-related feature sets performing best

  • Our main source of truth for the degree of stress during the TSST setting is the saliva-based cortisol measurements obtained at differing time points

Introduction

Understanding how stress manifests in the human body has several meaningful use-cases, from improving safety during driving (Bianco et al, 2019) to early intervention of neurodegeneration (Zafar, 2020). The production of cortisol responds to the activation of the hypothalamic-pituitary-adrenal (HPA) axis, which begins to secrete the corticotropin-releasing hormone that causes the additional release of the adrenocorticotrophic hormone (ACTH) from the pituitary. The release of such hormones is known to alter other physiological responses, including heart rate (Gönülateş et al, 2017), which in turn affects face colouring (Niu et al, 2018) and speech during psychosocial stress (Brugnera et al, 2018). With this in mind, the speech signal can be used to monitor several states of wellbeing computationally and non-intrusively (Cummins et al, 2018). In recent studies, it has shown promise as an indicator of physiological signals that are known to be markers of stress, e.g., correlation with saliva-based cortisol samples (Baird et al, 2019), states of emotional arousal (Stappen et al, 2021a), and co-occurring conditions including anxiety (Baird et al, 2020).
