Abstract

Speech classification is one of the most convenient objective measures of internal state exhibited during a problem-solving task that requires verbal communication. This study investigates the hypothesis that speech acoustic characteristics are indicative of trust between team members and of team members' familiarity with each other. Speech recordings from 27 dyadic teams (26 males and 28 females) were made during a distributed threat-perception task in which teams determined safe points along a route through a town to be visited by a VIP. Before the threat-detection mission, 26 team members knew each other, and the remaining 28 had no prior knowledge of their partners. Two levels (Low Trust and High Trust) of two trust constructs, TTP (Trust, Trustworthiness, Propensity to trust) and RIS (Reliance Intentions Scale), were estimated from numerical responses to pre- and post-mission surveys. Speech recordings of individual speakers were divided into 1-second intervals and converted into RGB images of amplitude spectrograms. The images were classified using a pre-trained convolutional neural network, ResNet-18, fine-tuned to recognize either the trust level or familiarity. In the baseline classification scenario, speech was classified by single transfer learning into Low/High-trust categories separately for the RIS and TTP constructs before and after the mission, yielding an average classification accuracy of 82%-86%. Single-transfer-learning classification into Known/Unknown-partner categories achieved 85% accuracy. Double transfer learning, i.e., first tuning ResNet-18 on Known/Unknown labels and then on Low/High-trust labels, increased the trust classification accuracy to 89%. Conversely, tuning ResNet-18 first on Low/High-trust labels and then on Known/Unknown labels likewise increased the accuracy of partner familiarity recognition to 89%.
These results support the hypothesis that speech acoustics are indicative of trust and familiarity between team members, and they show that adding prior related knowledge to the model enables more efficient learning without increasing the size of the training data.
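The preprocessing pipeline described above can be sketched in a few steps: split each speaker's waveform into 1-second segments, compute a short-time amplitude spectrogram per segment, and render it as an RGB-shaped image suitable for a ResNet-18 input. The sketch below is a minimal illustration under assumed parameters (16 kHz sampling, 512-point FFT, 128-sample hop); the paper does not specify these values or the colormap used, so a simple grayscale-to-three-channel mapping stands in for the actual RGB rendering.

```python
import numpy as np

def segment_audio(signal, sr, seg_seconds=1.0):
    """Split a 1-D waveform into non-overlapping fixed-length segments.

    Trailing samples that do not fill a whole segment are dropped.
    """
    seg_len = int(sr * seg_seconds)
    n_segs = len(signal) // seg_len
    return signal[: n_segs * seg_len].reshape(n_segs, seg_len)

def amplitude_spectrogram(segment, n_fft=512, hop=128):
    """Short-time amplitude spectrum of one segment (freq bins x frames)."""
    window = np.hanning(n_fft)
    frames = [
        np.abs(np.fft.rfft(window * segment[start : start + n_fft]))
        for start in range(0, len(segment) - n_fft + 1, hop)
    ]
    return np.array(frames).T

def to_rgb_image(spec):
    """Map a log-amplitude spectrogram onto a uint8 H x W x 3 array.

    Replicating one channel three times is a placeholder: the study
    does not state which colormap produced its RGB spectrogram images.
    """
    log_spec = np.log1p(spec)
    span = log_spec.max() - log_spec.min() + 1e-12
    gray = (255 * (log_spec - log_spec.min()) / span).astype(np.uint8)
    return np.stack([gray, gray, gray], axis=-1)

# Demo on two seconds of synthetic "speech" at an assumed 16 kHz rate.
sr = 16000
t = np.arange(2 * sr) / sr
signal = np.sin(2 * np.pi * 220 * t) + 0.3 * np.random.randn(2 * sr)
segments = segment_audio(signal, sr)          # shape (2, 16000)
image = to_rgb_image(amplitude_spectrogram(segments[0]))
```

In a full replication, each `image` would then be resized to the network's expected input resolution and fed to a torchvision ResNet-18 whose final fully connected layer is replaced with a two-class head for Low/High-trust or Known/Unknown-partner labels; that fine-tuning step is omitted here to keep the sketch dependency-free.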

Highlights

  • Interpersonal trust is commonly defined as the willingness to accept vulnerability to another person's actions or decisions

  • We provide a twofold contribution to automatic trust recognition from speech using convolutional neural networks (CNNs)

  • Speech recordings were automatically classified into two categories, Low/High trust, using a pre-trained CNN model ResNet-18



Introduction

Interpersonal trust is commonly defined as the willingness to accept vulnerability to another person's actions or decisions. Trust is a vital component of daily human interactions and of decision-making processes with a partner. Understanding the subjective and objective factors that affect trust is an active topic of research in psychology, social studies, and, recently, artificial intelligence and machine learning. A comprehensive review of interpersonal trust research from the perspective of behavioral psychology can be found in [1]. Investigated objective trust indicators include emotional states [2], facial expressions [3], and speech attributes [4]. From the machine learning perspective, speech as an objective factor affecting interpersonal trust is of particular interest. Through the mapping of social attributes of trustworthiness into synthetic speech and machine-made


