Abstract

Sound source localization is important in human interaction, for example when locating the origin of a call from a distance or when facing another person during a conversation. It is therefore of interest to bring this functionality to human-robot interaction (HRI) and investigate its benefits, if any. In this paper, we propose three strategies for integrating multiple directions-of-arrival (multi-DOA) estimation into a common scenario in which the robot acts as a waiter while applying audio source localization. The proposed strategies are: a) the robot locates calls from users at a relatively long distance; b) the robot faces the user when taking the order; and c) the robot announces when the acoustic environment is not conducive to understanding a speech command (mainly when more than one user speaks at once). Users reacted favourably to the functionality, and it had a noticeable influence on the success of the interaction.

Highlights

  • Sound source localization plays an important part in human interaction, and it is of interest for it to play a similarly important role in HRI

  • We explore three strategies that use multi-DOA estimation to complement the process of obtaining an order from groups of users via Automatic Speech Recognition (ASR)

  • One evaluation was carried out by volunteers who participated as customers in the waiter scenario


Summary

Introduction

Sound source localization plays an important part in human interaction, and it is of interest for it to play a similarly important role in HRI. Multi-DOA estimation is a basic component of sound source localization, which is a vital ability for successfully interacting with the environment. Since it is omnidirectional and insensitive to occlusion and lighting conditions, auditory perception provides important complementary information to visual information for the identification and localization of interesting or potentially dangerous events in the environment. Being aware of a speaker's location has allowed computer conversational systems to respond more naturally to users' needs. This has led to different face-to-face communication frameworks being proposed to enhance the interaction between humans and embodied agents [2, 5, 4, 18] and robots [32], although these have mostly been studied using 'Wizard of Oz' experiments.
