Abstract

Social robots in the home will need to solve audio identification problems to better interact with their users. This article focuses on the classification between (a) natural conversation that includes at least one co-located user and (b) media that is playing from electronic sources and does not require a social response, such as television shows. This classification can help social robots detect a user’s social presence using sound. Social robots that are able to solve this problem can apply this information to assist them in making decisions, such as determining when and how to appropriately engage human users. We compiled a dataset from a variety of acoustic environments that contained either natural or media audio, including audio that we recorded in our own homes. Using this dataset, we performed an experimental evaluation on a range of traditional machine learning classifiers and assessed the classifiers’ abilities to generalize to new recordings, acoustic conditions, and environments. We conclude that a C-Support Vector Classification (SVC) algorithm outperformed other classifiers. Finally, we present a classification pipeline that in-home robots can utilize, and we discuss the timing and size of the trained classifiers as well as privacy and ethics considerations.
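The abstract names C-Support Vector Classification (SVC) as the best-performing classifier for deciding between natural conversation and electronically played media. As a minimal sketch of that classification step, assuming fixed-length acoustic feature vectors (e.g., summary statistics of spectral features) have already been extracted per audio clip — the feature dimensionality, the synthetic data, and the RBF kernel choice below are illustrative assumptions, not details from the paper:

```python
# Hypothetical natural-vs-media classification step using C-SVC.
# Synthetic feature vectors stand in for real per-clip acoustic features.
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_clips, n_features = 200, 40  # illustrative sizes, not from the paper

# Labels: 0 = natural co-located conversation, 1 = media playback.
X = rng.normal(size=(n_clips, n_features))
y = rng.integers(0, 2, size=n_clips)
X[y == 1] += 1.0  # give the two synthetic classes some separation

# C-Support Vector Classification, as named in the abstract; scaling
# features before an RBF-kernel SVM is standard practice.
clf = make_pipeline(StandardScaler(), SVC(C=1.0, kernel="rbf"))
clf.fit(X[:150], y[:150])

preds = clf.predict(X[150:])  # one natural-vs-media decision per clip
print(preds.shape)
```

An in-home pipeline of the kind the article describes would replace the synthetic `X` with features computed from microphone audio and emit one decision per buffered clip.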
