Abstract

Social robots in the home will need to solve audio identification problems to better interact with their users. This article focuses on the classification between (a) natural conversation that includes at least one co-located user and (b) media that is playing from electronic sources and does not require a social response, such as television shows. This classification can help social robots detect a user’s social presence using sound. Social robots that are able to solve this problem can apply this information to assist them in making decisions, such as determining when and how to appropriately engage human users. We compiled a dataset from a variety of acoustic environments that contained either natural or media audio, including audio that we recorded in our own homes. Using this dataset, we performed an experimental evaluation on a range of traditional machine learning classifiers and assessed the classifiers’ abilities to generalize to new recordings, acoustic conditions, and environments. We conclude that a C-Support Vector Classification (SVC) algorithm outperformed other classifiers. Finally, we present a classification pipeline that in-home robots can utilize, and we discuss the timing and size of the trained classifiers as well as privacy and ethics considerations.
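The abstract names C-Support Vector Classification (SVC) as the best-performing classifier for deciding between natural conversation and electronically played media. As a minimal sketch of that classification step, assuming fixed-length acoustic feature vectors (e.g., summary statistics of spectral features) have already been extracted per audio clip — the feature dimensionality, the synthetic data, and the RBF kernel choice below are illustrative assumptions, not details from the paper:

```python
# Hypothetical natural-vs-media classification step using C-SVC.
# Synthetic feature vectors stand in for real per-clip acoustic features.
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_clips, n_features = 200, 40  # illustrative sizes, not from the paper

# Labels: 0 = natural co-located conversation, 1 = media playback.
X = rng.normal(size=(n_clips, n_features))
y = rng.integers(0, 2, size=n_clips)
X[y == 1] += 1.0  # give the two synthetic classes some separation

# C-Support Vector Classification, as named in the abstract; scaling
# features before an RBF-kernel SVM is standard practice.
clf = make_pipeline(StandardScaler(), SVC(C=1.0, kernel="rbf"))
clf.fit(X[:150], y[:150])

preds = clf.predict(X[150:])  # one natural-vs-media decision per clip
print(preds.shape)
```

An in-home pipeline of the kind the article describes would replace the synthetic `X` with features computed from microphone audio and emit one decision per buffered clip.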
