Learning to Detect Attended Objects in Cultural Sites with Gaze Signals and Weak Object Supervision

Michele Mazzamuto, Antonino Furnari, Giovanni Maria Farinella, Francesco Ragusa

doi:10.1145/3647999

Abstract

Cultural sites such as museums and monuments are popular tourist destinations worldwide. Visitors come to these places to learn about the cultures, histories, and arts of a particular region or country. However, at many cultural sites, traditional visiting approaches are limited and may fail to engage visitors. To enhance visitors’ experiences, previous works have explored how wearable devices can be exploited in this context. Among the many functions these devices can offer, understanding which artwork or detail the user is attending to is fundamental to providing additional information on the observed artworks, understanding the visitor’s tastes, and making recommendations. This motivates the development of algorithms for understanding visitor attention from egocentric images. We consider the attended object detection task, which involves detecting and recognizing the object observed by the camera wearer from an input RGB image and gaze signals. To study the problem, we collected a dataset of egocentric images acquired by subjects visiting a museum. Since collecting and labeling data in cultural sites for real applications is time-consuming, we present a study comparing unsupervised, weakly supervised, and fully supervised approaches for attended object detection. We evaluate the considered approaches on the collected dataset, also assessing the impact of training models on external datasets such as COCO and EGO-CH. The experiments show that weakly supervised approaches requiring only a 2D point label related to the gaze can be an effective alternative to fully supervised approaches for attended object detection. To encourage research on the topic, we publicly release the code and the dataset at the following URL: https://iplab.dmi.unict.it/EGO-CH-Gaze/.
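To make the task definition concrete, the following Python sketch shows one simple way a gaze point could be combined with the candidate boxes produced by an object detector to select the attended object in a single egocentric frame. It is an illustrative heuristic, not the approach evaluated in the paper: the `Box` type, the `attended_object` function, the (x1, y1, x2, y2) pixel coordinate convention, and the rule of returning the highest-scoring box that contains the gaze point are all hypothetical choices made here for illustration, and the toy detections are made-up values.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


# Hypothetical detection record: pixel box (x1, y1, x2, y2), class label, confidence.
@dataclass
class Box:
    x1: float
    y1: float
    x2: float
    y2: float
    label: str
    score: float

    def contains(self, point: Tuple[float, float]) -> bool:
        """True if the (x, y) point lies inside the box (borders included)."""
        x, y = point
        return self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2


def attended_object(boxes: List[Box], gaze: Tuple[float, float]) -> Optional[Box]:
    """Assumed heuristic: among the detections that contain the gaze point,
    return the most confident one; return None if the gaze hits no box."""
    hits = [b for b in boxes if b.contains(gaze)]
    if not hits:
        return None
    return max(hits, key=lambda b: b.score)


# Toy detections for one egocentric frame (made-up values, not from the dataset).
boxes = [
    Box(0, 0, 100, 100, "painting", 0.90),
    Box(40, 40, 60, 60, "painting detail", 0.95),
    Box(200, 200, 300, 300, "statue", 0.99),
]

print(attended_object(boxes, (50, 50)).label)  # painting detail
print(attended_object(boxes, (10, 10)).label)  # painting
print(attended_object(boxes, (150, 150)))      # None
```

Selecting by confidence is only one possible rule; preferring the smallest box that contains the gaze point would instead favor details over whole artworks, which may matter when a visitor looks at a specific part of a painting.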
