Abstract

In this paper, we introduce a non-verbal, multimodal joint visual attention model for human-robot interaction in household scenarios. Our model combines bottom-up saliency and depth-based segmentation with top-down cues such as pointing and gaze to detect the objects that are of interest to the user. To generate the top-down saliency maps, we introduce novel methods for object saliency based on the pointing direction as well as the gaze direction. For gaze estimation, we introduce a hybrid model that automatically selects between keypoint-based matching and back-projection, depending on how textured the object model is. The combination of different cues ensures reliable object detection and interaction independent of the relative positions of the user, the robot, and the objects. Extensive experiments show good detection results in different interaction scenarios as well as under challenging environmental conditions.
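
To make the cue-combination idea concrete, the following is a minimal sketch of how bottom-up and top-down saliency maps could be fused and used to rank depth-based segments. The weights, function names, and the mean-saliency segment score are illustrative assumptions and are not taken from the paper itself.

```python
import numpy as np

def fuse_saliency(bottom_up, pointing, gaze, weights=(0.4, 0.3, 0.3)):
    """Fuse normalized bottom-up and top-down saliency maps.

    All maps are HxW float arrays. The weights are illustrative
    placeholders, not values reported in the paper.
    """
    fused = np.zeros_like(bottom_up, dtype=np.float64)
    for w, m in zip(weights, (bottom_up, pointing, gaze)):
        m = m.astype(np.float64)
        rng = m.max() - m.min()
        if rng > 0:
            m = (m - m.min()) / rng   # normalize each cue to [0, 1]
        fused += w * m                # weighted linear combination of cues
    return fused / fused.max() if fused.max() > 0 else fused

def rank_segments(fused, segment_labels):
    """Score each depth-based segment by its mean fused saliency and
    return segment ids sorted from most to least salient (0 = background)."""
    ids = [s for s in np.unique(segment_labels) if s != 0]
    scores = {s: fused[segment_labels == s].mean() for s in ids}
    return sorted(scores, key=scores.get, reverse=True)
```

Under these assumptions, the top-ranked segment would be taken as the object the user is referring to; the actual fusion scheme and segment scoring used in the paper may differ.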
