Abstract

Social relations are ubiquitous in people’s daily life. Especially, the widespread of video in social media and intelligent surveillance gives us a new chance to discover the social relations among people. Previous researches mostly focus on the recognition of social relations from texts, blogs, or images. However, these methods are only concentrated on limited social relations and incapable of dealing with video data. In this paper, we address the challenges of social relation recognition by employing a multi-stream model to exploit the abundant multimodal information in videos. First of all, we build a video dataset with 16 categories of social relations annotation according to psychology and sociology studies, named Social Relation In Videos (SRIV), which comprises of 3,124 videos. According to our knowledge, it is the first video dataset for the social relation recognition. Secondly, we propose a multi-stream deep learning model as a benchmark for recognizing social relations, which learns high level semantic information of spatial, temporal, and audio of people’s social interactions in videos. Finally, we fuse them with logical regression to achieve accurate recognition. Experimental results show that the multi-stream deep model is effective for social relation recognition on the proposed dataset.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call