Abstract

Social relations are ubiquitous in daily life. The spread of video in social media and intelligent surveillance offers a new opportunity to discover social relations among people. Previous research has mostly focused on recognizing social relations from texts, blogs, or images. However, these methods cover only a limited set of social relations and cannot handle video data. In this paper, we address the challenges of social relation recognition by employing a multi-stream model to exploit the abundant multimodal information in videos. First, we build a video dataset named Social Relation In Videos (SRIV), which comprises 3,124 videos annotated with 16 categories of social relations drawn from psychology and sociology studies. To the best of our knowledge, it is the first video dataset for social relation recognition. Second, we propose a multi-stream deep learning model as a benchmark for recognizing social relations; it learns high-level semantic information from the spatial, temporal, and audio aspects of people’s social interactions in videos. Finally, we fuse the streams with logistic regression to achieve accurate recognition. Experimental results show that the multi-stream deep model is effective for social relation recognition on the proposed dataset.
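The fusion step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each stream (spatial, temporal, audio) emits a 16-dimensional score vector per video and that fusion is a multinomial logistic regression over the concatenated scores. The abstract does not specify these details, so the function names `fit_fusion` and `predict_fusion`, the hyperparameters, and the synthetic scores below are all hypothetical.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax with max subtraction for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def _stack(stream_scores):
    # Concatenate per-stream score matrices and append a bias column.
    x = np.concatenate(stream_scores, axis=1)
    return np.hstack([x, np.ones((x.shape[0], 1))])

def fit_fusion(stream_scores, labels, num_classes, lr=0.1, steps=300):
    """Fit a multinomial logistic regression on concatenated stream scores
    with plain batch gradient descent (hypothetical helper)."""
    x = _stack(stream_scores)
    y = np.eye(num_classes)[labels]           # one-hot targets
    w = np.zeros((x.shape[1], num_classes))
    losses = []
    for _ in range(steps):
        p = softmax(x @ w)
        picked = p[np.arange(len(labels)), labels]
        losses.append(-np.mean(np.log(picked + 1e-12)))  # cross-entropy
        w -= lr * (x.T @ (p - y)) / x.shape[0]
    return w, losses

def predict_fusion(w, stream_scores):
    """Return fused class probabilities for each video."""
    return softmax(_stack(stream_scores) @ w)

# Synthetic stand-ins for the three stream outputs (assumed, not SRIV data).
rng = np.random.default_rng(0)
num_classes, n = 16, 320
labels = rng.integers(0, num_classes, size=n)
streams = [2.0 * np.eye(num_classes)[labels] + rng.normal(size=(n, num_classes))
           for _ in range(3)]                 # spatial, temporal, audio

w, losses = fit_fusion(streams, labels, num_classes)
probs = predict_fusion(w, streams)
pred = probs.argmax(axis=1)                   # fused relation category per video
```

Learning the fusion weights, rather than averaging stream scores, lets the combiner weight each stream differently per relation category. Whether the paper's model does exactly this is not stated in the abstract.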
