Abstract

Advances in technology and social media have led to the creation of vast amounts of visual information. A great deal of research in Computer Vision considers visual information captured either by first-person (egocentric) or third-person (exocentric) cameras. Video data generated by YouTubers, surveillance cameras, and drones is referred to as third-person or exocentric video data, whereas first-person or egocentric data is generated by devices such as GoPro cameras and Google Glass. The exocentric view captures wide, global scenes, while the egocentric view captures the activities an actor performs with respect to objects. These two perspectives appear independent yet are related. In Computer Vision, the two perspectives have been studied independently across domains such as Activity Recognition, Object Detection, Action Recognition, and Summarization, but their relationship and comparison are less discussed in the literature. This paper tries to bridge this gap by presenting a systematic study of first-person and third-person videos. Further, we implemented an algorithm to classify videos as first-person or third-person, achieving a validation accuracy of 88.4% and an F1-score of 86.10% on the Charades dataset.
