Abstract

Recent efforts in computer vision tackle the problem of human activity understanding in video sequences. These algorithms typically require annotated video data to learn their models. In this paper, we introduce a novel data collection framework that takes advantage of the large amount of video data available on the web. We use this new framework to retrieve videos of human activities in order to build datasets for training and evaluating computer vision algorithms. We rely on Amazon Mechanical Turk workers to obtain high-accuracy annotations. An agglomerative clustering technique makes it possible to obtain reliable and consistent annotations for the temporal localization of human activities in videos. Using two different datasets, Olympic Sports and our novel Daily Human Activities dataset, we show that our collection/annotation framework achieves robust annotations for human activities in large amounts of video data.
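The abstract mentions an agglomerative clustering step for consolidating worker annotations into consistent temporal localizations. The sketch below illustrates one way such a step could work, assuming each worker supplies a (start, end) interval in seconds; the function names, the single-linkage gap distance, the one-second merge threshold, and the median-based consensus are illustrative assumptions, not the authors' implementation.

# Minimal sketch (not the authors' code): merging temporal annotations from
# several Mechanical Turk workers into consensus activity intervals via
# agglomerative clustering. Distance measure and threshold are assumptions.

def interval_distance(a, b):
    """Distance between two (start, end) intervals: 0 when they overlap,
    otherwise the gap between them, in seconds."""
    (s1, e1), (s2, e2) = a, b
    return max(0.0, max(s1, s2) - min(e1, e2))

def cluster_annotations(intervals, merge_threshold=1.0):
    """Greedy agglomerative clustering: repeatedly merge the two closest
    clusters until the smallest distance exceeds merge_threshold."""
    clusters = [[iv] for iv in intervals]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single-linkage distance between clusters i and j
                d = min(interval_distance(a, b)
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > merge_threshold:
            break
        clusters[i].extend(clusters.pop(j))

    def median(xs):
        xs = sorted(xs)
        return xs[len(xs) // 2]

    # consensus interval per cluster: median start / median end across workers
    return [(median([s for s, _ in c]), median([e for _, e in c]))
            for c in clusters]

# Example: three workers marking roughly the same activity, plus one outlier.
worker_annotations = [(2.0, 5.5), (2.3, 5.8), (1.9, 5.2), (30.0, 33.0)]
print(cluster_annotations(worker_annotations))
# -> [(2.0, 5.5), (30.0, 33.0)]: a consensus interval and the isolated outlier

In practice, a per-cluster consensus could also be weighted by worker reliability; the threshold controls how much disagreement between workers is tolerated before annotations are treated as distinct activity instances.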
