Abstract

Temporal action localization is one of the most important and challenging problems in video understanding. It has received considerable attention in recent years because of its wide range of applications in daily life. The field has made significant progress, particularly with the recent development of deep learning, and demand for temporal action localization in untrimmed videos continues to grow. In this paper, we survey state-of-the-art techniques and models for temporal action localization in video. The survey covers related techniques, benchmark datasets, and evaluation metrics. We organize existing approaches into two categories, fully-supervised learning and weakly-supervised learning, and for each we list several representative works and compare their performance. Finally, we provide an in-depth analysis, propose potential research directions, and conclude the survey.

Highlights

  • As the number of videos grows tremendously, video understanding has become a popular and challenging research direction in computer vision

  • We focus on temporal action localization, the fourth of the five tasks proposed in the ActivityNet Challenge 2017

  • In binary classification, the positive label refers to the samples of greater interest, such as an action or an abnormal event


Summary

Introduction

As the number of videos grows tremendously, video understanding has become a popular and challenging research direction in computer vision. The ActivityNet Challenge 2017 [48], held at CVPR in Hawaii, proposed a total of five tasks, including A) Untrimmed Video Classification (ActivityNet [7]) and E) Dense-Captioning Events in Videos (ActivityNet Captions). In this survey, we focus on temporal action localization, which is the fourth task in that list. It requires detecting the temporal intervals that contain the target actions. For a long untrimmed video, temporal action localization therefore involves two sub-tasks: recognition, which decides what action occurs, and localization, which decides when it starts and ends.
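To make the two sub-tasks concrete, the following minimal Python sketch represents a localized action as a labeled time interval and checks a prediction against a ground-truth annotation. It is an illustration only, under stated assumptions: the names `ActionSegment`, `temporal_iou`, and `is_correct` are hypothetical, and the use of temporal intersection-over-union with a 0.5 threshold is a common convention assumed here, not something specified in this excerpt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionSegment:
    """A hypothetical record for one action instance in an untrimmed video."""
    start: float  # start time in seconds
    end: float    # end time in seconds
    label: str    # action class (the recognition output)


def temporal_iou(a: ActionSegment, b: ActionSegment) -> float:
    """Overlap of two intervals divided by the length of their union."""
    inter = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - inter
    return inter / union if union > 0 else 0.0


def is_correct(pred: ActionSegment, truth: ActionSegment,
               threshold: float = 0.5) -> bool:
    """Assumed criterion: the label matches (recognition) and the
    intervals overlap enough (localization)."""
    return pred.label == truth.label and temporal_iou(pred, truth) >= threshold


truth = ActionSegment(start=10.0, end=20.0, label="long jump")
pred = ActionSegment(start=12.0, end=20.0, label="long jump")
print(temporal_iou(pred, truth), is_correct(pred, truth))
```

A prediction with the right interval but the wrong label fails the recognition half of the check, while a prediction with the right label but a poorly aligned interval fails the localization half.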

