A Survey on Temporal Action Localization

Huifen Xia, Yongzhao Zhan

doi:10.1109/access.2020.2986861

Open Access

Abstract

Temporal action localization is one of the most crucial and challenging problems for video understanding in computer vision. It has received a lot of attention in recent years because of its extensive applications in daily life. Significant progress has been made, especially with the recent development of deep learning, and demand is growing for temporal action localization in untrimmed videos. In this paper, we survey the state-of-the-art techniques and models for temporal action localization in video. The survey covers the related techniques, several benchmark datasets, and the evaluation metrics for temporal action localization. In addition, we summarize temporal action localization from two aspects: fully-supervised learning and weakly-supervised learning. We list several representative works and compare their performance. Finally, we provide an in-depth analysis, propose potential research directions, and conclude the survey.

Highlights

As the number of videos grows tremendously, video understanding has become a popular and challenging research direction in computer vision
We focus on temporal action localization, the fourth of the five tasks proposed in the ActivityNet Challenge 2017
In binary classification, the positive label refers to the samples of greater interest, such as an action or an abnormal event

Summary

Introduction

As the number of videos grows tremendously, video understanding has become a popular and challenging research direction in computer vision. The ActivityNet Challenge 2017 [48], held at CVPR in Hawaii, proposed a total of five tasks, including A) Untrimmed Video Classification (ActivityNet [7]) and E) Dense-Captioning Events in Videos (ActivityNet Captions). In this survey, we focus on temporal action localization, the fourth task on that list. It requires detecting the temporal intervals that contain the target actions. For a long untrimmed video, temporal action localization addresses two main tasks: recognition and localization.
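
To make the notion of detecting a temporal interval concrete, the following minimal sketch computes the temporal intersection over union (tIoU) between a predicted interval and a ground-truth interval, a measure commonly used to decide whether a detection matches an annotated action. It is an illustration only and is not taken from the survey text above; the name `temporal_iou` and the `(start, end)` representation in seconds are assumptions made for this example.

```python
from typing import Tuple

# Assumed representation for this sketch: (start_sec, end_sec), with start <= end.
Interval = Tuple[float, float]


def temporal_iou(pred: Interval, gt: Interval) -> float:
    """Return the temporal IoU of two intervals (hypothetical helper)."""
    # Length of the overlap; zero when the intervals do not intersect.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    # Union length = sum of both lengths minus the overlap.
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0
```

For example, `temporal_iou((0.0, 10.0), (5.0, 15.0))` has an overlap of 5 seconds and a union of 15 seconds, so a detection with these boundaries would be rejected under a tIoU threshold of 0.5.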

Methods

Results

Conclusion