Abstract

Current state-of-the-art approaches to spatio-temporal action detection handle stable videos and relatively controlled environments, as in the UCF-101 benchmark. In addition, the objects of interest are typically close to the camera and are therefore clear and easily distinguished. This study presents a method for online human action detection in long-distance imaging affected by atmospheric distortions. We created a unique dataset of typical actions captured in long-range imaging. Various CNN frameworks were examined for the initial moving-object detection phase, including 2D, 3D, one-stream, and two-stream (RGB frames and optical flow) architectures. The base object detectors examined within these frameworks include YOLOv3 and an extension of the Inflated 3D ConvNet (I3D) with a Feature-Fused Single Shot Multibox Detector (FFSSD) to improve small-object detection. To cope with the harmful effect on motion estimation of the random spatio-temporal movements induced by the atmosphere, we first adapt the optical flow stream to the temporally noisy turbulent environment. A significant improvement in action detection quality under such noisy conditions was obtained by an online tracking algorithm that incrementally constructs and labels the objects' tracks from the network's frame-level detections. Experimental results show that our approach outperforms the state of the art on our dataset in terms of mAP.
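To make the tracking idea concrete, the following is a minimal, self-contained sketch (not the paper's implementation) of how frame-level detections could be linked into tracks by IoU association, with each track's action label smoothed by a majority vote over its recent per-frame predictions; the IoU threshold, the history length, and all function and class names are illustrative assumptions.

```python
# Illustrative sketch (not the paper's implementation): incrementally link
# frame-level detections into tracks by IoU and smooth the action label
# with a majority vote over each track's recent class predictions.
from collections import Counter, deque

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / float(area(a) + area(b) - inter + 1e-9)

class Track:
    def __init__(self, box, label, history=15):
        self.box = box
        self.votes = deque([label], maxlen=history)  # recent per-frame labels

    @property
    def label(self):
        # Majority vote suppresses sporadic misclassifications caused by
        # turbulence-induced frame-level noise.
        return Counter(self.votes).most_common(1)[0][0]

def update_tracks(tracks, detections, iou_thr=0.3):
    """detections: list of (box, label) pairs from the per-frame detector."""
    for box, label in detections:
        best = max(tracks, key=lambda t: iou(t.box, box), default=None)
        if best is not None and iou(best.box, box) >= iou_thr:
            best.box = box
            best.votes.append(label)
        else:
            tracks.append(Track(box, label))
    return tracks
```

A greedy association like this is only a stand-in; the paper's tracker additionally models the turbulence-induced motion statistics, which are not reproduced here.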

Highlights

  • Action detection focuses on classifying the actions present in a video and localizing them in space and time

  • In this paper, we show that the precision of the action detection process can be significantly increased by a tracking stage that takes into account the random spatio-temporal motions in the video caused by the long atmospheric path

  • Our main contributions can be summarized as follows: 1) a pre-processing algorithm for optical flow calculation based on the characteristics of turbulence (a hedged sketch follows this list), 2) an examination of various combinations of 2D and 3D networks; in some of these architectures, we extended the I3D method [12] to support small-object action localization with a modified SSD detector, 3) a novel online tracking algorithm based on turbulence characteristics, 4) an algorithm based on majority voting that dynamically updates the action labels during tracking, reducing false class predictions under turbulent conditions, and 5) a unique dataset of typical actions undertaken in long-range imaging affected by the atmospheric path, which will be made publicly available
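As an illustration of contribution (1), the sketch below assumes that turbulence-induced displacements are roughly zero-mean over a short temporal window, so averaging consecutive Farneback flow fields attenuates them while genuine object motion is preserved; the window size, the flow parameters, and the function name are assumptions for illustration, not the paper's algorithm.

```python
# Hypothetical sketch of the flow pre-processing idea: suppress the
# (approximately zero-mean) random displacements induced by turbulence by
# temporally smoothing the dense flow field over a short window before it
# is fed to the optical flow stream. All parameters are illustrative.
import cv2
import numpy as np

def smoothed_flow(gray_frames, window=5):
    """gray_frames: list of consecutive grayscale frames (uint8 arrays)."""
    flows = []
    for prev, curr in zip(gray_frames[:-1], gray_frames[1:]):
        flow = cv2.calcOpticalFlowFarneback(
            prev, curr, None,
            pyr_scale=0.5, levels=3, winsize=15,
            iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
        flows.append(flow)
    # Average the per-frame flow fields over the window: genuine object
    # motion accumulates while the random turbulent jitter tends to cancel.
    return [np.mean(flows[max(0, i - window + 1):i + 1], axis=0)
            for i in range(len(flows))]
```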


Introduction

Action detection focuses on classifying the actions present in a video and localizing them in space and time. It is a challenging problem, and it becomes even more difficult in the case of long-distance imaging (at about two kilometers and above) due to the effects of turbulence and aerosols in the air, which become more pronounced as the imaging path length increases [1]. Figs. (c) and (d) show samples from our dataset, where (c) is an RGB image and (d) is the corresponding optical flow. The optical flow map is much noisier due to the random movements caused by the air turbulence.
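For reference, the snippet below shows the standard HSV colour-coding commonly used to render a dense flow field as a map like the one in (d), with hue encoding motion direction and brightness encoding magnitude; it is generic visualisation code under those assumptions, not code taken from the paper.

```python
# Generic visualisation sketch: render a dense flow field as a colour map.
# Under turbulence, the otherwise static background shows up as speckled
# colour, which is the noise referred to in the text.
import cv2
import numpy as np

def flow_to_bgr(flow):
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hsv = np.zeros((*flow.shape[:2], 3), dtype=np.uint8)
    hsv[..., 0] = ang * 180 / np.pi / 2          # hue encodes direction
    hsv[..., 1] = 255                            # full saturation
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255,
                                cv2.NORM_MINMAX).astype(np.uint8)
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
```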
