A weakly supervised CNN model for spatial localization of human activities in unconstraint environment

N Kumar, N Sukavanam

doi:10.1007/s11760-019-01633-y

Abstract

Human action localization in a given video sequence refers to recovering the spatial and temporal information of a specified action. Like action recognition, action localization plays an important role in security, disease diagnosis and geographical systems. Localizing an action can help with tracking, detection and prediction of the event concerned. The main difficulty arises when long, untrimmed and highly occluded videos captured in uncontrolled conditions must be processed, because obtaining annotations for every action is expensive and laborious. Motivated by the recent state of the art in deep learning for image classification, we present a weakly supervised action localization model based on a deep neural network. The proposed model is useful when dealing with large amounts of data, as developing a big network consumes more computational resources and often raises overfitting issues. We build on the Inception V3 (GoogLeNet) framework, which uses TensorFlow as its backend and batch normalization alongside the convolution layers. Batch normalization efficiently removes the covariate shift problem between the network layers. The approach developed in this work is validated on the UCF50 and UCF Sports action benchmark datasets. The proposed model gives satisfactory results on both datasets; it can perform better on long untrimmed video sequences captured in unconstrained environments. Important applications of this work can be found in highly sensitive tasks, such as automatic localization of hidden objects and detection of enemy positions under camera surveillance.

Full Text