An effective behavior recognition method in the video session using convolutional neural network.

Yizhen Meng, Jun Zhang

doi:10.1371/journal.pone.0266734

Abstract

To further improve the accuracy of video-based behavior recognition, we propose an effective behavior recognition method for video sessions based on a convolutional neural network. First, a target detection stage is added before the behavior recognition algorithm, so that the human body region in the video can be extracted accurately and interference from redundant and unnecessary background noise is reduced. At the same time, unsuitable images are replaced, which helps balance the background trade-off and lets the neural network concentrate on learning human behavior information. Second, fragmentation and stochastic sampling are added to establish long-range temporal modeling of the whole video session, so that the model gains video-level representation ability. Finally, an improved loss function is used for behavior recognition to address hard-to-classify samples and possible sample imbalance. We conducted hyperparameter, ablation and comparison experiments on different open-source and benchmark datasets. Compared with other commonly used behavior recognition algorithms, the experimental results verify the effectiveness of the proposed method. In addition, related deep learning-based methods for behavior recognition are reviewed at the beginning of this paper, and the challenges and future research directions in behavior recognition are discussed at the end, which we hope will serve a dual purpose for later researchers.
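
The fragmentation and stochastic sampling step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the video is divided into equal-length segments and one frame index is drawn uniformly at random from each segment. The function name `sample_segment_indices` and its parameters are hypothetical and are not taken from the paper, whose exact sampling scheme may differ.

```python
import random


def sample_segment_indices(num_frames, num_segments, rng=None):
    """Draw one random frame index from each of `num_segments` equal spans.

    Illustrative only: the names and the uniform-per-segment choice are
    assumptions, not the authors' implementation.
    """
    if num_segments <= 0:
        raise ValueError("num_segments must be positive")
    if num_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    rng = rng or random.Random()

    indices = []
    for k in range(num_segments):
        # Integer boundaries of the k-th segment; `end` is exclusive.
        start = k * num_frames // num_segments
        end = (k + 1) * num_frames // num_segments
        indices.append(rng.randrange(start, end))
    return indices


if __name__ == "__main__":
    # A 90-frame clip split into 3 segments yields one index per segment.
    print(sample_segment_indices(90, 3, random.Random(0)))
```

Because each sampled frame comes from a different part of the clip, a small number of frames can cover the whole video session, which is one way a model can be given a video-level view at low cost.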

Full Text