EMO-MoviNet: Enhancing Action Recognition in Videos with EvoNorm, Mish Activation, and Optimal Frame Selection for Efficient Mobile Deployment.

Tarique Hussain, Rizwan Qureshi, Zulfiqar Ali Memon, Tanvir Alam

doi:10.3390/s23198106

Abstract

The primary goal of this study is to develop a deep neural network for action recognition that improves accuracy while minimizing computational cost. To this end, we propose a modified architecture, EMO-MoviNet-A2*, that integrates Evolving Normalization (EvoNorm), Mish activation, and optimal frame selection to improve the accuracy and efficiency of action recognition in videos. The asterisk indicates that the model also incorporates the stream buffer concept. The Mobile Video Network (MoviNet) belongs to a family of memory-efficient architectures discovered through Neural Architecture Search (NAS) that balance accuracy and efficiency by combining spatial, temporal, and spatio-temporal operations. We apply the MoviNet model, pre-trained on the Kinetics dataset, to the UCF101 and HMDB51 datasets. On UCF101 we observed a generalization gap: the model performed better on the training set than on the test set. To address this, we replaced batch normalization with EvoNorm, which unifies normalization and activation functions. Key-frame selection also required improvement, so we developed a novel technique called Optimal Frame Selection (OFS) that identifies key-frames within videos more effectively than random or dense frame selection. Combining OFS with Mish nonlinearity improved accuracy by 0.8-1% in our 20-class UCF101 experiment. On UCF101, the EMO-MoviNet-A2* model consumes 86% fewer FLOPs and approximately 90% fewer parameters, at the cost of 1-2% accuracy. On HMDB51, it achieves 5-7% higher accuracy while requiring seven times fewer FLOPs and ten times fewer parameters than the reference model, Motion-Augmented RGB Stream (MARS).
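The abstract names Mish and EvoNorm but does not give their formulas. As an illustration only, here is a minimal pure-Python sketch of both, assuming the EvoNorm-S0 (sample-based) variant, which multiplies the input by a learned sigmoid gate and divides by a per-group standard deviation. That gate is what lets a single layer replace both batch normalization and the activation. The abstract does not say which EvoNorm variant EMO-MoviNet uses. The function names, the `groups` parameter, and the per-clip `(C, N)` layout (N = frames × height × width) are hypothetical choices for this sketch and are not the authors' code.

```python
import math
from typing import List, Sequence

# Hypothetical helpers for illustration; not the EMO-MoviNet implementation.


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def evonorm_s0(
    x: List[List[float]],   # one clip, shape (C, N): C channels, N = T*H*W positions
    gamma: Sequence[float],  # per-channel scale, length C
    beta: Sequence[float],   # per-channel shift, length C
    v: Sequence[float],      # per-channel parameter inside the sigmoid gate, length C
    groups: int,
    eps: float = 1e-5,
) -> List[List[float]]:
    """Assumed EvoNorm-S0 form: y = x * sigmoid(v * x) / group_std(x) * gamma + beta."""
    channels = len(x)
    if channels % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    size = channels // groups
    out: List[List[float]] = []
    for g in range(groups):
        # Standard deviation over every value in this channel group (population variance).
        members = x[g * size:(g + 1) * size]
        flat = [val for row in members for val in row]
        mean = sum(flat) / len(flat)
        var = sum((val - mean) ** 2 for val in flat) / len(flat)
        std = math.sqrt(var + eps)
        for offset, row in enumerate(members):
            c = g * size + offset
            # Gate, normalize, then apply the per-channel affine transform.
            out.append([
                val * sigmoid(v[c] * val) / std * gamma[c] + beta[c]
                for val in row
            ])
    return out


if __name__ == "__main__":
    print(round(mish(1.0), 4))  # Mish(1) is approximately 0.8651
    clip = [[1.0, -2.0, 0.5], [0.0, 3.0, -1.0]]
    print(evonorm_s0(clip, gamma=[1.0, 1.0], beta=[0.0, 0.0], v=[1.0, 1.0], groups=1))
```

Because EvoNorm-S0 uses only per-sample group statistics, its output does not depend on batch composition. This is one plausible reason such a layer could narrow a train/test gap that batch statistics can introduce, though the abstract does not analyze the mechanism.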


Journal: Sensors (Basel, Switzerland)
Publication Date: Sep 27, 2023
License type: CC BY 4.0

Similar Papers

Action Recognition in Videos with Temporal Segments Fusions
Yuanye Fang ... Qiu-Feng Wang
01 Jan 2020

Learning correlations for human action recognition in videos
Yun Yi ... Bowen Zhang
Multimedia Tools and Applications | VOL. 76
10 Feb 2017

Integrating Gaussian mixture model and dilated residual network for action recognition in videos
Ming Fang ... Jianwei Zhao
Multimedia Systems | VOL. 26
20 Aug 2020

Spatial Attention Adapted to a LSTM Architecture with Frame Selection for Human Action Recognition in Videos
Carlos Orozco ... Julio Berlles
24 Jul 2021
