Abstract

Recognizing behaviors from video sequences is a challenging but meaningful task for machines. This work aims to predict students’ behavior in an experimental class, relying on the idea of symmetry between reality and annotated reality, centered on the feature space. A heteromorphic ensemble algorithm is proposed to make the extracted features more aggregated and to reduce the computational burden. Specifically, the deep learning models are improved to obtain feature vectors representing gestures from video frames, and the classification algorithm is optimized for behavior recognition. The symmetry idea is realized by decomposing the task into three stages: hand detection and cropping, hand joint feature extraction, and gesture classification. First, a new detector named YOLOv4-specific tiny detection (STD) is proposed by restructuring the YOLOv4-tiny model; it produces two outputs and uses an attention mechanism that leverages context information. Second, the efficient pyramid squeeze attention (EPSA) net is combined with EvoNorm-S0 and the spatial pyramid pooling (SPP) layer to obtain hand joint position information. Finally, D–S theory is used to fuse two classifiers, a support vector machine (SVM) and a random forest (RF), into a mixed classifier named S–R. Experiments on self-created datasets demonstrate the synergistic effect of the algorithm, with a high average recognition accuracy of 89.6%.
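
The fusion step can be illustrated with a minimal sketch of Dempster's rule of combination from D–S (Dempster-Shafer) theory. The abstract does not say how the S–R classifier assigns masses, so this sketch assumes the simplest case: each classifier's per-class probabilities are treated as masses on singleton classes only. The function name `dempster_combine` and the probability values are hypothetical and chosen purely for illustration.

```python
from typing import List, Sequence


def dempster_combine(m1: Sequence[float], m2: Sequence[float]) -> List[float]:
    """Combine two mass functions with Dempster's rule.

    Assumption (not from the paper): all mass sits on singleton classes,
    so two focal elements intersect only when they are the same class.
    """
    if len(m1) != len(m2):
        raise ValueError("both mass functions must cover the same classes")

    # Mass on which the two sources agree, class by class.
    agreement = [a * b for a, b in zip(m1, m2)]

    # This sum equals 1 - K, where K is the conflict between the sources.
    total_agreement = sum(agreement)
    if total_agreement == 0.0:
        raise ValueError("sources are in total conflict; rule is undefined")

    # Normalise so the combined masses sum to 1.
    return [x / total_agreement for x in agreement]


# Hypothetical per-class outputs of the two classifiers for one gesture.
svm_probs = [0.6, 0.3, 0.1]
rf_probs = [0.5, 0.1, 0.4]

fused = dempster_combine(svm_probs, rf_probs)
predicted = max(range(len(fused)), key=fused.__getitem__)
print(fused, predicted)
```

In this example the two classifiers disagree on the runner-up class but agree on the first one, so the fused mass concentrates on that class. A fuller treatment would also allow mass on sets of classes to express each classifier's uncertainty.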
