Classroom student posture recognition based on an improved high-resolution network

Yiwen Zhang, Huansheng Ning, Zhenyu Liu, Tao Zhu

doi:10.1186/s13638-021-02015-0

Abstract

Because a typical classroom holds many students in crowded seating, most features of a student's posture are often occluded, which makes it difficult to balance posture recognition accuracy against computational efficiency. To address this issue, a novel classroom student posture recognition method is proposed. First, to recognize the poses of multiple students in the classroom, we use the you-only-look-once (YOLOv3) object detection algorithm and retrain it to detect people hunched over a table, which forms the basis of the pose estimation network. Next, to improve the accuracy of the pose estimation network, we embed the squeeze-and-excitation network structure in the residual structure of the high-resolution network (HRNet). Finally, using the key human body points output by the improved HRNet algorithm, we design a pose classification algorithm based on a support vector machine to classify human poses in the classroom. Experiments show that the improved HRNet multi-person pose estimation algorithm achieves the best mean average precision, 73.76%, on the common objects in context (COCO) validation dataset. We further test the proposed algorithm on a custom dataset collected in a classroom, where it achieves a high recognition rate of 90.1% and good robustness.
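The squeeze-and-excitation step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `se_recalibrate`, the nested-list feature map, and the weight matrices `w1` and `w2` are assumptions made for illustration, and biases are omitted. It shows only the generic squeeze (global average pooling), excitation (two fully connected layers), and channel rescaling steps.

```python
import math

def se_recalibrate(feature_map, w1, w2):
    """Generic squeeze-and-excitation channel recalibration (illustrative only).

    feature_map: list of C channels, each an H x W nested list of floats.
    w1: (C/r) x C weights of the reduction layer (hypothetical values).
    w2: C x (C/r) weights of the expansion layer (hypothetical values).
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_map
    ]
    # Excitation: fully connected -> ReLU -> fully connected -> sigmoid.
    hidden = [max(0.0, sum(w * z for w, z in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[g * v for v in row] for row in ch]
        for g, ch in zip(gates, feature_map)
    ]
    return scaled, gates

# Usage with two 2 x 2 channels and a reduction to one hidden unit.
fm = [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]]
scaled, gates = se_recalibrate(fm, w1=[[1.0, 1.0]], w2=[[0.0], [1.0]])
print(gates)
```

In a residual unit, the rescaled features would typically be added to the shortcut branch afterwards; where exactly the block sits inside HRNet's residual structure is specified in the paper's Methods section, not in this sketch.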

Highlights

In recent years, due to the growth of surveillance systems for both public and personal usage, pose estimation and detection methods have been developed to meet the emerging needs of various industries.
We report standard average precision and recall scores [11], where average precision (AP) is the mean of the AP scores at 10 object keypoint similarity (OKS) thresholds, OKS = 0.50, 0.55, ..., 0.90, 0.95; average recall (AR) is defined in the same way.
To compare different pose estimation methods (Section 5.2) and verify the effectiveness of the improved high-resolution network (HRNet), several pose estimation frameworks are investigated, including OpenPose [7], the original HRNet [11], and ResNet [28].
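
The AP definition in the highlights above can be made concrete with a short sketch. It assumes the standard COCO definition of object keypoint similarity, OKS = sum_i exp(-d_i^2 / (2 s^2 k_i^2)) over labeled keypoints divided by the number of labeled keypoints, where d_i is the distance between predicted and ground-truth keypoint i, s^2 is the object area, and k_i is a per-keypoint constant. The function names and the example values of `k` are hypothetical, and the per-threshold AP values are taken as given rather than computed from detections.

```python
import math

# The ten OKS thresholds 0.50, 0.55, ..., 0.95 used for averaging.
OKS_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

def oks(pred, gt, visibility, area, k):
    """Object keypoint similarity between one prediction and one ground truth.

    pred, gt: lists of (x, y) keypoints; visibility: ground-truth flags (> 0 = labeled);
    area: object area (s^2); k: per-keypoint constants (hypothetical values here).
    """
    total, labeled = 0.0, 0
    for (px, py), (gx, gy), v, ki in zip(pred, gt, visibility, k):
        if v > 0:  # only labeled keypoints contribute
            d2 = (px - gx) ** 2 + (py - gy) ** 2
            total += math.exp(-d2 / (2.0 * area * ki ** 2))
            labeled += 1
    return total / labeled if labeled else 0.0

def mean_over_thresholds(score_by_threshold):
    """Average a per-threshold score (AP or AR) over the ten OKS thresholds."""
    return sum(score_by_threshold[t] for t in OKS_THRESHOLDS) / len(OKS_THRESHOLDS)

# Usage: a prediction that is slightly off on the second keypoint.
gt = [(10.0, 20.0), (30.0, 40.0)]
pred = [(10.0, 20.0), (31.0, 41.0)]
print(oks(pred, gt, visibility=[2, 2], area=400.0, k=[0.05, 0.1]))
```

A prediction counts as a match at a given threshold when its OKS with a ground-truth pose is at least that threshold, so averaging over the ten thresholds rewards precise localization more than a single loose threshold would.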

Summary

Introduction

Due to the growth of surveillance systems for both public and personal usage, pose estimation and detection methods have been developed to meet the emerging needs of various industries. In a smart university classroom, student pose estimation and detection using computer vision technology has important research implications and great application value. One category of existing methods is based on object detection algorithms. The methods of Lin Tang and Bin T [1, 2] use an improved Faster R-CNN [3] model for object recognition to detect student postures in classrooms. Li W's method [4] detects only sleeping students and is based on an improved R-FCN [5].

Methods

Results

Discussion

Conclusion