Abstract
Human action recognition in videos has become a popular research area in artificial intelligence (AI). Over the past few years, this research has accelerated in areas such as sports, daily activities, and kitchen activities, driven by the benchmark datasets proposed for human action recognition in these areas. However, there is little research on benchmark datasets for human activity recognition in educational environments. We therefore developed a dataset of teacher and student activities to expand research in the education domain. This paper proposes a new dataset, called EduNet, as a novel approach towards developing human action recognition datasets for classroom environments. EduNet has 20 action classes, comprising around 7851 manually annotated clips extracted from YouTube videos and recorded in actual classroom environments. Each action category has a minimum of 200 clips, and the total duration is approximately 12 h. To the best of our knowledge, EduNet is the first dataset specially prepared for classroom monitoring of both teacher and student activities. It is also a challenging dataset, owing to the large number of clips and their unconstrained nature. We compared the performance of EduNet with the benchmark video datasets UCF101 and HMDB51 on a standard I3D-ResNet-50 model, which resulted in 72.3% accuracy. The development of a new benchmark dataset for the education domain will benefit future research concerning classroom monitoring systems. EduNet covers classroom activities of students from standards (grades) 1 to 12 and their respective teachers.
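As a back-of-the-envelope check on the figures quoted above (20 classes, ~7851 clips, ~12 h total), the implied per-clip and per-class averages can be computed directly; the variable names here are illustrative, not from the paper:

```python
# Rough arithmetic on the EduNet statistics quoted in the abstract:
# 20 action classes, ~7851 clips, ~12 hours of footage in total.
num_classes = 20
num_clips = 7851
total_hours = 12

avg_clip_seconds = total_hours * 3600 / num_clips   # total seconds / clips
avg_clips_per_class = num_clips / num_classes

print(f"average clip length ≈ {avg_clip_seconds:.1f} s")      # ≈ 5.5 s
print(f"average clips per class ≈ {avg_clips_per_class:.0f}")  # ≈ 393
```

These averages are consistent with the stated minimum of 200 clips per category.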
Highlights
Before going into detail about the results, we first present our analyses of the research questions (RQs). For RQ 1, we identified 20 possible teacher- and student-centric classroom actions
The results may improve with other variants of Inflated 3D (I3D) available on MXNet, such as I3D-Inception V1 pre-trained on Kinetics-400 and ImageNet, I3D-ResNet-101 pre-trained on Kinetics-400, SlowFast 4 × 16 trained from scratch, and I3D-slow-ResNet-101 pre-trained on Kinetics-700
We developed the EduNet dataset—a challenging action recognition dataset for the classroom environment
Summary
Despite the rapid growth in video data generation, computer vision research is lagging in automatically recognizing human activities. This rapid development demands corresponding advancements in video datasets. The majority of human action recognition video datasets focus on certain types of activities, i.e., cooking [8,9], sports [2,3], or simple actions [10]. A recent survey [11] listed 26 open action recognition video datasets in the following four categories: (i) action-level datasets, (ii) behavior-level datasets, (iii) interaction-level datasets, and (iv) group activity-level datasets. Datasets specially developed for human action recognition (HAR) [12,13,14] show the incremental growth of action classes from 400 to 600 and 700, which demonstrates the continuous enhancement of HAR video datasets. The actions in EduNet belong to students of standards (grades) 1 to 12 and their respective teachers. This complex, ground-level dataset indicates the need for more advanced algorithmic research.