Abstract

Human action recognition in videos has become a popular research area in artificial intelligence (AI). In the past few years, research has accelerated in areas such as sports, daily activities, and kitchen activities, owing to the benchmark datasets proposed for human action recognition in these areas. However, there has been little work on benchmark datasets for human activity recognition in educational environments. We therefore developed a dataset of teacher and student activities to expand research in the education domain. This paper proposes EduNet, a new dataset and a novel approach towards developing human action recognition datasets for classroom environments. EduNet has 20 action classes comprising around 7851 manually annotated clips, extracted from YouTube videos and recorded in actual classroom environments. Each action category has a minimum of 200 clips, and the total duration is approximately 12 hours. To the best of our knowledge, EduNet is the first dataset specially prepared for classroom monitoring of both teacher and student activities. It is also a challenging dataset, owing to its large number of clips and their unconstrained nature. We compared the performance of the EduNet dataset with the benchmark video datasets UCF101 and HMDB51 on a standard I3D-ResNet-50 model, which resulted in 72.3% accuracy on EduNet. The development of a new benchmark dataset for the education domain will benefit future research on classroom monitoring systems. The EduNet dataset covers classroom activities from schools teaching grades 1 to 12.
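The comparison above reports clip-level classification accuracy. As an illustrative sketch (not the authors' code), the top-1 accuracy metric over a set of clips can be computed as follows; the prediction and label values here are hypothetical 20-class EduNet-style indices:

```python
# Illustrative sketch: clip-level top-1 accuracy, the kind of metric
# reported for I3D-ResNet-50 on EduNet (72.3%). Predictions and labels
# below are hypothetical, not outputs from the paper's experiments.
from typing import Sequence


def top1_accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of clips whose predicted class matches the ground-truth label."""
    if len(predicted) != len(actual):
        raise ValueError("prediction and label lists must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical class indices in the range 0-19 (20 EduNet action classes):
preds = [3, 7, 7, 12, 0, 19, 5, 5, 1, 14]
labels = [3, 7, 2, 12, 0, 19, 5, 8, 1, 14]
print(f"top-1 accuracy: {top1_accuracy(preds, labels):.1%}")  # 8 of 10 correct -> 80.0%
```

In practice this metric would be computed over the full test split, with each clip's predicted class taken as the argmax of the model's softmax output.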

Highlights

  • For our first research question (RQ 1), we identified 20 possible teacher- and student-centric classroom actions

  • The result may improve with other variants of Inflated 3D (I3D) available in MXNet, such as I3D-Inception V1 pre-trained on Kinetics-400 and ImageNet, I3D-ResNet-101 pre-trained on Kinetics-400, SlowFast 4 × 16 trained from scratch, and I3D-slow-ResNet-101 pre-trained on Kinetics-700

  • We developed the EduNet dataset—a challenging action recognition dataset for the classroom environment



Introduction

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Despite the rapid growth in video data generation, computer vision research lags behind in automatically recognizing human activities. The actions in EduNet belong to students in grades 1 to 12 and their respective teachers; this complex, ground-level dataset indicates the need for more advanced algorithmic research. The majority of human action recognition video datasets focus on certain types of activities, e.g., cooking [8,9], sports [2,3], or simple actions [10]. A recent survey [11] listed 26 open action recognition video datasets in four categories: (i) action-level datasets, (ii) behavior-level datasets, (iii) interaction-level datasets, and (iv) group-activity-level datasets. This rapid development demands corresponding advancement in video datasets. The datasets specially developed for human action recognition (HAR) [12,13,14] show the incremental growth of action classes from 400 to 600 and then 700, demonstrating the continuous enhancement of HAR video datasets.

Problem Statement
Motivation
Contribution
Background
Benchmark HAR Datasets
HAR Deep Learning Models
Related Work
Dataset Details
Data Collection
Number
Naming Convention
Dataset Split
Comparison with Other Datasets
Experimental Details
Machine Setup
Model Architecture
Results and Discussion
Analysis of Result
Threats to Internal Validity
Threats to External Validity
Construct Validity
Conclusion Validity
Conclusions and Future Research