Education is pivotal in shaping future generations, and artificial intelligence (AI) technologies are transforming conventional classroom practice. Understanding student behavior is essential for improving teaching quality and learning outcomes, but in large classrooms it is difficult for a teacher to monitor every student. We therefore propose an intelligent deep-learning framework for a vision-based classroom that autonomously analyzes student behavior and attention. Our approach, Parallel Spatio-Temporal SlowFast (PST-SlowFast), integrates spatial and temporal attention modules so that the model captures the most relevant spatial and temporal features in classroom videos, which improves behavior detection accuracy. Experimental results show that PST-SlowFast effectively recognizes behaviors such as reading, writing, and hand-raising. It achieves an average mAP@50 of 88.8%, an improvement of 14% over state-of-the-art models such as YOLOv5x and YOLOv8x. These findings indicate that PST-SlowFast is promising for real-world applications in educational settings.
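The headline metric, mAP@50, is the mean over behavior classes of the average precision (AP) obtained when a detection counts as a true positive only if its intersection-over-union (IoU) with a not-yet-matched ground-truth box is at least 0.5. The abstract does not state the exact evaluation protocol, so the sketch below assumes all-point interpolated AP (PASCAL VOC 2010+ style); the function names `iou`, `average_precision`, and `mean_ap` are hypothetical illustrations, not the authors' evaluation code.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), corner coordinates


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    detections: List[Tuple[str, float, Box]],  # (image_id, score, box) for ONE class
    ground_truth: Dict[str, List[Box]],        # image_id -> ground-truth boxes of that class
    iou_thr: float = 0.5,                      # 0.5 gives the "@50" in mAP@50
) -> float:
    """All-point interpolated AP (assumed protocol, VOC 2010+ style)."""
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}

    # Greedily match detections in descending score order; each GT box matches once.
    tp_flags: List[int] = []
    for img, _, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best, best_j = 0.0, -1
        for j, gt in enumerate(ground_truth.get(img, [])):
            overlap = iou(box, gt)
            if overlap > best:
                best, best_j = overlap, j
        if best >= iou_thr and not matched[img][best_j]:
            matched[img][best_j] = True
            tp_flags.append(1)  # true positive
        else:
            tp_flags.append(0)  # false positive (low IoU or duplicate)

    # Cumulative precision and recall after each ranked detection.
    precisions: List[float] = []
    recalls: List[float] = []
    tp = 0
    for k, flag in enumerate(tp_flags, start=1):
        tp += flag
        precisions.append(tp / k)
        recalls.append(tp / n_gt)

    # Precision envelope: make precision non-increasing from right to left.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the interpolated precision-recall curve.
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_r) * p
        prev_r = r
    return ap


def mean_ap(ap_per_class: Dict[str, float]) -> float:
    """mAP: unweighted mean of per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)
```

For example, with two ground-truth "hand-raising" boxes and three ranked detections (a hit, a false positive, then a second hit with IoU of about 0.68), the per-class AP is 5/6. The values used here are made up for illustration and are not results from the paper.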