Abstract

The world’s elderly population continues to grow at an unprecedented rate, creating a need to monitor the safety of an aging population. One of the current problems is accurately classifying elderly physical activities, especially falling down, and delivering prompt assistance to someone in need. With the advent of deep learning, there has been extensive research on vision-based action recognition, and on fall detection in particular, using 2D human pose estimation. Nevertheless, due to the lack of large-scale elderly fall datasets and persistent challenges such as varying camera angles, illumination, and occlusion, accurately classifying falls has remained problematic. To address these problems, this research first carried out a comprehensive study of the AI Hub dataset, collected from the daily lives of elderly people, to benchmark the performance of state-of-the-art human pose estimation methods. Second, owing to the limited amount of real data, augmentation with synthetic data was applied, and the resulting improvement was validated through changes in accuracy. Third, this study shows that a Transformer network applied to elderly action recognition outperforms LSTM-based networks by a noticeable margin. Lastly, based on the quantitative and qualitative performance of the different networks, this paper proposes an efficient solution for elderly activity recognition and fall detection in the context of surveillance cameras.
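To make the Transformer-versus-LSTM comparison concrete, the following is a minimal sketch of the kind of model the abstract describes: a Transformer encoder that classifies a sequence of 2D pose keypoints into activity classes such as falling. This is not the paper's actual architecture; the PyTorch framework, the 17-joint COCO-style keypoint layout, and all dimensions and class counts are illustrative assumptions.

import torch
import torch.nn as nn

class PoseTransformer(nn.Module):
    """Sketch: classify 2D-pose sequences with a Transformer encoder.

    Assumptions (not from the paper): 17 joints with (x, y) coordinates
    per frame, clips up to 60 frames, and a hypothetical set of 5 classes.
    """
    def __init__(self, num_joints=17, d_model=64, num_classes=5,
                 num_layers=2, max_len=60):
        super().__init__()
        # Project each frame's flattened keypoints to one token.
        self.embed = nn.Linear(num_joints * 2, d_model)
        # Learned positional encoding over the temporal axis.
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, dim_feedforward=128, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        # Classification head, e.g. fall / walk / sit / stand / lie.
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, x):
        # x: (batch, frames, num_joints * 2)
        h = self.embed(x) + self.pos[:, : x.size(1)]
        h = self.encoder(h)
        # Mean-pool over time, then classify the whole clip.
        return self.head(h.mean(dim=1))

model = PoseTransformer()
clip = torch.randn(8, 60, 34)   # 8 clips, 60 frames, 17 joints * (x, y)
logits = model(clip)            # shape: (8, num_classes)

An LSTM baseline would replace the encoder with a recurrent layer over the same per-frame tokens; the Transformer's self-attention lets every frame attend directly to every other frame, which is one plausible reason for the margin the abstract reports.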
