Timely fall detection and alerting of medical aid are critical for monitoring the health of elderly individuals living alone. This paper addresses the poor adaptability, privacy infringement, and low recognition accuracy associated with traditional visual sensor-based fall detection. We propose an infrared video-based fall detection method built on spatial-temporal graph convolutional networks (ST-GCNs). Our method uses a fine-tuned AlphaPose to extract 2D human skeleton sequences from infrared videos. The skeleton data are then represented in both Cartesian and polar coordinates and processed by a two-stream ST-GCN to recognize fall behaviors promptly. To enhance the network's ability to recognize fall actions, we improve the adjacency matrix of the graph convolutional units and introduce multi-scale temporal graph convolution units. To facilitate practical deployment, we optimize the time window and network depth of the ST-GCN, striking a balance between model accuracy and speed. Experimental results on a proprietary infrared human action recognition dataset show that the proposed algorithm identifies fall behaviors with a highest accuracy of 96%. Moreover, the algorithm is robust, identifying falls in both near-infrared and thermal-infrared videos.
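The dual coordinate representation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `to_polar` and `two_stream_inputs`, the choice of joint 0 as the polar origin, and the plain-list data layout are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
from typing import List, Tuple

Joint = Tuple[float, float]   # (x, y) pixel coordinates of one joint
Frame = List[Joint]           # all joints of one skeleton in one video frame


def to_polar(frame: Frame, origin_index: int = 0) -> List[Tuple[float, float]]:
    """Convert one skeleton frame to (radius, angle) pairs.

    The origin joint is an assumption (hypothetical choice: joint 0);
    the abstract does not state which reference point is used.
    """
    ox, oy = frame[origin_index]
    polar = []
    for x, y in frame:
        dx, dy = x - ox, y - oy
        radius = math.hypot(dx, dy)   # distance from the origin joint
        angle = math.atan2(dy, dx)    # angle in radians, in (-pi, pi]
        polar.append((radius, angle))
    return polar


def two_stream_inputs(sequence: List[Frame], origin_index: int = 0):
    """Build the two per-frame representations that a two-stream
    network could consume: raw Cartesian joints and polar joints."""
    cartesian = [list(frame) for frame in sequence]
    polar = [to_polar(frame, origin_index) for frame in sequence]
    return cartesian, polar


if __name__ == "__main__":
    # A toy three-joint skeleton observed over two frames.
    clip = [
        [(0.0, 0.0), (3.0, 4.0), (0.0, 2.0)],
        [(1.0, 1.0), (2.0, 1.0), (1.0, 3.0)],
    ]
    cart, pol = two_stream_inputs(clip)
    print(pol[0])
```

In this sketch each stream keeps the same frame and joint ordering, so the two representations could share one skeleton graph; how the actual network fuses the two streams is not covered here.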