Human Activity Recognition (HAR) plays a crucial role in communication and the Internet of Things (IoT) by enabling vision sensors to understand and respond to human behavior more intelligently and efficiently. Existing deep learning models struggle with low illumination, diverse viewpoints, and cluttered backgrounds; they require substantial computing resources and are therefore not appropriate for edge devices. Furthermore, without an effective video analysis technique, such systems process entire frame sequences, resulting in inadequate performance. To address these key challenges, a cloud-assisted IoT computing framework is proposed for HAR in uncertain low-light environments, composed of two tiers: edge and cloud computing. Initially, a lightweight Convolutional Neural Network (CNN) model is developed that enhances low-light frames, followed by a human detection algorithm that processes only selective frames, thus enabling efficient resource utilization. These refined frames are then transmitted to the cloud for accurate HAR, where a dual-stream CNN and transformer fusion network extracts both short- and long-range spatiotemporal discriminative features, followed by the proposed Optimized Parallel Sequential Temporal Network (OPSTN) with squeeze-and-excitation attention to efficiently learn HAR in complex scenarios. Finally, extensive experiments are conducted on three challenging HAR datasets to examine the proposed framework from various perspectives, such as complex activity recognition and low-light conditions, where the results outperform state-of-the-art methods.
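The squeeze-and-excitation attention mentioned above can be sketched in a few lines: each channel of a feature map is pooled to a scalar, passed through a small bottleneck network, and used to reweight that channel. The sketch below is a minimal, generic illustration in NumPy; the channel count, reduction ratio, and random weights are assumptions for demonstration, not the paper's OPSTN configuration.

```python
import numpy as np

def squeeze_excite(features: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Apply a squeeze-and-excitation gate to a (channels, height, width) map."""
    # Squeeze: global average pooling collapses each channel to one scalar.
    z = features.mean(axis=(1, 2))                 # shape (c,)
    # Excitation: bottleneck MLP, ReLU then sigmoid, yields per-channel gates.
    s = np.maximum(w1 @ z, 0.0)                    # shape (c // r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ s)))         # shape (c,), values in (0, 1)
    # Scale: reweight each channel by its learned importance.
    return features * gate[:, None, None]

# Illustrative setup: 8 channels, reduction ratio 2, random weights in place
# of learned parameters.
rng = np.random.default_rng(0)
c, r = 8, 2
x = rng.standard_normal((c, 4, 4))
w1 = rng.standard_normal((c // r, c))
w2 = rng.standard_normal((c, c // r))
out = squeeze_excite(x, w1, w2)
```

In a trained network, `w1` and `w2` are learned, so the gates suppress uninformative channels and amplify discriminative ones, which is the role the attention mechanism plays inside the recognition pipeline.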