Abstract

With the exponential growth of video data, action recognition has become an increasingly important area of study. Despite many advances, balancing accuracy against model size remains a formidable challenge, primarily because existing action recognition models are complex. To address this issue, this work presents DenseGCN, a lightweight network designed to achieve high accuracy while remaining small and fast enough for real-world applications. DenseGCN operates through a three-level feature fusion system. The first stage is the Multi-level Fusion Network (MlFN), which contains dense connections and a Spatial-Temporal Fusion Attention module (STF-Att) designed to eliminate the bias that deep networks introduce into feature extraction. In the second stage, RefineBone addresses optimization issues in low-dimensional feature layers by leveraging high-dimensional feature layers, thus avoiding gradient stacking. Finally, the Multi-temporal Fusion Feature Pyramid Network (MF-FPN) generates a discriminative classification feature map by repeatedly combining data from multiple dimensions. This strategy refines the extracted features and allows discriminative feature extraction even with a reduced number of channels. The efficient design both supports further research on lightweight networks and broadens the possibilities for real-world deployment. On two large-scale datasets, NTU RGB+D 60 and NTU RGB+D 120, DenseGCN outperformed other state-of-the-art methods, achieving an accuracy of 92.7% on the X-View benchmark of NTU RGB+D 60. DenseGCN is 10.2× faster and 10× smaller than the spatial temporal graph attention network (STGAT) proposed in 2022 while retaining very competitive accuracy. The findings suggest that the model significantly improves the quality of feature extraction. As a result, DenseGCN strikes a strong balance between accuracy and model size.
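The dense connections mentioned for MlFN can be illustrated with a minimal sketch. It assumes DenseNet-style connectivity, in which each layer receives the concatenation of all earlier feature maps; the abstract does not specify the exact wiring, so `dense_forward` and `make_toy_layer` are hypothetical names, and the toy layer stands in for a real graph convolution.

```python
def make_toy_layer(growth):
    """Hypothetical stand-in for a learned layer: emits `growth` channels,
    each equal to the mean of the input channels."""
    def layer(channels):
        mean = sum(channels) / len(channels)
        return [mean] * growth
    return layer


def dense_forward(x, layers):
    """Run a densely connected block (assumed DenseNet-style wiring).

    Every layer sees the concatenation of the block input and all earlier
    layer outputs, so features are reused rather than recomputed.
    """
    features = [x]
    for layer in layers:
        # Concatenate all feature lists so far along the channel axis.
        stacked = [v for f in features for v in f]
        features.append(layer(stacked))
    # The block output is the concatenation of everything produced.
    return [v for f in features for v in f]


# 4 input channels, 3 layers that each add 2 channels.
out = dense_forward([1.0] * 4, [make_toy_layer(2) for _ in range(3)])
```

Because each layer only adds a small number of channels while still seeing all earlier features, this kind of wiring is one common way to keep channel counts low, which matches the abstract's emphasis on a reduced number of channels.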
