Abstract

Deep learning algorithms have shown outstanding performance on tasks involving Human Action Recognition and spatio-temporal modeling. In our ever-more interconnected world, where mobile and edge devices take center stage, there is a growing need for algorithms that operate directly on embedded platforms. Field-Programmable Gate Arrays (FPGAs), owing to their reprogrammable nature and low-power attributes, stand out as excellent choices for these edge computing applications. In this work, our aims are threefold. First, we design and develop a novel custom model that combines Convolutional Neural Networks (CNNs) and Graph Convolutional Networks (GCNs) to effectively capture spatial and temporal features from skeleton data for Human Action Recognition. Second, we implement this custom model on FPGA hardware using Vitis-AI, exploring the potential for efficient hardware acceleration of deep learning models. Lastly, we evaluate the real-time performance of the FPGA-accelerated model against CPU and GPU implementations, assessing its suitability for deployment in real-world applications requiring low-latency inference. Experimental results on the NTU dataset demonstrate the efficiency and accuracy of our custom model compared to traditional CPU and GPU implementations.
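The abstract does not specify the model's internals, but the CNN-plus-GCN pattern it describes typically applies a graph convolution over skeleton joints within each frame (spatial) followed by a convolution along the frame axis (temporal). The sketch below illustrates that pattern in NumPy under stated assumptions: the toy 3-joint skeleton, tensor shapes, averaging kernel, and all function names are hypothetical, not taken from the paper.

```python
import numpy as np

def normalize_adjacency(A):
    # Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return d_inv_sqrt @ A_hat @ d_inv_sqrt

def spatial_gcn(X, A_norm, W):
    # X: (T, V, C_in) = frames x joints x channels.
    # Aggregate features over neighboring joints, then project channels; ReLU.
    return np.maximum(A_norm @ X @ W, 0.0)

def temporal_conv(X, K):
    # 'valid' 1-D convolution over the time axis, per joint and channel.
    T, V, C = X.shape
    kernel = np.ones(K) / K  # hypothetical averaging kernel for illustration
    out = np.empty((T - K + 1, V, C))
    for v in range(V):
        for c in range(C):
            out[:, v, c] = np.convolve(X[:, v, c], kernel, mode="valid")
    return out

# Toy example: 3 joints in a chain, 8 frames, 2 input channels -> 4 features
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 3, 2))
W = rng.standard_normal((2, 4))
H = temporal_conv(spatial_gcn(X, normalize_adjacency(A), W), K=3)
print(H.shape)  # (6, 3, 4): time shrinks by K-1, channels projected to 4
```

A full model would stack several such spatial-temporal blocks and end with global pooling and a classifier; only the per-block structure is shown here.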
