Abstract
Object detection and analysis using deep neural networks (DNNs) pose significant challenges because of their computational and power requirements. Such computation is typically carried out on platforms such as central processing units (CPUs), graphics processing units (GPUs), application-specific integrated circuits (ASICs), and field-programmable gate arrays (FPGAs). However, building high-performance computing platforms remains a critical challenge for edge computing tasks such as object detection, where power and bandwidth budgets are low yet fast, energy-efficient solutions are required. System-on-chip (SoC) designs are a promising way to address these challenges. This study presents a power- and delay-optimized Multiply-Accumulate (MAC) unit architecture for DNNs and compares the parameters of 4-bit, 8-bit, 12-bit, and 16-bit MAC units. The MAC unit, which performs multiplication, addition, and accumulation, was designed in Vivado. The design is analyzed and simulated using the Vivado High-Level Synthesis (HLS) tool and subsequently deployed on the Zybo Evaluation and Development Kit. The proposed approach outperforms existing state-of-the-art designs in processing time and power across the different precisions.