Abstract

Optimizing hardware processors and systems to perform deep learning operations such as Convolutional Neural Networks (CNNs) on resource-limited embedded devices is an active area of recent research. To run an optimized deep neural network model with the limited computational units and memory of an embedded device, it is necessary to quickly apply various hardware module configurations to various deep neural network models and find the optimal combination. An Electronic System Level (ESL) simulator based on SystemC is very useful for rapid hardware modeling and verification. In this paper, we designed and implemented a Deep Learning Accelerator (DLA) that performs Deep Neural Network (DNN) operations on a RISC-V Virtual Platform implemented in SystemC. The goal is to enable rapid and diverse analysis of deep learning operations on embedded devices built around the RISC-V processor, a recently emerging embedded processor. The developed RISC-V-based DLA prototype can analyze the hardware requirements for a given CNN dataset through the configuration of the CNN DLA architecture. Because RISC-V compiled software can run on the platform, the prototype can also execute a real neural network model such as Darknet. We ran the Darknet CNN model on the developed DLA prototype and, by analyzing the DLA architecture for various datasets, confirmed that computational overhead and inference error can be analyzed with the prototype.

Highlights

  • Deep Neural Networks (DNNs) in the field of Artificial Intelligence (AI) have been applied to many application areas thanks to their highly accurate inference in tasks such as object detection, which relies on training with huge datasets and rich computational resources

  • The Deep Learning Accelerator (DLA) module operates like a controller of the RISC-V core in the RISC-V Virtual Platform (VP), and the RISC-V core can control the DLA module by accessing the DLA register set allocated in the RISC-V memory map area; we call this register set the Global Function Set Register (GFSR)

  • We developed a SystemC DLA simulator that can efficiently analyze the DLA of an embedded edge system based on the RISC-V embedded processor
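
The memory-mapped control path described in the highlights can be illustrated with a small behavioural model. This is a minimal sketch in Python rather than SystemC, and it is not the paper's implementation: the base address, the register offsets and names (`REG_CTRL`, `REG_STATUS`, `REG_SRC_LEN`, `REG_RESULT`), the start/done protocol, and the toy "sum the input buffer" operation are all assumptions made for illustration. It only shows the general idea of a core driving an accelerator through loads and stores to a register block in its address map.

```python
# Behavioural sketch of a memory-mapped accelerator register block.
# Every address, register name, and bit meaning below is hypothetical.

GFSR_BASE = 0x5000_0000   # hypothetical base address of the register set
GFSR_SIZE = 0x10          # hypothetical size of the register window in bytes
REG_CTRL = 0x00           # hypothetical: writing bit 0 = 1 starts a job
REG_STATUS = 0x04         # hypothetical: reads 1 once the job is done
REG_SRC_LEN = 0x08        # hypothetical: number of input words to process
REG_RESULT = 0x0C         # hypothetical: result of the toy operation


class ToyDla:
    """Stand-in for an accelerator: sums its input buffer when started."""

    def __init__(self):
        self.regs = {REG_CTRL: 0, REG_STATUS: 0, REG_SRC_LEN: 0, REG_RESULT: 0}
        self.input_buffer = []

    def read(self, offset):
        return self.regs[offset]

    def write(self, offset, value):
        self.regs[offset] = value & 0xFFFF_FFFF
        if offset == REG_CTRL and value & 1:
            # A start request clears the done flag, runs the job, then sets it.
            self.regs[REG_STATUS] = 0
            count = self.regs[REG_SRC_LEN]
            self.regs[REG_RESULT] = sum(self.input_buffer[:count]) & 0xFFFF_FFFF
            self.regs[REG_STATUS] = 1


class Bus:
    """Routes core loads and stores to whichever device owns the address."""

    def __init__(self):
        self.devices = []  # list of (base, size, device) tuples

    def attach(self, base, size, device):
        self.devices.append((base, size, device))

    def _find(self, addr):
        for base, size, device in self.devices:
            if base <= addr < base + size:
                return device, addr - base
        raise ValueError(f"unmapped address {addr:#x}")

    def load(self, addr):
        device, offset = self._find(addr)
        return device.read(offset)

    def store(self, addr, value):
        device, offset = self._find(addr)
        device.write(offset, value)


def run_toy_job(bus, dla, data):
    """What driver software on the core might do: configure, start, poll, read."""
    dla.input_buffer = list(data)            # stands in for loading a data buffer
    bus.store(GFSR_BASE + REG_SRC_LEN, len(data))
    bus.store(GFSR_BASE + REG_CTRL, 1)
    while bus.load(GFSR_BASE + REG_STATUS) != 1:
        pass                                 # poll the hypothetical done flag
    return bus.load(GFSR_BASE + REG_RESULT)
```

In this sketch the job completes inside the register write, so the polling loop exits on its first check; a timed simulation would let the status flag change some cycles later, which is why driver code polls (or waits for an interrupt) instead of reading the result immediately.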

Summary

Introduction

DNNs in the field of Artificial Intelligence (AI) have been applied to many application areas thanks to their highly accurate inference in tasks such as object detection, which relies on training with huge datasets and rich computational resources. For rapid prototyping of embedded edge devices with a RISC-V-based DLA, a CNN DLA system was designed and implemented in SystemC at the ESL on a RISC-V-based Virtual Platform (VP). With this system it is possible to analyze the amount of DLA internal buffer/cache required for a given dataset, the internal parallelism of the processing elements in the DLA architecture, and quantization efficiency, the latter by analyzing computational overhead and inference error according to the unit of the model parameters. Because the designed RISC-V-based DLA prototype can run RISC-V software, it can execute actual DNN or object detection applications such as Darknet [5,18]. We analyzed the performance of the DLA according to buffer/cache size for various datasets, parallelism issues of the processing elements, and quantization efficiency, and found that computational overhead and inference error according to model parameters can be analyzed.
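
To make the quantization analysis mentioned above concrete, the sketch below measures how the error of a single dot product grows as parameters are stored in fewer bits. It is an illustration only, not the paper's method: it assumes that the "unit" of a model parameter means its bit width, it uses symmetric uniform quantization as a stand-in scheme, and the function names (`quantize`, `dequantize`, `quantization_report`) and the default bit widths are hypothetical.

```python
# Sketch: error introduced by storing weights at a reduced bit width.
# Assumes symmetric uniform quantization; the paper's scheme may differ.


def quantize(values, bits):
    """Map floats to signed integers in [-qmax, qmax] with a shared scale."""
    qmax = (1 << (bits - 1)) - 1
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integer codes."""
    return [q * scale for q in quantized]


def mean_abs_error(reference, approx):
    return sum(abs(r - a) for r, a in zip(reference, approx)) / len(reference)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def quantization_report(weights, inputs, bit_widths=(16, 8, 4)):
    """For each bit width, report the weight error and the output error."""
    exact = dot(weights, inputs)
    report = {}
    for bits in bit_widths:
        quantized, scale = quantize(weights, bits)
        approx = dequantize(quantized, scale)
        report[bits] = {
            "weight_mae": mean_abs_error(weights, approx),
            "output_error": abs(dot(approx, inputs) - exact),
        }
    return report
```

Calling `quantization_report([0.5, -0.25, 0.125, -1.0], [1.0, 2.0, 3.0, 4.0])` returns one entry per bit width. Narrower widths cut storage and arithmetic cost but give larger weight errors, which is the trade-off that an analysis of inference error against computational overhead has to weigh.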

Related Work
SystemC-Based RISC-V VP
CNN DLA Overview
GFSR Register Set in DLA
Data Loader Module and Buffer
CPIPE and APIPE Module
DNN Applications on the RISC-V DLA System
Extension Issues
Verification and Analysis with Experiments
Darknet Running on DLA with RISC-V VP
Buffer Effects in DLA System
Quantization Effect
Findings
Conclusions