Deep learning accelerators: a case study with MAESTRO

Hamidreza Bolhasani, Somayyeh Jafarali Jassbi

doi:10.1186/s40537-020-00377-8

Open Access

Abstract

In recent years, deep learning has become one of the most important topics in computer science. It is a growing trend at the cutting edge of technology, and its applications are now seen in many aspects of life, such as object detection, speech recognition and natural language processing. Almost all major sciences and technologies now benefit from the advantages of deep learning, such as high accuracy, speed and flexibility. Therefore, any effort to improve the performance of related techniques is valuable. Deep learning accelerators are hardware architectures designed and optimized to increase the speed, efficiency and accuracy of computers running deep learning algorithms. In this paper, after reviewing some background on deep learning, a well-known accelerator architecture named MAERI (Multiply-Accumulate Engine with Reconfigurable Interconnects) is investigated. The performance of a deep learning task is measured and compared under two different dataflow strategies, NLR (No Local Reuse) and NVDLA (NVIDIA Deep Learning Accelerator), using an open-source tool called MAESTRO (Modeling Accelerator Efficiency via Spatio-Temporal Resource Occupancy). The measured performance indicators show that the novel optimized architecture, NVDLA, has higher L1 and L2 computation reuse and lower total runtime (in cycles) than NLR.

Highlights

The main idea of neural networks (NN) is based on the structure of biological neural systems, which consist of several connected elements named neurons [1].
Hyoukjun et al. [15] proposed a novel architecture named MAERI (Multiply-Accumulate Engine with Reconfigurable Interconnects), which is reconfigurable and employs an ART (Augmented Reduction Tree). It showed 8% to 459% better utilization for different dataflows compared with a strict network-on-chip (NoC) fabric.
LayerFile: contains the information related to the layers of the neural network.

Summary

Introduction

The main idea of neural networks (NN) is based on the structure of biological neural systems, which consist of several connected elements named neurons [1]. Neural networks are made up of artificial neurons that handle brain-like tasks such as learning, recognition and optimization. In this structure, the nodes are neurons, the links can be considered synapses, and the biases act as activation thresholds [2]. Each layer extracts some information related to the features and forwards it, with a weight, to the next layer. The output is the sum of all these pieces of information multiplied by their related weights.
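
The weighted sum described above can be illustrated with a minimal Python sketch of a single artificial neuron. The function names `neuron_output` and `fires`, and the convention that the neuron activates when the weighted sum plus bias is non-negative, are assumptions made for illustration; they are not taken from the paper or from MAESTRO.

```python
def neuron_output(inputs, weights, bias=0.0):
    """Return the weighted sum of the inputs plus a bias term.

    Hypothetical helper: each input is multiplied by its weight
    (the "synapse" strength) and the products are summed.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(x * w for x, w in zip(inputs, weights)) + bias


def fires(inputs, weights, bias=0.0):
    """Assumed step activation: the neuron fires when the weighted
    sum plus bias is non-negative, so the bias acts as a threshold."""
    return neuron_output(inputs, weights, bias) >= 0.0


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    w = [0.5, -1.0, 2.0]
    print(neuron_output(x, w))      # weighted sum without bias
    print(fires(x, w, bias=-5.0))   # a large negative bias raises the threshold
```

In a multi-layer network, the same computation is repeated for every neuron in a layer, and the results become the inputs of the next layer. These multiply-accumulate operations are the workload that accelerators such as MAERI are built to speed up.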

Methods

Results

Conclusion