Abstract

This article overviews the emerging use of deep neural networks in data analytics and explores which types of underlying hardware and architectural approaches are best suited to each deployment location when implementing deep neural networks. The locations discussed are the cloud, the fog, and the dew (dew computing is performed by end devices). The architectural approaches covered include multicore processors (central processing units), manycore processors (graphics processing units), field-programmable gate arrays, and application-specific integrated circuits. The classification proposed in this article divides the existing solutions into 12 categories, organized along two dimensions. It allows a comparison of existing architectures, which are predominantly cloud-based, with anticipated future architectures, which are expected to be hybrid cloud-fog-dew architectures for applications in the Internet of Things and Wireless Sensor Networks. Researchers interested in studying trade-offs among data processing bandwidth, data processing latency, and processing power consumption would benefit from this classification.
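To make the two classification dimensions (deployment location and hardware type) explicit, the following Python sketch enumerates the 12 resulting categories. The labels are ours and serve only to illustrate the Cartesian-product structure of the classification; they are not the article's exact category names.

    from itertools import product

    # The classification has two dimensions: deployment location
    # (cloud, fog, dew) and hardware type (multicore CPU, manycore GPU,
    # FPGA, ASIC). Their Cartesian product yields the 12 categories.
    LOCATIONS = ["cloud", "fog", "dew"]
    HARDWARE = ["multicore CPU", "manycore GPU", "FPGA", "ASIC"]

    categories = [f"{location} / {hardware}"
                  for location, hardware in product(LOCATIONS, HARDWARE)]

    print(len(categories))  # 12
    for category in categories:
        print(category)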

Highlights

  • Over the last several years, with the proliferation of data and devices, machine learning algorithms have become unavoidable in almost every aspect of human life,[1] especially when deep learning (DL) techniques are used

  • Training and inference of deep neural networks (DNNs) can be done in the cloud, with enormous computational power but hundreds of kilometers away from data sources; in the fog, with less computational power but much closer to data sources; or in the dew, at the source of the data, where processing is closely coupled with the end devices that generate it (see the latency sketch after this list)

  • In our search, which we conducted using the Google Scholar database, we focused on keywords related to DNNs and hardware acceleration
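To make the cloud-fog-dew placement trade-off in the highlights concrete, here is a minimal back-of-the-envelope latency sketch in Python. All network and compute figures are invented for illustration only; they are not measurements from the article.

    # Rough end-to-end latency model for a single DNN inference, depending on
    # where the processing is placed. All numbers are hypothetical.
    PLACEMENTS = {
        # name: (one-way network latency in ms, compute time per inference in ms)
        "cloud (data-center GPU/TPU)": (50.0, 2.0),
        "fog (edge server)": (5.0, 10.0),
        "dew (end device)": (0.0, 80.0),
    }

    def end_to_end_latency_ms(network_ms: float, compute_ms: float) -> float:
        """Round trip to the processing site plus on-site compute time."""
        return 2 * network_ms + compute_ms

    for name, (network_ms, compute_ms) in PLACEMENTS.items():
        print(f"{name}: ~{end_to_end_latency_ms(network_ms, compute_ms):.0f} ms")

Under these made-up numbers, the cloud wins on raw compute time but loses that advantage once the network round trip dominates; exposing this kind of trade-off is what the article's two-dimensional classification is intended for.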


Introduction

Over the last several years, with the proliferation of data and devices, machine learning algorithms have become unavoidable in almost every aspect of human life,[1] especially when deep learning (DL) techniques are used. To cope with the resulting computational demands, many new developments are switching from the control-flow paradigm to the dataflow paradigm.[32,33] Cloud services that rely on FPGAs, such as Baidu XPU, still cannot offer the performance of today’s GPUs or TPUs (tensor processing units), but they offer better energy efficiency for training and inference of DNNs. Companies like Baidu, Intel, and Microsoft have cloud services that rely on FPGAs, which, under certain conditions of interest, can achieve better performance per watt. In terms of performance on ResNet-50 (a CNN), two cloud-based services, Google TPU and Amazon EC2, which rely on the TPUv2 ASIC and the Nvidia V100 GPU, respectively, are compared in [43].
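The performance-per-watt argument above can be illustrated with a simple calculation. The throughput and power numbers in the sketch below are hypothetical placeholders chosen only to show the metric; they are not figures from the article or from the ResNet-50 comparison it cites.

    # "Performance per watt" computed as ResNet-50 throughput divided by board
    # power. All figures are invented placeholders, not benchmark results.
    accelerators = {
        # name: (images per second on ResNet-50, power draw in watts)
        "manycore GPU (V100-class)": (1000.0, 300.0),
        "TPU-class ASIC": (1100.0, 280.0),
        "FPGA-based cloud service": (600.0, 100.0),
    }

    for name, (images_per_second, watts) in accelerators.items():
        print(f"{name}: {images_per_second / watts:.1f} images/s per watt")

With numbers of this shape, an FPGA-based service can come out ahead on images per second per watt even while trailing in absolute throughput, which is the trade-off noted above.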