Convolutional Neural Networks Inference Research Articles

Convolutional Neural Networks (CNNs) have delivered significant performance improvements in machine learning applications across domains such as surveillance, intelligent transportation, smart grids, and healthcare systems. As more physical objects are equipped with sensors and connected to the Internet to form Internet of Things (IoT) networks, it is increasingly important to run CNN inference, a computationally intensive workload, on resource-constrained IoT devices. Object detection is a fundamental computer vision problem that provides the image understanding needed by many artificial intelligence (AI) applications in smart cities. Among object detection algorithms, CNN-based methods have emerged as a new paradigm for improving overall performance. The multiply-accumulate (MAC) operations performed repeatedly in CNN convolution layers are extremely computationally expensive. As a result, the computational workload and energy consumption of CNN applications continue to rise. To address these challenges, approximate computing has played a vital role in reducing the power and area of computation-intensive CNN applications. In this paper, we design an approximate MAC architecture, termed the Shift and Accumulator Unit (SAC), for error-resilient CNN-based object detection on embedded platforms. The proposed unit deliberately trades accuracy for lower design complexity and power consumption, making it suitable for resource-constrained IoT devices. The pipelined SAC architecture requires approximately 1.8× fewer clock cycles than the non-pipelined SAC architecture. Performance evaluation shows that the proposed unit achieves better energy efficiency and resource utilization than an exact multiplier and state-of-the-art approximate multipliers, without noticeable deterioration in overall performance.
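
The abstract does not describe the SAC microarchitecture, so the following is only a minimal sketch of the general idea behind shift-based approximate MAC units, not the authors' design. It assumes integer activations and weights, and it assumes each nonzero weight is rounded to the nearest power of two (a common approximate-computing technique, named here as a stand-in). Under that assumption, every product a·w becomes a left shift of a by k, where 2^k is closest to |w|, followed by a signed add. All function names (`nearest_power_of_two_exponent`, `exact_mac`, `shift_accumulate`) are hypothetical.

```python
from typing import Sequence


def nearest_power_of_two_exponent(w: int) -> int:
    """Return k such that 2**k is the power of two closest to |w| (ties round down)."""
    mag = abs(w)
    if mag == 0:
        raise ValueError("zero weights have no power-of-two approximation")
    k = mag.bit_length() - 1             # 2**k <= mag < 2**(k + 1)
    if mag - (1 << k) > (1 << (k + 1)) - mag:
        k += 1                           # mag is closer to 2**(k + 1)
    return k


def exact_mac(activations: Sequence[int], weights: Sequence[int]) -> int:
    """Reference multiply-accumulate: sum of a * w."""
    acc = 0
    for a, w in zip(activations, weights):
        acc += a * w
    return acc


def shift_accumulate(activations: Sequence[int], weights: Sequence[int]) -> int:
    """Approximate MAC (assumed scheme): a * w is replaced by a << k, with 2**k nearest |w|."""
    acc = 0
    for a, w in zip(activations, weights):
        if w == 0:
            continue                     # a zero weight contributes nothing
        term = a << nearest_power_of_two_exponent(w)  # a * 2**k using only a shift
        acc += term if w > 0 else -term  # reapply the weight's sign
    return acc


if __name__ == "__main__":
    acts, wts = [3, 5, 2], [4, -3, 7]
    print(exact_mac(acts, wts))          # 11
    print(shift_accumulate(acts, wts))   # 18 (-3 is treated as -2, 7 as 8)
```

With these example inputs the exact MAC gives 11 and the shift-based version gives 18; weights that are already powers of two are reproduced exactly. Removing the multiplier is what saves area and power, and the resulting error is what an error-resilient CNN is assumed to tolerate.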

In recent years, the space research community has shown growing interest in Artificial Intelligence (AI), driven mostly by system miniaturization and commercial competition. In particular, applying Deep Learning (DL) techniques on board Earth Observation (EO) satellites could mitigate downlink bandwidth constraints, reduce costs, and increase satellite autonomy. In this context, the CloudScout project, funded by the European Space Agency (ESA), represents the first in-orbit demonstration of a Convolutional Neural Network (CNN) applied to hyperspectral images for cloud detection. This first use case was implemented with an Intel Myriad 2 VPU on board a CubeSat optimized for low cost, small size, and power efficiency. However, because the Myriad 2 was not designed specifically for the space environment, this solution has several drawbacks that limit its applicability to short-lifetime Low Earth Orbit (LEO) missions. This work benchmarks the Myriad 2 against our custom hardware accelerator designed for Field Programmable Gate Arrays (FPGAs). The comparison metrics include inference time, power consumption, space qualification, and components. The results show that the FPGA-based solution offers reduced inference time and greater customizability, at the cost of higher power consumption and a longer time to market. In conclusion, deploying the proposed approach on space-qualified FPGAs could extend the potential market for DL-based solutions to long-term LEO or interplanetary exploration missions, at a limited cost in energy efficiency.
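
Whether a faster but more power-hungry accelerator costs energy depends on the product of power and latency, since energy per inference is E = P × t. The sketch below makes that reasoning step explicit. The function names are hypothetical, and the power and latency figures in the example are invented for illustration only; they are not measurements from the CloudScout benchmark.

```python
def energy_per_inference_mj(power_w: float, latency_ms: float) -> float:
    """Energy per inference in millijoules (1 W x 1 ms = 1 mJ)."""
    return power_w * latency_ms


def relative_energy(power_a_w: float, latency_a_ms: float,
                    power_b_w: float, latency_b_ms: float) -> float:
    """Energy of platform A divided by energy of platform B (> 1 means A uses more)."""
    return (energy_per_inference_mj(power_a_w, latency_a_ms)
            / energy_per_inference_mj(power_b_w, latency_b_ms))


if __name__ == "__main__":
    # Hypothetical figures, NOT from the paper: platform A draws 2x the power
    # of platform B but finishes each inference in 0.625x the time.
    a_w, a_ms = 4.0, 100.0
    b_w, b_ms = 2.0, 160.0
    print(energy_per_inference_mj(a_w, a_ms))    # 400.0
    print(energy_per_inference_mj(b_w, b_ms))    # 320.0
    print(relative_energy(a_w, a_ms, b_w, b_ms)) # 1.25
```

In this made-up case, doubling power while cutting latency to 0.625× raises energy per inference by only 25%, which is the kind of trade-off a "limited cost in energy efficiency" describes.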

Articles published on Convolutional Neural Networks Inference

One Proxy Device Is Enough for Hardware-Aware Neural Architecture Search

Sparse convolutional neural network acceleration with lossless input feature map compression for resource‐constrained systems

Towards Edge Computing Using Early-Exit Convolutional Neural Networks

A Multi-Cache System for On-Chip Memory Optimization in FPGA-Based CNN Accelerators

Real-time on-site inspection system for power transmission based on heterogeneous computing

Simulating quantized inference on convolutional neural networks

Investigating data representation for efficient and reliable Convolutional Neural Networks

Accelerating Inference of Convolutional Neural Networks Using In-memory Computing

A low‐cost compensated approximate multiplier for Bfloat16 data processing on convolutional neural network inference

An efficient loop tiling framework for convolutional neural network inference accelerators

Area and energy efficient shift and accumulator unit for object detection in IoT applications

Methods for Preventing Visual Attacks in Convolutional Neural Networks Based on Data Discard and Dimensionality Reduction

SWM: A High-Performance Sparse-Winograd Matrix Multiplication CNN Accelerator

Toward Multi-FPGA Acceleration of the Neural Networks

S-CNN-ESystem: An end-to-end embedded CNN inference system with low hardware cost and hardware-software time-balancing

An FPGA-Based Hardware Accelerator for CNNs Inference on Board Satellites: Benchmarking with Myriad 2-Based Solution for the CloudScout Case Study

A Heterogeneous Hardware Accelerator for Image Classification in Embedded Systems

DeeperThings: Fully Distributed CNN Inference on Resource-Constrained Edge Devices

Mixed-Clipping Quantization for Convolutional Neural Networks

DeepSlicing: Collaborative and Adaptive CNN Inference With Low Latency
