Neural Network Inference Research Articles

The ability of resistive memory (ReRAM) to perform vector–matrix multiplication (VMM) natively, the primary operation in both the training and inference of neural networks, has attracted considerable research interest. The memristor crossbar is an attractive architecture for VMM because it offers several benefits over other memory technologies, including in-memory computing, low power consumption, and high density. ReRAM-based neural networks are typically trained with direct-download or chip-in-the-loop approaches. In these methods, all weights are computed on a host machine and then downloaded into the crossbar. Once the weights have been downloaded, however, the network does not reach the accuracy predicted by the host system. This is because, owing to immature manufacturing technologies, crossbars contain a significant number of faulty memristors and suffer from cell resistance variations. As a result, a cell may be unable to take the exact weight value generated by the host system, which can lead to incorrect inferences. Existing fault-tolerant mapping techniques either require network retraining or employ a graph-matching strategy that incurs hardware, power, and latency overheads. In this paper, we propose a mapping method that tolerates the effect of defective memristors: the mapping is chosen so that network weights mask the faulty memristors and thereby lessen their impact. Further, this work prioritizes the different fault types by their frequency of occurrence. The proposed approach significantly increases mapping efficiency with low power, area, and latency overheads, and experimental analyses show considerable improvement over state-of-the-art works.
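
The crossbar argument above can be made concrete with a toy model. The Python sketch below computes column currents as I_j = Σ_i V_i G_ij (Ohm's law per cell, Kirchhoff's current law per column), injects stuck-at faults, and then remaps weight rows so that faulty cells sit under weights they can still represent. It assumes non-negative weights normalized to [0, 1] with conductance proportional to weight, and follows the common convention that a stuck-at-0 (SA0) cell is stuck in the high-resistance state (reads as 0) while a stuck-at-1 (SA1) cell is stuck in the low-resistance state (reads as 1); wire resistance and cell variation are ignored. The exhaustive row-permutation search is a generic illustration of fault-aware mapping, not the method proposed in the paper, and every function name is hypothetical.

```python
"""Toy ReRAM crossbar computing y_j = sum_i x_i * W_ij with stuck-at faults.

Assumptions (illustrative only): weights are non-negative and normalized to
[0, 1]; conductance is proportional to weight; an SA0 cell reads as 0 and an
SA1 cell reads as 1 regardless of the programmed weight.
"""
from itertools import permutations

SA0, SA1 = 0.0, 1.0  # value a stuck cell reads as, in normalized weight units


def crossbar_vmm(x, w, faults):
    """Column currents I_j = sum_i V_i * G_ij; faults maps (row, col) -> stuck value."""
    rows, cols = len(w), len(w[0])
    out = [0.0] * cols
    for i in range(rows):
        for j in range(cols):
            g = faults.get((i, j), w[i][j])  # a faulty cell ignores its programmed weight
            out[j] += x[i] * g
    return out


def map_rows(w, faults):
    """Choose which physical row stores each logical weight row (perm[r] = physical row).

    Cost = sum of |programmed weight - stuck value| over all faulty cells.
    Exhaustive search over n! permutations is only feasible for tiny arrays.
    """
    def cost(perm):
        return sum(
            abs(w[r][c] - stuck)
            for r, p in enumerate(perm)
            for (pr, c), stuck in faults.items()
            if pr == p
        )

    return min(permutations(range(len(w))), key=cost)


def remap(x, w, perm):
    """Place logical row r (its weights and its input voltage) on physical row perm[r]."""
    n = len(w)
    w_phys, x_phys = [None] * n, [None] * n
    for r, p in enumerate(perm):
        w_phys[p] = w[r]
        x_phys[p] = x[r]
    return x_phys, w_phys


if __name__ == "__main__":
    w = [[0.0, 0.0, 0.0],
         [0.9, 0.5, 0.2],
         [0.4, 0.8, 0.1],
         [0.3, 0.6, 0.7]]
    x = [1.0, 0.5, 0.25, 1.0]
    faults = {(1, 0): SA0, (1, 1): SA0}  # two SA0 cells in physical row 1

    print("ideal ", crossbar_vmm(x, w, {}))
    print("naive ", crossbar_vmm(x, w, faults))
    x_p, w_p = remap(x, w, map_rows(w, faults))
    print("mapped", crossbar_vmm(x_p, w_p, faults))
```

Because each input voltage moves together with its weight row, a row permutation leaves the fault-free result unchanged; in the example, the all-zero weight row is placed on the physical row holding the two SA0 cells, so the faults no longer alter the output. The cost decomposes per (logical row, physical row) pair, so larger arrays could treat it as a linear assignment problem (e.g. the Hungarian algorithm) instead of enumerating permutations.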

Applications such as autonomous driving or assistive robotics rely heavily on Deep Neural Networks. In particular, Convolutional Neural Networks (CNNs) provide precise and reliable results in image processing tasks such as camera-based object detection or semantic segmentation. To achieve even better results, however, CNNs are becoming increasingly complex. Deploying these networks in distributed embedded systems therefore poses new challenges, owing to additional performance and energy constraints on the near-sensor compute platforms, i.e., the sensor nodes. Processing all data on the central node, however, is disadvantageous: raw camera data consumes large bandwidth, and running CNN inference for multiple tasks demands substantial compute performance. Moreover, sending raw data over the interconnect is inadvisable for privacy reasons. Hence, offloading CNN workload to the sensor nodes can reduce traffic on the link and increase data security.

However, because hardware resources on the sensor nodes are limited, CNNs must be partitioned carefully to meet overall latency requirements and energy constraints. We therefore present CNNParted, an open-source framework for efficient, hardware-aware CNN inference partitioning targeting embedded AI applications. It automatically searches for potential partitioning points in the CNN to find a beneficial workload distribution between sensor nodes and a central edge node. In doing so, CNNParted analyzes not only the CNN architecture but also the hardware components, such as dedicated hardware accelerators and memories, to evaluate each inference partitioning in terms of latency and energy consumption.

As an example, we apply CNNParted to three feed-forward CNNs commonly used in embedded systems. The framework first searches for several potential partitioning points and then evaluates them with respect to inference latency and energy consumption. Based on the results, beneficial partitioning points can be identified depending on the system constraints. Using the framework, we find and evaluate 10 potential partitioning points for FCN ResNet-50, 13 for GoogLeNet, and 8 for SqueezeNet V1.1 within 520 s, 330 s, and 140 s, respectively, on an AMD EPYC 7702P running 8 concurrent threads. For GoogLeNet, we identify two partitioning points that provide a good trade-off between required bandwidth, latency, and energy consumption. We also discuss further findings derived from the evaluation results.
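
The search-then-evaluate loop described for CNNParted can be illustrated with a deliberately simplified cost model. The Python sketch below scores every cut in a linear chain of layers: layers before the cut run on the sensor node, the activation at the cut crosses the link, and the remaining layers run on the edge node; it then picks the lowest-energy cut that meets a latency bound and a link budget. It assumes per-layer latency and energy have already been profiled for each node, sequential execution with no pipelining, and a link whose cost grows linearly with the transmitted bytes. All layer names, numbers, and functions are hypothetical; CNNParted derives its costs from models of the actual accelerators and memories, which this sketch does not reproduce.

```python
"""Toy evaluation of CNN partitioning points between a sensor node and an edge node.

Hypothetical cost model: linear chain of layers, profiled per-layer costs on
each node, sequential execution, link cost linear in transmitted bytes.
"""
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    sensor_ms: float  # latency when executed on the sensor node
    edge_ms: float    # latency when executed on the edge node
    sensor_mj: float  # energy (mJ) when executed on the sensor node
    edge_mj: float    # energy (mJ) when executed on the edge node
    out_bytes: int    # size of the layer's output activation


@dataclass
class Partition:
    cut: int          # number of leading layers executed on the sensor node
    latency_ms: float
    energy_mj: float
    link_bytes: int   # bytes sent from the sensor node to the edge node


def evaluate(layers, input_bytes, link_bytes_per_ms, link_mj_per_byte):
    """Score every cut: 0 sends the raw input, len(layers) runs everything on the sensor."""
    results = []
    for cut in range(len(layers) + 1):
        head, tail = layers[:cut], layers[cut:]
        sent = head[-1].out_bytes if head else input_bytes
        latency = (sum(layer.sensor_ms for layer in head)
                   + sent / link_bytes_per_ms
                   + sum(layer.edge_ms for layer in tail))
        energy = (sum(layer.sensor_mj for layer in head)
                  + sent * link_mj_per_byte
                  + sum(layer.edge_mj for layer in tail))
        results.append(Partition(cut, latency, energy, sent))
    return results


def best(partitions, max_latency_ms, max_link_bytes):
    """Lowest-energy partition meeting both constraints, or None if none does."""
    feasible = [p for p in partitions
                if p.latency_ms <= max_latency_ms and p.link_bytes <= max_link_bytes]
    return min(feasible, key=lambda p: p.energy_mj, default=None)


# Hypothetical per-layer profile (illustrative numbers, not measured data).
EXAMPLE_LAYERS = [
    Layer("conv1", 4.0, 1.0, 2.0, 3.0, 200_000),
    Layer("conv2", 6.0, 1.5, 3.0, 4.5, 50_000),
    Layer("conv3", 8.0, 2.0, 8.0, 6.0, 100_000),
    Layer("fc", 6.0, 0.2, 5.0, 0.6, 4_000),
]

if __name__ == "__main__":
    parts = evaluate(EXAMPLE_LAYERS, input_bytes=600_000,
                     link_bytes_per_ms=10_000, link_mj_per_byte=1e-4)
    for p in parts:
        print(p)
    print("best:", best(parts, max_latency_ms=25.0, max_link_bytes=100_000))
```

With this made-up profile, the cut after conv2 is the lowest-energy option within a 25 ms latency bound and a 100 kB link budget, because it sends a small activation while leaving the energy-hungry conv3 and fc layers to the edge node. A network with parallel branches would need the transmitted size summed over every tensor that crosses the cut.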

Articles published on Neural Network Inference

Homomorphic inference of deep neural networks for zero-knowledge verification of nuclear warheads

MJOA-MU: End-to-edge collaborative computation for DNN inference based on model uploading

Optical neural network via loose neuron array and functional learning

PIMCA: A Programmable In-Memory Computing Accelerator for Energy-Efficient DNN Inference

Throughput Maximization of Delay-Aware DNN Inference in Edge Computing by Exploring DNN Model Partitioning and Inference Parallelism

A Mapping Method Tolerating SAF and Variation for Memristor Crossbar Array Based Neural Network Inference on Edge Devices

Deep Learning Accelerated Design of Mechanically Efficient Architected Materials

Gradient distribution-aware INT8 training for neural networks

Probability propagation for faster and efficient point cloud segmentation using a neural network

B-LNN: Inference-time linear model for secure neural network inference

Efficient grouping approach for fault tolerant weight mapping in memristive crossbar array

LHDNN: Maintaining High Precision and Low Latency Inference of Deep Neural Networks on Encrypted Data

Ultra-low latency recurrent neural network inference on FPGAs for physics applications with hls4ml

CNNParted: An open source framework for efficient Convolutional Neural Network inference partitioning in embedded systems

Intrinsic resistive switching in ultrathin SiOx memristors for neuromorphic inference accelerators

Automated Exploration and Implementation of Distributed CNN Inference at the Edge

A 95.6-TOPS/W Deep Learning Inference Accelerator With Per-Vector Scaled 4-bit Quantization in 5 nm

Study of Intelligent Fire Identification System Based on Back Propagation Neural Network

Model-driven Cluster Resource Management for AI Workloads in Edge Clouds

Prediction Analysis of Surface Roughness of Aluminum Al6061 in End Milling CNC Machine Using Soft Computing Techniques
