Dot Product Operation Research Articles

At the core of any inference procedure in deep neural networks are dot product operations, which are the component that requires the most computational resources. For instance, deep neural networks such as VGG-16 require up to 15 G operations to perform the dot products in a single forward pass, which results in significant energy consumption and thus limits their use in resource-limited environments, e.g., on embedded devices or smartphones. One common approach to reducing the complexity of inference is to prune and quantize the weight matrices of the neural network. Usually, this results in matrices whose entropy values are low, as measured relative to the empirical probability mass distribution of their elements. In order to efficiently exploit such matrices, one usually relies on, inter alia, sparse matrix representations. However, most of these common matrix storage formats make strong statistical assumptions about the distribution of the elements and therefore cannot efficiently represent the entire set of matrices that exhibit low-entropy statistics (and thus the entire set of compressed neural network weight matrices). In this paper, we address this issue and present new efficient representations for matrices with low-entropy statistics. Like sparse matrix data structures, these formats exploit the statistical properties of the data in order to reduce the size and execution complexity. Moreover, we show that the proposed data structures can not only be regarded as a generalization of sparse formats but are also more energy and time efficient under practically relevant assumptions. Finally, we test the storage requirements and execution performance of the proposed formats on compressed neural networks and compare them to dense and sparse representations. We experimentally show that we are able to attain up to ×42 compression ratios, ×5 speedups, and ×90 energy savings when we losslessly convert state-of-the-art networks, such as AlexNet, VGG-16, ResNet152, and DenseNet, into the new data structures and benchmark their respective dot products.
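To make the idea of exploiting low-entropy statistics more concrete, here is a minimal Python sketch. It is not the data structure proposed in the paper, whose details the abstract does not give. It only illustrates one general way a quantized matrix with few distinct values can reduce dot product cost: group the column indices of each row by shared value, sum the inputs in each group, and multiply once per distinct value. The function names, the example matrix `W`, and the input `x` are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def empirical_entropy(matrix):
    """Shannon entropy (bits per element) of the empirical value distribution."""
    values = [v for row in matrix for v in row]
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def group_rows_by_value(matrix):
    """For each row, map every distinct non-zero value to its column indices."""
    grouped = []
    for row in matrix:
        groups = defaultdict(list)
        for col, v in enumerate(row):
            if v != 0:  # zeros contribute nothing, as in a sparse format
                groups[v].append(col)
        grouped.append(dict(groups))
    return grouped


def grouped_matvec(grouped, x):
    """Matrix-vector product using one multiplication per distinct value per row."""
    return [
        sum(v * sum(x[c] for c in cols) for v, cols in groups.items())
        for groups in grouped
    ]


def dense_matvec(matrix, x):
    """Reference dense product: one multiplication per matrix element."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in matrix]


# Hypothetical quantized weight matrix with only three distinct values (0, 3, 4).
W = [
    [0, 3, 3, 0, 4, 3],
    [4, 4, 0, 0, 3, 0],
    [0, 0, 3, 3, 3, 4],
]
x = [1, 2, 3, 4, 5, 6]

print(empirical_entropy(W))                    # well below log2(18) bits
print(grouped_matvec(group_rows_by_value(W), x))
print(dense_matvec(W, x))                      # same result as the grouped product
```

In this sketch the grouped product performs at most two multiplications per row instead of six, and a matrix that is merely sparse is the special case where only the zero value is shared.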

A number of recent efforts have attempted to design accelerators for popular machine learning algorithms, such as those involving convolutional and deep neural networks (CNNs and DNNs). These algorithms typically involve a large number of multiply-accumulate (dot-product) operations. A recent project, DaDianNao, adopts a near data processing approach, where a specialized neural functional unit performs all the digital arithmetic operations and receives input weights from adjacent eDRAM banks. This work explores an in-situ processing approach, where memristor crossbar arrays not only store input weights, but are also used to perform dot-product operations in an analog manner. While the use of crossbar memory as an analog dot-product engine is well known, no prior work has designed or characterized a full-fledged accelerator based on crossbars. In particular, our work makes the following contributions: (i) We design a pipelined architecture, with some crossbars dedicated for each neural network layer, and eDRAM buffers that aggregate data between pipeline stages. (ii) We define new data encoding techniques that are amenable to analog computations and that can reduce the high overheads of analog-to-digital conversion (ADC). (iii) We define the many supporting digital components required in an analog CNN accelerator and carry out a design space exploration to identify the best balance of memristor storage/compute, ADCs, and eDRAM storage on a chip. On a suite of CNN and DNN workloads, the proposed ISAAC architecture yields improvements of 14.8×, 5.5×, and 7.5× in throughput, energy, and computational density (respectively), relative to the state-of-the-art DaDianNao architecture.
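The analog dot product mentioned above can be sketched in a few lines of Python under idealized assumptions. Each crossbar cell is modeled as a conductance, each input as a row voltage, and each column current is the sum of voltage times conductance (Ohm's law plus Kirchhoff's current law). The sketch ignores wire resistance, noise, and device variation, and it does not reproduce ISAAC's data encoding or pipeline. The function names, the uniform ADC model, and the numeric values are illustrative assumptions.

```python
def crossbar_currents(voltages, conductances):
    """Column currents of an idealized crossbar: I_j = sum_i V_i * G_ij."""
    n_cols = len(conductances[0])
    return [
        sum(v * row[j] for v, row in zip(voltages, conductances))
        for j in range(n_cols)
    ]


def adc(current, full_scale, bits):
    """Quantize a current to an integer code with a uniform `bits`-bit ADC (assumed model)."""
    levels = (1 << bits) - 1
    clipped = min(max(current, 0.0), full_scale)  # saturate outside the input range
    return round(clipped / full_scale * levels)


# Hypothetical 3x2 crossbar: rows are inputs, columns are output neurons.
G = [
    [1, 2],
    [3, 4],
    [5, 6],
]
V = [1, 0, 1]  # input voltages applied to the three rows

currents = crossbar_currents(V, G)
print(currents)                                  # one dot product per column
print([adc(i, full_scale=16.0, bits=4) for i in currents])
```

Every column produces its dot product in the same step, which is why the cost of converting the column currents back to digital values becomes a central design concern.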

Articles published on Dot Product Operation

Efficient Multiple-Precision Floating-Point Fused Multiply-Add with Mixed-Precision Support

Infrared Dim Target Detection Based on Multi-Feature Fusion

Compact and Computationally Efficient Representation of Deep Neural Networks.

A Robust Digital RRAM-Based Convolutional Block for Low-Power Image Processing and Learning Applications

Dynamic Performance Evaluation of a Redundantly Actuated and Over-constrained Parallel Manipulator

Fast multi-trace impedance inversion using anisotropic total p-variation regularization in the frequency domain

A multiply-add engine with monolithically integrated 3D memristor crossbar/CMOS hybrid circuit

Descent methods for elastic body simulation on the GPU

ISAAC

Vector Symbolic Spiking Neural Network Model of Hippocampal Subarea CA1 Novelty Detection Functionality.

DOA and Polarization Estimation Based on Sparse COLD Array

Novel inverse kinematic approaches for robot manipulators with Pieper-Criterion based geometry

Efficient hardware implementation of PMI+ for low-resource devices in mobile cloud computing

Microstructural and physiological features of tissues elucidated by quantitative-diffusion-tensor MRI

GPU-accelerated fem solver for three dimensional electromagnetic analysis

Multi-functional floating-point MAF designs with dot product support

The Pythagorean Theorem: What Is It About?

Fast Ray-Axis Aligned Bounding Box Overlap Tests with Plucker Coordinates

Validated roundings of dot products by sticky accumulation
