DL-RSIM: A Reliability and Deployment Strategy Simulation Framework for ReRAM-based CNN Accelerators
Memristor-based deep learning accelerators provide a promising solution to improve the energy efficiency of neuromorphic computing systems. However, the electrical properties and crossbar structure of memristors make these accelerators error-prone. In addition, due to hardware constraints, the way neural network models are deployed on memristor crossbar arrays affects computation parallelism and communication overheads. To enable reliable and energy-efficient memristor-based accelerators, a simulation platform is needed to precisely analyze the impact of non-ideal circuit/device properties on inference accuracy and the influence of different deployment strategies on performance and energy consumption. In this paper, we propose a flexible simulation framework, DL-RSIM, to tackle this challenge. DL-RSIM explores a rich set of reliability impact factors and deployment strategies and can be incorporated with any deep learning neural network implemented in TensorFlow. Using several representative convolutional neural networks as case studies, we show that DL-RSIM can guide chip designers in choosing reliability-friendly design options and energy-efficient deployment strategies and in developing optimization techniques accordingly.
- Conference Article
81
- 10.1145/3240765.3240800
- Nov 5, 2018
Memristor-based deep learning accelerators provide a promising solution to improve the energy efficiency of neuromorphic computing systems. However, the electrical properties and crossbar structure of memristors make these accelerators error-prone. To enable reliable memristor-based accelerators, a simulation platform is needed to precisely analyze the impact of non-ideal circuit and device properties on inference accuracy. In this paper, we propose a flexible simulation framework, DL-RSIM, to tackle this challenge. DL-RSIM simulates the error rates of every sum-of-products computation in the memristor-based accelerator and injects the errors into the targeted TensorFlow-based neural network model. It explores a rich set of reliability impact factors and can be incorporated with any deep learning neural network implemented in TensorFlow. Using three representative convolutional neural networks as case studies, we show that DL-RSIM can guide chip designers in choosing a reliability-friendly design option and in developing reliability optimization techniques.
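As a rough illustration of the abstract-level error injection DL-RSIM describes, the sketch below perturbs the sum-of-products outputs of a TensorFlow layer with a per-output error probability. The error rate, the Gaussian error magnitude, and the ErrorInjectedDense layer are hypothetical placeholders for illustration, not DL-RSIM's actual error model or API.

```python
import tensorflow as tf

class ErrorInjectedDense(tf.keras.layers.Layer):
    """Dense layer whose outputs are randomly perturbed to mimic crossbar computation errors."""
    def __init__(self, units, error_rate=0.02, error_std=0.1, **kwargs):
        super().__init__(**kwargs)
        self.dense = tf.keras.layers.Dense(units)
        self.error_rate = error_rate   # hypothetical fraction of outputs hit by an error
        self.error_std = error_std     # hypothetical magnitude of the injected error

    def call(self, inputs):
        y = self.dense(inputs)         # ideal sum-of-products result
        hit = tf.cast(tf.random.uniform(tf.shape(y)) < self.error_rate, y.dtype)
        noise = tf.random.normal(tf.shape(y), stddev=self.error_std)
        return y + hit * noise         # inject errors into the selected outputs

# Usage: substitute the layer into an existing model and re-measure inference accuracy.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    ErrorInjectedDense(128),
    tf.keras.layers.ReLU(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
```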
- Research Article
5
- 10.21271/zjpas.34.2.3
- Apr 12, 2022
- ZANCO JOURNAL OF PURE AND APPLIED SCIENCES
Comprehensive Study for Breast Cancer Using Deep Learning and Traditional Machine Learning
- Research Article
1
- 10.1049/el.2019.2376
- Sep 1, 2019
- Electronics Letters
GenSyth: a new way to understand deep learning
- Research Article
169
- 10.1142/s0219530518500124
- Nov 1, 2018
- Analysis and Applications
Deep learning based on structured deep neural networks has provided powerful applications in various fields. The structures imposed on the deep neural networks are crucial, which makes deep learning essentially different from classical schemes based on fully connected neural networks. One of the commonly used deep neural network structures is generated by convolutions. The resulting deep learning algorithms form the family of deep convolutional neural networks. Despite their power in some practical domains, little is known about the mathematical foundation of deep convolutional neural networks, such as the universality of approximation. In this paper, we propose a family of new structured deep neural networks: deep distributed convolutional neural networks. We show that these deep neural networks have the same order of computational complexity as deep convolutional neural networks, and we prove their universality of approximation. Some ideas in our analysis come from ridge approximation, wavelets, and learning theory.
- Research Article
1
- 10.1002/aelm.202400106
- May 28, 2024
- Advanced Electronic Materials
Memristors, with their analog computing capability, are widely investigated for improving the computing efficiency of deep neural network (DNN) deployment. However, how to fully exploit the analog computing capability of a memristive computing system (MCS) for DNN deployment is still an open question. Here, a new neural network deployment scheme, an information dimension matching (IDM) scheme, is proposed to fully exploit the analog computing capability of the MCS. Spatial and temporal DNNs, namely a convolutional neural network (CNN) and a recurrent neural network (RNN), are used to verify the proposed deployment scheme. The experimental results indicate that, compared to traditional deployment schemes, the proposed scheme yields clear improvements in inference accuracy and energy efficiency (>4× for four-layer DNN deployment), and the energy efficiency gain grows rapidly as the number of DNN layers increases. This work paves the way for developing high-efficiency analog MCSs.
- Research Article
20
- 10.2144/fsoa-2022-0010
- Mar 8, 2022
- Future science OA
Artificial intelligence in interdisciplinary life science and drug discovery research.
- Conference Article
8
- 10.5555/3199700.3199802
- Nov 13, 2017
Neural Networks (NNs) have recently gained popularity in a wide range of modern application domains due to their superior inference accuracy. With growing problem size and complexity, modern NNs, e.g., CNNs (Convolutional NNs) and DNNs (Deep NNs), contain a large number of weights, which require tremendous effort not only to prepare representative training datasets but also to train the network. There is an increasing demand to protect NN weight matrices, an emerging form of Intellectual Property (IP) in the NN field. Unfortunately, adopting conventional encryption methods incurs significant performance and energy-consumption overheads. In this paper, we propose AEP, a DianNao-based NN accelerator design for IP protection. AEP aggressively reduces DRAM timing to generate a device-dependent error mask, i.e., a set of erroneous cells whose distribution is device-dependent due to process variations. AEP incorporates the error mask in the NN training process so that the trained weights are device-dependent, which effectively defeats IP piracy, as exporting the weights to other devices cannot produce satisfactory inference accuracy. In addition, AEP speeds up NN inference and achieves significant energy reduction because main memory dominates the energy consumption of the DianNao accelerator. Our evaluation results show that by injecting 0.1% to 5% memory errors, AEP has negligible inference accuracy loss on the target device while exhibiting unacceptable accuracy degradation on other devices. In addition, AEP achieves an average of 72% performance improvement and 44% energy reduction over the DianNao baseline.
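The device-dependent error-mask idea can be sketched as follows. The mask generation (a seeded pseudo-random map standing in for cells that fail under aggressive DRAM timing) and the stuck-at-zero error model are assumptions made for illustration, not AEP's actual fault characterization.

```python
import numpy as np

def device_error_mask(shape, device_seed, error_rate=0.01):
    """Hypothetical stand-in for the process-variation-dependent map of weak cells."""
    rng = np.random.default_rng(device_seed)
    return rng.random(shape) < error_rate          # True marks an erroneous cell

def apply_mask(weights, mask):
    """One possible error model: treat erroneous cells as stuck-at-zero."""
    w = weights.copy()
    w[mask] = 0.0
    return w

# During training on the target device, apply its mask after every weight update so
# the network learns around those cells; a different device has a different mask,
# so exported weights no longer match the errors they were trained against.
mask_dev_a = device_error_mask((256, 128), device_seed=42, error_rate=0.02)
weights = np.random.randn(256, 128).astype(np.float32)
effective_weights = apply_mask(weights, mask_dev_a)
```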
- Conference Article
4
- 10.1109/iccad.2017.8203854
- Nov 1, 2017
Neural Networks (NNs) have recently gained popularity in a wide range of modern application domains due to their superior inference accuracy. With growing problem size and complexity, modern NNs, e.g., CNNs (Convolutional NNs) and DNNs (Deep NNs), contain a large number of weights, which require tremendous effort not only to prepare representative training datasets but also to train the network. There is an increasing demand to protect NN weight matrices, an emerging form of Intellectual Property (IP) in the NN field. Unfortunately, adopting conventional encryption methods incurs significant performance and energy-consumption overheads. In this paper, we propose AEP, a DianNao-based NN accelerator design for IP protection. AEP aggressively reduces DRAM timing to generate a device-dependent error mask, i.e., a set of erroneous cells whose distribution is device-dependent due to process variations. AEP incorporates the error mask in the NN training process so that the trained weights are device-dependent, which effectively defeats IP piracy, as exporting the weights to other devices cannot produce satisfactory inference accuracy. In addition, AEP speeds up NN inference and achieves significant energy reduction because main memory dominates the energy consumption of the DianNao accelerator. Our evaluation results show that by injecting 0.1% to 5% memory errors, AEP has negligible inference accuracy loss on the target device while exhibiting unacceptable accuracy degradation on other devices. In addition, AEP achieves an average of 72% performance improvement and 44% energy reduction over the DianNao baseline.
- Conference Article
1
- 10.1145/3422575.3422774
- Sep 28, 2020
Design of edge devices is driven by the need for the lowest possible cost and energy consumption. Both of these are strongly affected by on-chip memories, as they often constitute a large fraction of embedded processors. One way to reduce energy consumption is to reduce the supply voltage. However, this causes memory-cell hard-fault rates to rise exponentially, degrading yield at low voltage and increasing cost. The weaker memory cells also worsen chip yield and mean time to failure. Deep learning neural network applications constitute a significant fraction of the workloads run today on these low-cost embedded devices. Despite the inherent resilience of most of these deep learning applications, inference accuracy degrades significantly at high fault rates. We propose SAME-Infer, a software-assisted memory resilience technique for efficient inference at the edge. It is a fault-aware linking methodology for software-managed embedded memories that maps critical code/layers onto the non-faulty segments of the memory and the non-critical, fault-tolerant sections onto the faulty or error-prone memory segments. This allows memory hard faults to be tolerated and the voltage to be lowered without degrading accuracy (the SAME inference accuracy at lower voltage/higher error rate). Our evaluation on 10 real microcontroller-class chips shows that more than a 175 mV reduction in voltage can be achieved without any loss in accuracy for a variety of neural networks. SAME-Infer can also be considered an efficient fault-tolerance/in-field repair technique, as it tolerates on average a 25x (up to 350x) increase in bit error rate with minimal impact on inference accuracy.
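A minimal sketch of the fault-aware placement idea: greedily map the most accuracy-critical sections onto the least faulty memory segments. The criticality scores, segment fault rates, and names below are hypothetical inputs; the paper derives the real ones from profiling and per-chip fault maps.

```python
def fault_aware_mapping(sections, segments):
    """sections: (name, criticality) pairs; segments: (name, fault_rate) pairs."""
    by_criticality = sorted(sections, key=lambda s: s[1], reverse=True)  # most critical first
    by_fault_rate = sorted(segments, key=lambda s: s[1])                 # least faulty first
    return {sec[0]: seg[0] for sec, seg in zip(by_criticality, by_fault_rate)}

mapping = fault_aware_mapping(
    sections=[("conv1_weights", 0.9), ("fc_weights", 0.7), ("scratch_buffer", 0.1)],
    segments=[("sram_bank0", 1e-6), ("sram_bank1", 1e-4), ("sram_bank2", 1e-3)],
)
print(mapping)  # the most critical data lands in the cleanest banks
```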
- Conference Article
14
- 10.1145/3404397.3404408
- Aug 17, 2020
Deep neural networks (DNNs) have gained considerable attention in various real-world applications due to their strong performance in representation learning. However, a DNN needs to be trained for many epochs to pursue higher inference accuracy, which requires storing sequential versions of the DNN and releasing the updated versions to users. As a result, large amounts of storage and network resources are required, significantly hampering DNN utilization on resource-constrained platforms (e.g., IoT, mobile phones). In this paper, we present a novel delta compression framework called Delta-DNN, which can efficiently compress the floating-point numbers in DNNs by exploiting the similarity of floats that exists in DNNs during training. Specifically, (1) we observe the high similarity of floating-point numbers between neighboring versions of a neural network during training; (2) inspired by delta compression techniques, we record only the delta (i.e., the differences) between two neighboring versions, instead of storing the full new version of the DNN; (3) we use error-bounded lossy compression to compress the delta data for a high compression ratio, where the error bound is strictly assessed against an acceptable loss of the DNN's inference accuracy; (4) we evaluate Delta-DNN's performance in two scenarios: reducing the transmission cost of releasing DNNs over the network and saving the storage space occupied by multiple versions of DNNs. According to experimental results on six popular DNNs, Delta-DNN achieves a compression ratio 2x-10x higher than state-of-the-art methods, without sacrificing inference accuracy or changing the neural network structure.
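The delta-plus-error-bounded-compression idea can be sketched as follows, with plain uniform quantization standing in for the error-bounded lossy compressor used in the paper; the error-bound value and tensor sizes are placeholders.

```python
import numpy as np

def compress_delta(w_old, w_new, error_bound=1e-3):
    """Quantize the version-to-version delta so reconstruction error stays within +/- error_bound."""
    delta = w_new - w_old
    return np.round(delta / (2.0 * error_bound)).astype(np.int32)  # small integers compress well

def decompress_delta(w_old, q, error_bound=1e-3):
    return w_old + q * (2.0 * error_bound)

w_v1 = np.random.randn(1000)                 # weights at version k
w_v2 = w_v1 + 0.01 * np.random.randn(1000)   # weights at version k+1 (similar floats)
q = compress_delta(w_v1, w_v2)
w_v2_restored = decompress_delta(w_v1, q)
assert np.max(np.abs(w_v2_restored - w_v2)) <= 1e-3 + 1e-9
```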
- Research Article
4
- 10.3150/22-bej1553
- Aug 1, 2023
- Bernoulli
In modern deep learning, there is a recent and growing literature on the interplay between large-width asymptotics for deep Gaussian neural networks (NNs), i.e. deep NNs with Gaussian-distributed weights, and classes of Gaussian stochastic processes (SPs). Such an interplay has proved to be critical in several contexts of practical interest, e.g. Bayesian inference under Gaussian SP priors, kernel regression for infinite-width deep NNs trained via gradient descent, and information propagation within infinite-width NNs. Motivated by empirical analyses showing the potential of replacing Gaussian distributions with Stable distributions for the NN's weights, in this paper we investigate large-width asymptotics for (fully connected) feed-forward deep Stable NNs, i.e. deep NNs with Stable-distributed weights. First, we show that as the width goes to infinity jointly over the NN's layers, a suitably rescaled deep Stable NN converges weakly to a Stable SP whose distribution is characterized recursively through the NN's layers. Because of the non-triangular structure of the NN, this is a non-standard asymptotic problem, to which we propose a novel and self-contained inductive approach, which may be of independent interest. Then, we establish sup-norm convergence rates of a deep Stable NN to a Stable SP, quantifying the critical difference between the settings of "joint growth" and "sequential growth" of the width over the NN's layers. Our work extends recent results on infinite-width limits for deep Gaussian NNs to the more general deep Stable NNs, providing the first result on convergence rates for infinite-width deep NNs.
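For intuition, a finite-width analogue can be sketched with α-Stable weights and the width^(-1/α) rescaling under which such networks admit Stable-process limits (α = 2 recovers the usual Gaussian 1/√n scaling). The value of α, the widths, and the tanh activation below are arbitrary illustrative choices, not the paper's setup.

```python
import numpy as np
from scipy.stats import levy_stable

def stable_nn_forward(x, widths, alpha=1.8):
    """Forward pass with iid alpha-Stable weights and width**(-1/alpha) rescaling."""
    h = x
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        w = levy_stable.rvs(alpha, 0.0, size=(n_in, n_out))   # symmetric alpha-Stable weights
        h = np.tanh((h @ w) * n_in ** (-1.0 / alpha))         # rescaled pre-activation
    return h

out = stable_nn_forward(np.random.randn(5, 100), widths=[100, 200, 200, 1])
```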
- Conference Article
123
- 10.1109/codesisss.2015.7331375
- Oct 1, 2015
Deep neural networks (DNNs) have recently proved their effectiveness in complex data analyses such as object/speech recognition. As their applications expand to mobile devices, their energy efficiency is becoming critical. In this paper, we propose a novel concept called big/LITTLE DNN (BL-DNN), which significantly reduces the energy consumption required for DNN execution at a negligible loss of inference accuracy. The BL-DNN consists of a little DNN (consuming low energy) and a full-fledged big DNN. To reduce energy consumption, the BL-DNN avoids executing the big DNN whenever possible. The key idea is to execute the little DNN first for inference (without big DNN execution) and simply use its result as the final inference result as long as the result is estimated to be accurate. If the result from the little DNN is not considered accurate, the big DNN is executed to give the final inference result. This approach reduces total energy consumption by obtaining the inference result with only the little, energy-efficient DNN in most cases, while maintaining a similar level of inference accuracy by selectively utilizing the big DNN. We present design-time and runtime methods to control the execution of the big DNN under a trade-off between energy consumption and inference accuracy. Experiments with state-of-the-art DNNs for ImageNet and MNIST show that our proposed BL-DNN can offer up to 53.7% (ImageNet) and 94.1% (MNIST) reductions in energy consumption at a loss of 0.90% (ImageNet) and 0.12% (MNIST) in inference accuracy, respectively.
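The little-first, escalate-on-low-confidence control flow can be sketched as follows; the confidence threshold and the toy stand-in models are placeholders for the design-time/runtime policies and trained networks described above.

```python
import numpy as np

def bl_dnn_predict(x, little_model, big_model, threshold=0.9):
    """Run the little DNN first; escalate to the big DNN only when confidence is low."""
    probs = little_model(x)                 # cheap first-pass inference
    if np.max(probs) >= threshold:          # result estimated to be accurate
        return int(np.argmax(probs)), "little"
    probs = big_model(x)                    # fall back to the full-fledged model
    return int(np.argmax(probs)), "big"

# Toy stand-ins for trained classifiers that return class probabilities.
little = lambda x: np.array([0.05, 0.92, 0.03])
big = lambda x: np.array([0.10, 0.85, 0.05])
print(bl_dnn_predict(np.zeros(10), little, big))   # -> (1, 'little')
```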
- Conference Article
75
- 10.5555/2830840.2830854
- Oct 4, 2015
Deep neural networks (DNNs) have recently proved their effectiveness in complex data analyses such as object/speech recognition. As their applications expand to mobile devices, their energy efficiency is becoming critical. In this paper, we propose a novel concept called big/LITTLE DNN (BL-DNN), which significantly reduces the energy consumption required for DNN execution at a negligible loss of inference accuracy. The BL-DNN consists of a little DNN (consuming low energy) and a full-fledged big DNN. To reduce energy consumption, the BL-DNN avoids executing the big DNN whenever possible. The key idea is to execute the little DNN first for inference (without big DNN execution) and simply use its result as the final inference result as long as the result is estimated to be accurate. If the result from the little DNN is not considered accurate, the big DNN is executed to give the final inference result. This approach reduces total energy consumption by obtaining the inference result with only the little, energy-efficient DNN in most cases, while maintaining a similar level of inference accuracy by selectively utilizing the big DNN. We present design-time and runtime methods to control the execution of the big DNN under a trade-off between energy consumption and inference accuracy. Experiments with state-of-the-art DNNs for ImageNet and MNIST show that our proposed BL-DNN can offer up to 53.7% (ImageNet) and 94.1% (MNIST) reductions in energy consumption at a loss of 0.90% (ImageNet) and 0.12% (MNIST) in inference accuracy, respectively.
- Research Article
4
- 10.30837/itssi.2021.15.014
- Mar 31, 2021
- Innovative Technologies and Scientific Solutions for Industries
The subject of this research is methods of constructing and training neural networks as a nonlinear modeling apparatus for predicting the energy consumption of metallurgical enterprises. The purpose of the work is to develop a model for forecasting the power-system consumption of a metallurgical enterprise and to test it experimentally on the data available for research from PJSC "Dneprospetsstal". The following tasks were solved: analysis of the power-consumption time series; building a model for processing electricity-consumption data over a historical period; producing the most accurate possible forecast of the actual amount of electricity for the day ahead; and assessing the forecast quality. Methods used: time series analysis, neural network modeling, and short-term forecasting of energy consumption in the metallurgical industry. Results: to develop a model for predicting the energy consumption of a metallurgical enterprise based on artificial neural networks, the MATLAB environment with the Neural Network Toolbox was chosen. In the experiments, based on the available statistical data of a metallurgical enterprise, architectures and learning algorithms for the neural networks were selected. The best results were shown by feedforward networks trained with backpropagation, using a nonlinear autoregressive architecture and the following learning algorithms: Levenberg-Marquardt nonlinear optimization, Bayesian regularization, and the conjugate gradient method. Another approach, deep learning, was also considered, namely a neural network with long short-term memory (LSTM) and the Adam learning algorithm. Such a deep neural network can process large amounts of input information in a short time and capture dependencies even when the input information is uninformative. The LSTM network proved the most effective of the networks considered, showing the smallest maximum prediction error. Conclusions: analysis of the forecasting results obtained with the developed models showed that the chosen approach, with experimentally selected architectures and learning algorithms, meets the accuracy requirements for a forecasting model based on artificial neural networks. The use of these models will allow automated, high-precision operational hourly forecasting of energy consumption under market conditions.
 Keywords: energy consumption; forecasting; artificial neural network; time series.
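For readers who want a runnable analogue of the LSTM-plus-Adam setup described above (the paper itself used MATLAB's Neural Network Toolbox), here is a minimal Keras sketch; the window length, layer size, and synthetic series are placeholders for the enterprise's hourly load data.

```python
import numpy as np
import tensorflow as tf

WINDOW = 24                                      # predict the next hour from the previous 24
series = np.sin(np.linspace(0.0, 100.0, 2000))   # synthetic stand-in for the hourly load series

X = np.stack([series[i:i + WINDOW] for i in range(len(series) - WINDOW)])
y = series[WINDOW:]
X = X[..., np.newaxis]                           # shape (samples, timesteps, features)

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, input_shape=(WINDOW, 1)),
    tf.keras.layers.Dense(1),                    # next-hour consumption estimate
])
model.compile(optimizer="adam", loss="mae")
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
next_hour = model.predict(X[-1:], verbose=0)
```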
- Book Chapter
1
- 10.1017/9781316408032.007
- Jan 1, 2017
Deep learning (also known as deep structured learning, hierarchical learning, or deep machine learning) is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data by using a deep graph with multiple processing layers, composed of multiple linear and nonlinear transformations. Deep learning has been characterized as a class of machine learning algorithms with the following characteristics [257]:
• They use a cascade of many layers of nonlinear processing units for feature extraction and transformation. Each successive layer uses the output from the previous layer as input. The algorithms may be supervised or unsupervised, and applications include pattern analysis (unsupervised) and classification (supervised).
• They are based on the (unsupervised) learning of multiple levels of features or representations of the data. Higher-level features are derived from lower-level features to form a hierarchical representation.
• They are part of the broader machine learning field of learning representations of data.
• They learn multiple levels of representations that correspond to different levels of abstraction; the levels form a hierarchy of concepts.
These definitions have in common multiple layers of nonlinear processing units and the supervised or unsupervised learning of feature representations in each layer, with the layers forming a hierarchy from low-level to high-level features. Various deep learning architectures such as deep neural networks, convolutional deep neural networks, deep belief networks (DBN), and recurrent neural networks have been applied to fields like computer vision, automatic speech recognition, natural language processing, audio recognition, and bioinformatics, where they have been shown to produce state-of-the-art results on various tasks. In this chapter we start in Section 7.1 with an introduction, giving a brief history of the field, the relevant literature, and its applications. We then study some basic concepts of deep learning such as convolutional neural networks, recurrent neural networks, the backpropagation algorithm, restricted Boltzmann machines, and deep learning networks in Section 7.2. We then present three examples of Apache Spark implementations, for mobile big data (MBD), user moving-pattern extraction, and combination with nonparametric Bayesian learning, respectively, in Sections 7.3 through 7.5. Finally, we provide a summary in Section 7.6.