Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications

Abstract

Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial intelligence (AI), including computer vision, natural language processing, and speech recognition. However, their superior performance comes at the considerable cost of computational complexity, which greatly hinders their application in many resource-constrained devices, such as mobile phones and Internet of Things (IoT) devices. Therefore, methods and techniques that can lift the efficiency bottleneck while preserving the high accuracy of DNNs are in great demand to enable numerous edge AI applications. This article provides an overview of efficient deep learning methods, systems, and applications. We start by introducing popular model compression methods, including pruning, factorization, and quantization, as well as compact model design. To reduce the large design cost of these manual solutions, we discuss the AutoML framework for each of them, such as neural architecture search (NAS) and automated pruning and quantization. We then cover efficient on-device training to enable user customization based on the local data on mobile devices. Apart from general acceleration techniques, we also showcase several task-specific accelerations for point cloud, video, and natural language processing by exploiting their spatial sparsity and temporal/token redundancy. Finally, to support all these algorithmic advancements, we introduce efficient deep learning system design from both software and hardware perspectives.
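
As a minimal illustration of two of the compression methods surveyed here, the Python sketch below applies unstructured magnitude pruning and symmetric 8-bit post-training quantization to a NumPy weight matrix. The 70% sparsity target and the 8-bit width are arbitrary choices for the example, not values prescribed by the article.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.7):
    """Unstructured pruning: zero out the smallest-magnitude weights."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) >= threshold, weights, 0.0)

def quantize_uniform(weights, num_bits=8):
    """Symmetric uniform post-training quantization (quantize, then dequantize)."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for int8
    scale = np.max(np.abs(weights)) / qmax  # one floating-point scale per tensor
    q = np.clip(np.round(weights / scale), -qmax - 1, qmax)
    return q * scale                         # storage would keep int8 q + scale

w = np.random.randn(64, 64).astype(np.float32)
w_compressed = quantize_uniform(magnitude_prune(w))
```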

Similar Papers
  • Dissertation
  • 10.25148/etd.fidc008960
Machine Vision, Not Human Vision, Guided Compression Towards Low-Latency and Robust Deep Learning Systems
  • Sep 2, 2021
  • Zihao Liu

Deep Neural Networks (DNNs) have been achieving extraordinary performance across many exciting real-world applications, including image classification, speech recognition, natural language processing, medical diagnosis, self-driving cars, drones, anomaly detection, and recognition of voice commands. However, the de facto DNN technique in real life is exposed to two critical issues. First, the ever-increasing amounts of data generated from mobile devices, sensors, and the Internet of Things (IoT) challenge the performance of the DNN system. There is a lack of efficient solutions to reduce the power-hungry data offloading and storage on terminal devices like edge sensors, especially in the face of stringent constraints on communication bandwidth, energy, and hardware resources. Second, DNN models are inherently vulnerable to adversarial examples (AEs), i.e., malicious inputs crafted by adding small and human-imperceptible perturbations to normal inputs, strongly fooling the cognitive function of DNNs. Although image compression techniques have been explored to mitigate adversarial examples, existing solutions are unable to offer a good balance between the efficiency of removing adversarial perturbations from malicious inputs and classification accuracy on benign samples. This dissertation makes solid strides towards developing low-latency and robust deep learning systems by leveraging, for the first time, a deep understanding of the difference in image perception between human vision and deep learning systems (a.k.a. "machine vision" in this dissertation). In the first part, we propose three types of "machine vision"-guided image compression frameworks, dedicated to accelerating both cloud-based deep learning image classification and 3D medical image segmentation with almost zero accuracy drop, by embracing the deeply cascaded information-processing mechanism of DNN architectures. To the best of our knowledge, this is the first effort to systematically re-architect existing data compression techniques, which are centered around human vision, to be machine-vision favorable, thereby achieving significant service speed-up. In the second part, we propose a JPEG-based defensive compression framework, namely "feature distillation", to effectively rectify adversarial examples without impacting classification accuracy on benign images. Experimental results show that the very low-cost "feature distillation" can deliver the best defense efficiency with negligible accuracy reduction among existing input-preprocessing-based defense techniques, serving as a new baseline and reference design for future defense-method development.
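
The generic idea behind JPEG-based defensive compression can be sketched as a simple input-preprocessing step: decode and re-encode the input so that high-frequency adversarial perturbations are damped before classification. The sketch below shows only this baseline idea; the dissertation's "feature distillation" replaces the human-vision JPEG quantization table with a machine-vision-guided one, which is not reproduced here, and the quality factor is a placeholder.

```python
import io
import numpy as np
from PIL import Image

def jpeg_preprocess(image_array, quality=50):
    """Recompress the input through JPEG so that high-frequency adversarial
    perturbations are partially removed before the classifier sees it.
    quality=50 is a placeholder; feature distillation instead derives a
    DNN-oriented quantization table rather than using the human-vision default."""
    buf = io.BytesIO()
    Image.fromarray(image_array.astype(np.uint8)).save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return np.asarray(Image.open(buf))
```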

  • Book Chapter
  • Citations: 6
  • 10.1016/b978-0-12-822109-9.00013-8
Chapter 4 - Efficient methods for deep learning
  • Jan 1, 2022
  • Advanced Methods and Deep Learning in Computer Vision
  • Han Cai + 2 more

  • Research Article
  • Citations: 6
  • 10.1109/tnnls.2023.3244006
Automatic Learning Rate Adaption for Memristive Deep Learning Systems.
  • Aug 1, 2024
  • IEEE transactions on neural networks and learning systems
  • Yang Zhang + 1 more

As a possible device to further enhance the performance of hybrid complementary metal oxide semiconductor (CMOS) technology in hardware, the memristor has attracted widespread attention for implementing efficient and compact deep learning (DL) systems. In this study, an automatic learning rate tuning method for memristive DL systems is presented. Memristive devices are utilized to adjust the adaptive learning rate in deep neural networks (DNNs). The learning rate adaptation is fast at first and then becomes slow, which is consistent with the memristance or conductance adjustment process of the memristors. As a result, no manual tuning of learning rates is required in the adaptive back propagation (BP) algorithm. While cycle-to-cycle and device-to-device variations could be a significant issue in memristive DL systems, the proposed method appears robust to noisy gradients, various architectures, and different datasets. Moreover, fuzzy control methods for adaptive learning are presented for pattern recognition, such that the over-fitting issue can be well addressed. To the best of our knowledge, this is the first memristive DL system using an adaptive learning rate for image recognition. Another highlight of the presented memristive adaptive DL system is that a quantized neural network architecture is utilized, yielding a significant increase in training efficiency without loss of testing accuracy.
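
Although the paper realizes the adaptation in memristive hardware, the shape of the schedule, fast at first and then slow, can be sketched in software. The exponential form and constants below are illustrative assumptions rather than the paper's device model.

```python
import math

def memristor_like_lr(step, lr_max=0.1, lr_min=1e-4, tau=200.0):
    """Learning rate that changes quickly at first and then slowly, loosely
    mimicking a memristor's conductance-adjustment dynamics. The exponential
    form and the constants are illustrative assumptions only."""
    return lr_min + (lr_max - lr_min) * math.exp(-step / tau)

# e.g., lr at steps 0, 200, 1000: 0.1, ~0.037, ~0.0008
```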

  • Dissertation
  • 10.25148/etd.fidc008944
A System-level Perspective Towards Efficient, Reliable and Secure Neural Network Computing
  • Jan 1, 2020
  • Tao Liu

The Digital Era is now evolving into the Intelligence Era, driven overwhelmingly by the revolution of the Deep Neural Network (DNN), which opens the door for intelligent data interpretation, turning data and information into actions that create new capabilities, richer experiences, and unprecedented economic opportunities, with game-changing outcomes spanning image recognition, natural language processing, self-driving cars, and biomedical analysis. Moreover, the emergence of deep learning accelerators and neuromorphic computing further pushes DNN computation from the cloud to edge devices for low-latency, scalable on-device neural network computing. However, such promising embedded neural network computing systems are subject to various technical challenges. First, performing highly accurate inference for complex DNNs requires massive amounts of computation and memory resources, leading to very limited energy efficiency on existing computing platforms. Even the brain-inspired spiking neuromorphic computing architecture, which originates from the more bio-plausible spiking neural network (SNN) and relies on the occurrence frequency of a large number of electrical spikes to represent data and perform computation, is subject to significant limitations on both energy efficiency and processing speed. Second, although many memristor-based DNN accelerators and emerging neuromorphic accelerators have been proposed to improve the performance-per-watt of embedded DNN computing with the highly parallelizable Processing-in-Memory (PIM) architecture, one critical challenge faced by these memristor-based designs is their poor reliability. A DNN weight, which is represented as the memristance of a memristor cell, can be easily distorted by the inherent physical limitations of memristor devices, resulting in significant accuracy degradation. Third, DNN computing systems are also subject to ever-increasing security concerns. Attackers can easily fool a normally trained DNN model by exploiting the algorithmic vulnerabilities of DNN classifiers through adversarial examples that mislead the inference results. Moreover, system vulnerabilities in open-source DNN computing frameworks, such as heap overflow, are increasingly exploited to either distort the inference accuracy or corrupt the learning environment. This dissertation focuses on designing efficient, reliable, and secure neural network computing systems. An architecture and algorithm co-design approach is presented to address the aforementioned design pillars, namely efficiency, reliability, and security, from a system-level perspective. Three case studies centered around each design pillar, including a Single-spike Neuromorphic Accelerator, a Fault-tolerant DNN Accelerator, and Mal-DNN: Malicious DNN-powered Stegomalware, are discussed in this dissertation, offering the community an alternative way of thinking about developing more efficient, reliable, and secure deep learning systems.

  • Research Article
  • Citations: 509
  • 10.1016/j.preteyeres.2019.04.003
Deep learning in ophthalmology: The technical and clinical considerations.
  • Apr 29, 2019
  • Progress in Retinal and Eye Research
  • Daniel S.W Ting + 11 more

  • Research Article
  • Citations: 97
  • 10.1109/tevc.2023.3252612
Neural Architecture Search Based on a Multi-Objective Evolutionary Algorithm With Probability Stack
  • Aug 1, 2023
  • IEEE Transactions on Evolutionary Computation
  • Yu Xue + 2 more

With the emergence of deep neural networks, many research fields, such as image classification, object detection, speech recognition, natural language processing, machine translation, and automatic driving, have made major technological breakthroughs, and the research achievements have been successfully applied in many real-life applications. Combining evolutionary computation with neural architecture search (NAS) is an important approach to improving the performance of deep neural networks. Usually, researchers focus only on precision; thus, the searched neural architectures often perform poorly on other metrics, such as time cost. In this paper, a multi-objective evolutionary algorithm with a probability stack (MOEA-PS) is proposed for NAS, which considers the two objectives of precision and time consumption. MOEA-PS uses an adjacency list to represent the internal structure of deep neural networks. In addition, a unique mechanism is introduced into the multi-objective genetic algorithm to guide the process of crossover and mutation when generating offspring. Furthermore, the structure blocks are stacked using a proxy model to generate deep neural networks. The results of experiments on CIFAR-10 and CIFAR-100 demonstrate that the proposed algorithm has an error rate similar to the most advanced NAS algorithms but at a lower time cost. Finally, the network structure searched on CIFAR-10 is transferred directly to the ImageNet dataset, where it achieves 73.6% classification accuracy.
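
A minimal software sketch of the bi-objective search loop described here, assuming user-supplied `evaluate` (returning an (error, latency) pair) and `mutate` functions: keep the non-dominated front and refill the population by mutation. The probability stack that MOEA-PS uses to guide crossover and mutation, and its proxy-model stacking, are omitted.

```python
import random

def dominates(a, b):
    """a, b are (error, latency) pairs; a dominates b if it is no worse on
    both objectives and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def evolve(population, evaluate, mutate, generations=50):
    """Toy bi-objective evolutionary search: keep the non-dominated front of
    (error, latency) and refill the population by mutating front members."""
    for _ in range(generations):
        scored = [(arch, evaluate(arch)) for arch in population]
        front = [a for a, fa in scored
                 if not any(dominates(fb, fa) for _, fb in scored)]
        population = front + [mutate(random.choice(front))
                              for _ in range(len(population) - len(front))]
    return front
```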

  • Research Article
  • Citations: 16
  • 10.1109/tcc.2022.3160129
An Efficient and Robust Cloud-Based Deep Learning With Knowledge Distillation
  • Apr 1, 2023
  • IEEE Transactions on Cloud Computing
  • Zeyi Tao + 3 more

In recent years, deep neural networks have shown extraordinary power in various practical learning tasks, especially in object detection, classification, and natural language processing. However, deploying such large models on resource-constrained devices or embedded systems is challenging due to their high computational cost. Efforts such as model partitioning, pruning, or quantization have been used at the expense of accuracy loss. Knowledge distillation is a technique that transfers model knowledge from a well-trained model (the teacher) to a smaller and shallower model (the student). Instead of using a large learning model on the cloud, we can deploy distilled models on various edge devices, significantly reducing computational cost and memory usage and prolonging battery lifetime. In this work, we propose a novel neuron manifold distillation (NMD) method, where the student model imitates the teacher's output distribution and learns the feature geometry of the teacher model. In addition, to further improve the reliability of the cloud-based learning system, we propose a confident prediction mechanism to calibrate the model predictions. We conduct experiments with different distillation configurations over multiple datasets. Our proposed method demonstrates a consistent improvement in accuracy-speed trade-offs for the distilled model.
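
Response-based distillation, which the student's imitation of the teacher's output distribution builds on, can be sketched as a standard temperature-softened KL loss (PyTorch, below). The temperature and mixing weight are conventional placeholder values, and the feature-geometry (neuron manifold) term that distinguishes NMD is not attempted here.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Classic response-based distillation: match the teacher's temperature-
    softened output distribution, mixed with the usual hard-label loss."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```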

  • Research Article
  • Citations: 25
  • 10.1109/jssc.2022.3179303
A 16-nm SoC for Noise-Robust Speech and NLP Edge AI Inference With Bayesian Sound Source Separation and Attention-Based DNNs
  • Feb 1, 2023
  • IEEE Journal of Solid-State Circuits
  • Thierry Tambe + 9 more

The proliferation of personal artificial intelligence (AI) assistant technologies with speech-based conversational AI interfaces is driving exponential growth in the consumer Internet of Things (IoT) market. As these technologies are applied to keyword spotting (KWS), automatic speech recognition (ASR), natural language processing (NLP), and text-to-speech (TTS) applications, it is of paramount importance that they provide uncompromising performance for context learning in long sequences, which is a key benefit of the attention mechanism, and that they work seamlessly in polyphonic environments. In this work, we present a 25-mm2 system-on-chip (SoC) in 16-nm FinFET technology, codenamed SM6, which executes end-to-end speech-enhancing attention-based ASR and NLP workloads. The SoC includes: 1) FlexASR, a highly reconfigurable NLP inference processor optimized for whole-model acceleration of bidirectional attention-based sequence-to-sequence (seq2seq) deep neural networks (DNNs); 2) a Markov random field source separation engine (MSSE), a probabilistic graphical model accelerator for unsupervised inference via Gibbs sampling, used for sound source separation; 3) a dual-core Arm Cortex-A53 CPU cluster, which provides on-demand single instruction/multiple data (SIMD) fast Fourier transform (FFT) processing and performs various application logic (e.g., the expectation-maximization (EM) algorithm and 8-bit floating-point (FP8) quantization); and 4) an always-on M0 subsystem for audio detection and power management. Measurement results demonstrate efficiency ranges of 2.6–7.8 TFLOPs/W and 4.33–17.6 Gsamples/s/W for FlexASR and MSSE, respectively; MSSE denoising performance that allows a 6× smaller ASR model to be stored on-chip with negligible accuracy loss; and 2.24-mJ energy consumption while achieving real-time throughput, with end-to-end and per-frame ASR latencies of 18 ms.
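
As a rough intuition for the MSSE's unsupervised inference via Gibbs sampling, the sketch below performs Gibbs sweeps over a toy binary (Ising-style) Markov random field. The chip's actual source-separation model and its hardware realization are far richer, so everything here, including the coupling matrix, is an illustrative assumption.

```python
import numpy as np

def gibbs_sweep(state, coupling, rng):
    """One Gibbs sweep over a binary (+/-1) Ising-style MRF: resample each
    node from its conditional given its neighbors' current values."""
    for i in range(state.size):
        field = coupling[i] @ state - coupling[i, i] * state[i]
        p_up = 1.0 / (1.0 + np.exp(-2.0 * field))   # P(state[i] = +1 | rest)
        state[i] = 1 if rng.random() < p_up else -1
    return state

rng = np.random.default_rng(0)
J = rng.normal(scale=0.3, size=(16, 16)); J = (J + J.T) / 2  # symmetric couplings
state = rng.choice(np.array([-1, 1]), size=16)
for _ in range(100):                                          # burn-in sweeps
    state = gibbs_sweep(state, J, rng)
```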

  • Research Article
  • Citations: 361
  • 10.1016/j.eswa.2023.122666
A comprehensive survey on applications of transformers for deep learning tasks
  • Nov 23, 2023
  • Expert Systems with Applications
  • Saidul Islam + 6 more

  • Research Article
  • 10.51584/ijrias.2024.908065
Developing and Analyzing Deep Learning and Natural Language Processing Systems in the Context of Medical Information Processing
  • Jan 1, 2024
  • International Journal of Research and Innovation in Applied Science
  • Emmanuel, Victoria Nkemjika + 3 more

This work aims to develop and analyze deep learning and natural language processing systems in the context of medical information processing. The amount of data created about patients in the healthcare system is always increasing, and the human review of this enormous volume of data derived from numerous sources is expensive and takes a lot of time, bringing huge challenges when attempting to review the data meaningfully. Additionally, during a patient visit, doctors write down the patient's medical encounter and send it to nurses and other medical departments for processing. Often, the doctor does not have enough time to record every observation made while examining the patient and asking about their medical history, which delays medical diagnosis. Therefore, the goal of this research is to create a system that addresses the aforementioned issues. The suggested method extracts voice data from medical encounters and converts it to text using Deep Learning (DL) and Natural Language Processing (NLP) techniques. Moreover, the system will improve medical intelligence processing by using deep learning to analyze medical datasets and produce diagnostic results, assisting medical professionals at various levels in making realistic, intelligent decisions in real time regarding crucial health issues. The system was designed using the Object-Oriented Analysis and Design Methodology (OOADM), and the user interfaces were implemented using Natural Language Processing techniques, particularly speech recognition and natural language comprehension. Speech recognition allows free-text notes to be taken, which can drastically cut down the amount of time medical staff spend on labor-intensive clinical recording. By extracting different pieces of data for medical diagnosis and producing results in a matter of seconds, the deep learning algorithm demonstrates a significant capacity to support clinical decision support systems. The system's results demonstrate that the deep learning algorithm enabled medical intelligence to be 96.7 percent accurate.

  • Research Article
  • Citations: 4
  • 10.1007/s10994-005-1399-6
Guest Editors' Introduction: Machine Learning in Speech and Language Technologies
  • Sep 1, 2005
  • Machine Learning
  • Pascale Fung + 1 more

Machine learning techniques have long been the foundations of speech processing. Bayesian classification, decision trees, unsupervised clustering, the EM algorithm, maximum entropy, etc. are all part of existing speech recognition systems. The success of statistical speech recognition has led to the rise of statistical and empirical methods in natural language processing. Indeed, many of the machine learning techniques used in language processing, from statistical part-of-speech tagging to the noisy channel model for machine translation have roots in work conducted in the speech field. However, advances in learning theory and algorithmic machine learning approaches in recent years have led to significant changes in the direction and emphasis of the statistical and learning centered research in natural language processing and made a mark on natural language and speech processing. Approaches such as memory based learning, a range of linear classifiers such as Boosting, SVMs and SNoW and others have been successfully applied to a broad range of natural language problems, and these now inspire new research in speech retrieval and recognition. We have seen an increasingly close collaboration between voice and language processing researchers in some of the shared tasks such as spontaneous speech recognition and understanding, voice data information extraction, and machine translation. The purpose of this special issue was to invite speech and language researchers to communicate with each other, and with the machine learning community on the latest machine learning advances in their work. The call for papers was met with great enthusiasm from the speech and natural language community. Thirty six submissions were received; each paper was reviewed by at least three reviewers. Only ten papers were selected, reflecting not only some of the best work on machine learning in the areas of natural language and spoken language processing but also what we view as a collection of papers that represent current trends in these areas of research both from the perspective of

  • Book Chapter
  • 10.71443/9788197282164-06
Detailed Study of Supervised Learning Algorithms and Their Applications in Real-World Scenarios
  • Jun 22, 2024
  • S Praveena

Neural Architecture Search (NAS) has revolutionized the design of deep learning models by automating the exploration of neural network architectures, thereby enhancing performance across various domains. This chapter delves into the latest advancements in NAS, focusing on its application in image classification, natural language processing, autonomous systems, and hardware optimization. Key methodologies, including reinforcement learning-based and efficient NAS approaches, are explored in depth to illustrate their impact on model accuracy and computational efficiency. Through comprehensive case studies, the chapter highlights the transformative potential of NAS in generating state-of-the-art architectures, optimizing resource utilization, and addressing complex tasks with unprecedented precision. The discussion emphasizes the balance between search efficiency and model performance, providing insights into the future trajectory of NAS research. This chapter is essential for understanding the cutting-edge techniques and practical applications of NAS, offering valuable knowledge for researchers and practitioners in the field of machine learning and artificial intelligence.

  • Research Article
  • Citations: 46
  • 10.1109/access.2023.3253818
Neural Architecture Search Benchmarks: Insights and Survey
  • Jan 1, 2023
  • IEEE Access
  • Krishna Teja Chitty-Venkata + 3 more

Neural Architecture Search (NAS), a promising and fast-moving research field, aims to automate the architectural design of Deep Neural Networks (DNNs) to achieve better performance on a given task and dataset. NAS methods have been very successful in discovering efficient models for various tasks in Computer Vision, Natural Language Processing, and other domains. The major obstacles to the advancement of NAS techniques are the demand for large computational resources and the fair evaluation of different search methods: differences in training pipelines and settings make it challenging to compare the efficiency of two NAS algorithms. To ease the computational burden of training neural networks and to aid in the unbiased assessment of different search methods, a large number of NAS benchmarks that simulate architecture evaluation in seconds have been released over the last few years. This paper provides an extensive review of several publicly available NAS benchmarks in the literature. We provide technical details and a deeper understanding of each benchmark and point out future directions.
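
The core mechanic of a tabular NAS benchmark, replacing hours of training with a constant-time lookup over precomputed metrics, can be sketched in a few lines. The toy table and the metric names below are stand-ins, not the schema of any particular benchmark.

```python
# Toy stand-in for a tabular NAS benchmark: metrics are precomputed once,
# so a search algorithm "evaluates" an architecture with a dict lookup
# instead of hours of training. Keys and fields are illustrative only.
BENCHMARK = {
    ("conv3x3", "conv3x3", "maxpool"): {"val_acc": 0.921, "train_seconds": 1080.0},
    ("conv3x3", "conv1x1", "maxpool"): {"val_acc": 0.913, "train_seconds": 840.0},
    ("conv1x1", "skip",    "maxpool"): {"val_acc": 0.894, "train_seconds": 610.0},
}

def query(arch):
    """O(1) surrogate for full training; raises KeyError outside the space."""
    return BENCHMARK[tuple(arch)]

best = max(BENCHMARK, key=lambda a: BENCHMARK[a]["val_acc"])
```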

  • Conference Article
  • Citations: 1
  • 10.1117/12.2305226
Understanding adversarial attack and defense towards deep compressed neural networks
  • May 3, 2018
  • Qi Liu + 2 more

Modern deep neural networks (DNNs) have demonstrated phenomenal success in many exciting applications such as computer vision, speech recognition, and natural language processing, thanks to recent machine learning model innovation and computing hardware advancement. However, recent studies show that state-of-the-art DNNs can be easily fooled by carefully crafted input perturbations that are imperceptible even to human eyes, namely "adversarial examples", raising emerging security concerns for DNN-based intelligent systems. Moreover, to ease the intensive computation and memory resource requirements imposed by fast-growing DNN model sizes, aggressively pruning redundant model parameters through various hardware-favorable DNN techniques (e.g., hashing, deep compression, circulant projection) has become a necessity. This procedure further complicates the security issues of DNN systems. In this paper, we first study the vulnerabilities of hardware-oriented deep compressed DNNs under various adversarial attacks. Then we survey existing mitigation approaches such as gradient distillation, which was originally tailored to software-based DNN systems. Inspired by gradient distillation and weight reshaping, we further develop a near zero-cost but effective gradient silence (GS) method to protect both software- and hardware-based DNN systems against adversarial attacks. Compared with defensive distillation, our gradient silence method can achieve better resilience to adversarial attacks without additional training, while still maintaining very high accuracy across small and large DNN models for various image classification benchmarks such as MNIST and CIFAR10.
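
For readers unfamiliar with how such perturbations are crafted, the sketch below implements the generic Fast Gradient Sign Method (FGSM) in PyTorch. The paper evaluates various attacks against compressed models; FGSM is shown here only as a representative example, with an assumed 8/255 perturbation budget.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, epsilon=8 / 255):
    """Fast Gradient Sign Method: perturb the input one step in the direction
    that maximally increases the loss, clipped to a valid image range."""
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    return (x + epsilon * x.grad.sign()).clamp(0, 1).detach()
```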

  • Research Article
  • Citations: 6
  • 10.1109/tmc.2023.3244170
DGL: Device Generic Latency Model for Neural Architecture Search on Mobile Devices
  • Feb 1, 2024
  • IEEE Transactions on Mobile Computing
  • Qinsi Wang + 1 more

Low-cost Neural Architecture Search (NAS) for lightweight networks running on massive numbers of mobile devices is essential for fast-developing ICT technology. Current NAS work cannot search on unseen devices without latency sampling, which is a big obstacle to the implementation of NAS on mobile devices. In this paper, we overcome this challenge by proposing the Device Generic Latency (DGL) model. By drawing on processor modeling technology, the proposed DGL formula maps the parameters of the interval theory to seven static configuration parameters of the device. To make the formula more practical, we refine it into a low-cost form by decreasing the number of configuration parameters to four. Based on this formula, the DGL model is proposed, which introduces a network-parameter predictor and an accuracy predictor that work with the DGL formula to predict network latency. We then propose a DGL-based NAS framework to enable fast searches without latency sampling. Extensive experimental results validate that the DGL model achieves more accurate latency predictions than existing NAS latency predictors on unseen mobile devices. When configured with current state-of-the-art predictors, DGL-based NAS can search for architectures that meet the latency limit with higher accuracy than other NAS implementations, while using less training and prediction time. Our work sheds light on how to incorporate domain knowledge into NAS and plays an important role in enabling low-cost NAS on mobile devices.
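
The abstract does not spell out the DGL formula, so the sketch below substitutes a simple roofline-style stand-in to show the general shape of a sampling-free latency predictor: per-layer cost bounded by compute throughput or memory bandwidth, driven by a handful of device configuration parameters. All names and constants here are assumptions for illustration, not the paper's model.

```python
def predict_latency(layer_flops, layer_bytes, device_cfg):
    """Roofline-style stand-in for a sampling-free latency model: each layer
    is bounded by compute throughput or memory bandwidth, plus a fixed
    per-layer overhead, summed over the network. Purely an assumption; the
    actual DGL formula is derived from interval theory and four device
    configuration parameters."""
    return sum(max(f / device_cfg["peak_flops"], b / device_cfg["bandwidth"])
               + device_cfg["per_layer_overhead"]
               for f, b in zip(layer_flops, layer_bytes))

phone = {"peak_flops": 1.0e12, "bandwidth": 25.0e9, "per_layer_overhead": 1e-4}
latency_s = predict_latency([4e8, 2e8, 1e8], [6e6, 3e6, 2e6], phone)
```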
