Deep learning on edge computing devices: design challenges of algorithm and architecture

Abstract

Similar Papers
  • Book Chapter
  • Cited by: 1
  • 10.1007/978-3-030-87059-1_5
Deep Learning Frameworks for Internet of Things
  • Jan 1, 2022
  • Dristi Datta + 1 more

Artificial intelligence is a common platform in which the concepts of machine learning (ML) and deep learning (DL) appear. DL has become a hot research topic in recent years as it enables various smarter applications and services, including the Internet of Things (IoT). DL learns features and tasks directly from data, including pixels, images, shapes, dimensions, text, and sound. DL is also considered an end-to-end learning approach because the tasks are learned directly from the data. A neural network consists of several hidden layers and is therefore also known as a deep neural network (DNN). The convolutional neural network (CNN), which contains a significant number of hidden layers, is a commonly used DNN. This chapter aims to explore DL frameworks for IoT. The chapter begins with a discussion of the development and architecture of the DL framework. We then discuss various DL models associated with deep reinforcement learning approaches for IoT. Potential applications, including smart grid management, road traffic management, the industrial sector, estimation of crop production, and detection of various plant diseases, are discussed. Various design issues and challenges in implementing DL are also discussed. The findings reported in this chapter provide some insights into DL frameworks for IoT that can help network researchers and engineers contribute further toward the development of next-generation IoT.

Keywords: Artificial intelligence, Deep learning, Deep neural network, Framework, IoT, Machine learning
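To make the "several hidden layers" idea concrete, here is a minimal sketch (not from the chapter; the layer widths, 32x32 input size, and 10-class output are arbitrary assumptions) of a small CNN defined in PyTorch, the kind of model a DL framework for IoT might deploy:

```python
# Hypothetical minimal CNN: raw pixels in, class scores out.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Two convolutional hidden layers followed by a fully connected head.
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                 # 32x32 -> 16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                 # 16x16 -> 8x8
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(x.flatten(1))

# End-to-end learning: the same stack maps data to task output.
logits = TinyCNN()(torch.randn(1, 3, 32, 32))
print(logits.shape)  # torch.Size([1, 10])
```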

  • Book Chapter
  • Cited by: 1
  • 10.1007/978-3-319-94199-8_38
New Challenges for the Industrial Architecture. Ergonomics on the Edge of a New Era of IT Technology and Deep Learning
  • Jun 27, 2018
  • Pawel Horn

Presently we observe a shift of human activity away from traditional methods of manufacturing products and toward increasingly specialized and evolving robotic and IT systems. For obvious economic and technological reasons, this change is first strongly visible in industrial production. Along with technological advances comes a dramatic shift of man's place in the production process, from a position at the machine to the back of the process as designer, supervisor and controller of the information systems that manage production. This seemingly obvious change results in completely new challenges for both industrial architecture and the wider built environment, as it dramatically reduces the number of jobs and imposes completely new requirements on the workplace and its architecture. The author discusses this issue using the example of the design of a technologically advanced 3D printing plant, from the point of view of the designer.

  • Research Article
  • 10.17485/ijst/v17i44.2663
Unlocking Modern VLSI Placement with Cutting-Edge GPU Acceleration Integrated with Deep Learning Tool Kit
  • Dec 10, 2024
  • Indian Journal Of Science And Technology
  • Akshaya Kumar Dash + 1 more

Objectives: In VLSI, cell placement is critical in determining the overall performance, area, runtime efficiency, and power consumption of integrated circuits. The objective of this work is to find the best possible locations for the cells to meet the mentioned constraints. Methods: The proposed method utilizes a deep reinforcement learning strategy with heuristics to address the complexities of modern VLSI design challenges. A multi-objective deep reinforcement learning approach fused with GPU acceleration is explored to optimize placement metrics such as wirelength, congestion, and runtime and to obtain a globally optimal placement solution. The suggested method dynamically selects appropriate parameters, a process known as Parameter Tuning, to produce high-quality placement solutions. The strategy's effectiveness is demonstrated on the open-source BlackParrot and MemPool benchmarks. Findings: The trial outcomes of the strategy on benchmark data show considerable improvements in placement quality. It reduces wirelength by up to 4% and congestion by about 10%. Moreover, it is highly scalable and reliable in providing global placement solutions. The reduced aspect ratio indicates lower chip area utilization and reduced power consumption. Novelty: The integration of GPU acceleration and deep learning methodology as a strategy for VLSI global placement has not been reported in prior placement work; however, similar approaches are available for legalization and detailed placement.

Keywords: Placement, VLSI, Deep Learning, GPU Acceleration, Parameter Tuning
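As a rough illustration of what a multi-objective placement reward can look like, here is a hypothetical sketch; the weighted-sum form, the weights, and the baseline normalization are assumptions for illustration, not the paper's actual reward function:

```python
# Hypothetical reward for an RL placement agent: improvements in wirelength and
# congestion over a baseline placement, combined with fixed (assumed) weights.
def placement_reward(wirelength: float,
                     congestion: float,
                     baseline_wl: float,
                     baseline_cong: float,
                     w_wl: float = 0.7,
                     w_cong: float = 0.3) -> float:
    """Higher is better: reward relative improvement over the baseline."""
    wl_gain = (baseline_wl - wirelength) / baseline_wl
    cong_gain = (baseline_cong - congestion) / baseline_cong
    return w_wl * wl_gain + w_cong * cong_gain

# Example: 4% shorter wirelength and 10% less congestion than the baseline.
print(placement_reward(wirelength=0.96, congestion=0.90,
                       baseline_wl=1.0, baseline_cong=1.0))  # 0.058
```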

  • Conference Article
  • Cited by: 13
  • 10.1109/iccad.2017.8203877
Deep learning challenges and solutions with Xilinx FPGAs
  • Nov 1, 2017
  • Elliott Delaye + 3 more

In this paper, we describe the architectural, software, performance, and implementation challenges and solutions, as well as current research, on the use of programmable logic to enable deep learning applications. First, the characteristics of building a deep learning system are discussed. Next, architectural choices are explained for how an FPGA fabric can efficiently solve deep learning tasks. Finally, specific techniques for how DSPs and memories are used in high-performance applications are described.

  • Research Article
  • Cited by: 4
  • 10.3390/electronics10060689
On-Device Deep Learning Inference for System-on-Chip (SoC) Architectures
  • Mar 15, 2021
  • Electronics
  • Tom Springer + 3 more

As machine learning becomes ubiquitous, the need to deploy models on real-time, embedded systems will become increasingly critical. This is especially true for deep learning solutions, whose large models pose interesting challenges for target architectures at the “edge” that are resource-constrained. The realization of machine learning, and deep learning, is being driven by the availability of specialized hardware, such as system-on-chip solutions, which provide some alleviation of constraints. Equally important, however, are the operating systems that run on this hardware, and specifically the ability to leverage commercial real-time operating systems which, unlike general-purpose operating systems such as Linux, can provide the low-latency, deterministic execution required for embedded, and potentially safety-critical, applications at the edge. Despite this, studies considering the integration of real-time operating systems, specialized hardware, and machine learning/deep learning algorithms remain limited. In particular, better mechanisms for real-time scheduling in the context of machine learning applications will prove to be critical as these technologies move to the edge. In order to address some of these challenges, we present a resource management framework designed to provide a dynamic on-device approach to the allocation and scheduling of limited resources in a real-time processing environment. These types of mechanisms are necessary to support the deterministic behavior required by the control components contained in the edge nodes. To validate the effectiveness of our approach, we applied rigorous schedulability analysis to a large set of randomly generated simulated task sets and then verified that the most time-critical applications, such as the control tasks, maintained low-latency deterministic behavior even during off-nominal conditions. The practicality of our scheduling framework was demonstrated by integrating it into a commercial real-time operating system (VxWorks) and then running a typical deep learning image processing application to perform simple object detection. The results indicate that our proposed resource management framework can be leveraged to facilitate integration of machine learning algorithms with real-time operating systems and embedded platforms, including widely used, industry-standard real-time operating systems.
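As one concrete (and standard) form such schedulability analysis can take, the sketch below applies the Liu and Layland utilization bound for rate-monotonic scheduling to a hypothetical periodic task set that includes a deep learning inference task; both the test and the task parameters are illustrative assumptions, not the paper's exact analysis:

```python
# Hypothetical rate-monotonic schedulability check using the Liu & Layland
# utilization bound (sufficient but not necessary).
def rm_schedulable(tasks: list[tuple[float, float]]) -> bool:
    """tasks: list of (worst-case execution time, period) pairs, same time unit."""
    n = len(tasks)
    utilization = sum(c / t for c, t in tasks)
    bound = n * (2 ** (1.0 / n) - 1)   # ~0.78 for three tasks
    return utilization <= bound

# Made-up task set: control task (1 ms / 10 ms), sensor task (2 ms / 40 ms),
# DL inference task (15 ms / 100 ms).
tasks = [(1.0, 10.0), (2.0, 40.0), (15.0, 100.0)]
print(rm_schedulable(tasks))  # True: utilization 0.30 <= bound ~0.78
```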

  • Research Article
  • Cited by: 62
  • 10.1093/bib/bbac102
Protein design via deep learning.
  • Mar 25, 2022
  • Briefings in Bioinformatics
  • Wenze Ding + 2 more

Proteins with desired functions and properties are important in fields like nanotechnology and biomedicine. De novo protein design enables the production of previously unseen proteins from the ground up and is believed to be key to addressing real societal challenges. The recent introduction of deep learning into design methods has had a transformative influence and is expected to represent a promising and exciting future direction. In this review, we survey the major aspects of current advances in deep-learning-based design procedures and illustrate their novelty in comparison with conventional knowledge-based approaches through notable cases. We not only describe deep learning developments in structure-based protein design and direct sequence design, but also highlight recent applications of deep reinforcement learning in protein design. Future perspectives on design goals, challenges and opportunities are also comprehensively discussed.

  • Research Article
  • 10.3390/sym17071109
Self-Adapting CPU Scheduling for Mixed Database Workloads via Hierarchical Deep Reinforcement Learning
  • Jul 10, 2025
  • Symmetry
  • Suchuan Xing + 2 more

Modern database systems require autonomous CPU scheduling frameworks that dynamically optimize resource allocation across heterogeneous workloads while maintaining strict performance guarantees. We present a novel hierarchical deep reinforcement learning framework augmented with graph neural networks to address CPU scheduling challenges in mixed database environments comprising Online Transaction Processing (OLTP), Online Analytical Processing (OLAP), vector processing, and background maintenance workloads. Our approach introduces three key innovations: first, a symmetric two-tier control architecture where a meta-controller allocates CPU budgets across workload categories using policy gradient methods while specialized sub-controllers optimize process-level resource allocation through continuous action spaces; second, graph neural network-based dependency modeling that captures complex inter-process relationships and communication patterns while preserving inherent symmetries in database architectures; and third, meta-learning integration with curiosity-driven exploration enabling rapid adaptation to previously unseen workload patterns without extensive retraining. The framework incorporates a multi-objective reward function balancing Service Level Objective (SLO) adherence, resource efficiency, symmetric fairness metrics, and system stability. Experimental evaluation through high-fidelity digital twin simulation and production deployment demonstrates substantial performance improvements: 43.5% reduction in p99 latency violations for OLTP workloads and 27.6% improvement in overall CPU utilization, with successful scaling to 10,000 concurrent processes maintaining sub-3% scheduling overhead. This work represents a significant advancement toward truly autonomous database resource management, establishing a foundation for next-generation self-optimizing database systems with implications extending to broader orchestration challenges in cloud-native architectures.
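A hypothetical sketch of the two-tier allocation idea follows: the meta-controller's per-category scores are turned into CPU budget shares with a softmax, and each sub-controller would then split its category's budget across processes; the category names, scores, and core count below are made-up values, not taken from the paper:

```python
# Hypothetical top-tier budget allocation: policy scores -> CPU shares per workload category.
import math

def softmax(scores):
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def allocate_cpu(total_cores: float, category_scores: dict[str, float]) -> dict[str, float]:
    names = list(category_scores)
    shares = softmax([category_scores[n] for n in names])
    return {n: total_cores * s for n, s in zip(names, shares)}

# Meta-controller output (higher score -> larger budget) for four workload categories.
budgets = allocate_cpu(64, {"OLTP": 2.0, "OLAP": 1.0, "vector": 0.5, "maintenance": -1.0})
print({k: round(v, 1) for k, v in budgets.items()})
```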

  • Conference Article
  • Cited by: 16
  • 10.1109/cvpr42600.2020.00937
Unsupervised Deep Shape Descriptor With Point Distribution Learning
  • Jun 1, 2020
  • Yi Shi + 3 more

Deep learning models have achieved great success in supervised shape descriptor learning for 3D shape retrieval, classification, and correspondence. However, unsupervised shape descriptors computed via deep learning are less studied than supervised ones, due to the design challenges of unsupervised neural network architectures. This paper proposes a novel probabilistic framework for learning unsupervised deep shape descriptors with point distribution learning. In our approach, we first associate each point with a Gaussian, and the point clouds are modeled as the distribution of the points. We then use deep neural networks (DNNs) to model a maximum likelihood estimation process that is traditionally solved with an iterative Expectation-Maximization (EM) process. Our key novelty is that "training" these DNNs with an unsupervised self-correspondence L2 distance loss elegantly reveals a statistically significant deep shape descriptor representation for the distribution of the point clouds. We have conducted experiments over various 3D datasets. Qualitative and quantitative comparisons demonstrate that our proposed method achieves superior classification performance over existing unsupervised 3D shape descriptors. In addition, we verified the following attractive properties of our shape descriptor through experiments: multi-scale shape representation, robustness to shape rotation, and robustness to noise.
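To illustrate the point-distribution idea, the sketch below scores a point cloud by its log-likelihood under an isotropic Gaussian mixture centered on a second point set; the fixed bandwidth and uniform mixture weights are simplifying assumptions for illustration, not the paper's exact formulation:

```python
# Hypothetical "each point gets a Gaussian" scoring of a point cloud.
import math
import torch

def point_cloud_log_likelihood(points: torch.Tensor,   # (N, 3) observed points
                               centers: torch.Tensor,  # (M, 3) Gaussian centers
                               sigma: float = 0.05) -> torch.Tensor:
    d2 = torch.cdist(points, centers) ** 2              # (N, M) squared distances
    log_norm = -1.5 * math.log(2 * math.pi * sigma ** 2)
    log_probs = log_norm - d2 / (2 * sigma ** 2)         # per-component log density
    # Uniform mixture over the M components, summed over the observed points.
    return torch.logsumexp(log_probs, dim=1).sum() - points.shape[0] * math.log(centers.shape[0])

pts = torch.randn(128, 3)
print(point_cloud_log_likelihood(pts, pts).item())  # likelihood of the cloud under its own Gaussians
```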

  • Front Matter
  • Cited by: 1
  • 10.1016/s2589-7500(19)30223-7
New beginnings
  • Dec 23, 2019
  • The Lancet Digital Health
  • The Lancet Digital Health

  • Conference Article
  • Cited by: 2
  • 10.1109/icetet-sip58143.2023.10151610
A Review Paper on Object-Detection using the Deep Learning Approach
  • Apr 28, 2023
  • Minal Bodke + 4 more

Because of their close association with object detection, video analysis and image comprehension have drawn a lot of interest in recent years. Traditional object detection solutions are built on handcrafted features and shallow trainable architectures, and their performance easily stalls when complex ensembles combining multiple low-level image features with high-level context from scene classifiers and object detectors are accumulated. As the deep learning field matures, more semantic, high-level, deeper features are developed to solve the challenges present in conventional architectures; these models differ, for example, in network design, training technique, and optimization function. This paper gives an introduction to deep-learning-based frameworks for object detection. Our examination starts with a short history of deep learning and its representative tool, the Convolutional Neural Network (CNN). We then concentrate on a standard generic object detection architecture, along with certain modifications and handy methods that improve detection performance further, particularly where the characteristics of detection tasks vary, covering specific tasks such as salient object detection, face recognition, and pedestrian identification. Experimental data are also examined, making it possible to compare various methodologies and draw some significant findings. Finally, as guidance for future effort, several intriguing directions and tasks are presented, involving both object detection and neural-network-based learning systems. One of the biggest difficulties of object detection is that an object viewed from different angles may look completely different. For example, images of the same cake differ from each other because they show the object from different sides.

  • Conference Article
  • 10.1109/iccvw.2019.00496
Auto-Encoding Meshes of any Topology with the Current-Splatting and Exponentiation Layers
  • Oct 1, 2019
  • Alexandre Bone + 2 more

Deep learning has met key applications in image computing, but still lacks processing paradigms for meshes, i.e. collections of elementary geometrical parts such as points, segments or triangles. Meshes are both a powerful representation for geometrical objects, and a challenge for network architectures because of their inherent irregular structure. This work contributes to adapt classical deep learning paradigms to this particular type of data in three ways. First, we introduce the current-splatting layer which embeds meshes in a metric space, allowing the downstream network to process them without any assumption on their topology: they may be composed of varied numbers of elements or connected components, contain holes, or bear high levels of geometrical noise. Second, we adapt to meshes the exponentiation layer which, from an upstream image array, generates shapes with a diffeomorphic control over their topology. Third, we take advantage of those layers to devise a variational auto-encoding architecture, which we interpret as a generative statistical model that learns adapted low-dimensional representations for mesh data sets. An explicit norm-control layer ensures the correspondence between the latent-space Euclidean metric and the shape-space log-Euclidean one. We illustrate this method on simulated and real data sets, and show the practical relevance of the learned representation for visualization, classification and mesh synthesis.

  • Preprint Article
  • 10.26434/chemrxiv-2025-0wkrd
A Minimalistic Deep Graph Learning Approach for Protein-Ligand Binding Affinity: One Step Towards Generalization
  • Mar 21, 2025
  • Ulises Rojas-Castañeda + 3 more

Predicting protein-ligand binding affinity is a fundamental challenge in structure-based drug design. While deep learning models have significantly improved affinity predictions, many state-of-the-art approaches rely on complex architectures with tens or hundreds of thousands of trainable parameters, which may lead to overfitting and reduced generalizability. In this study, we introduce ECIF-GCN, a minimalist deep graph learning model that extends the Extended Connectivity Interaction Features (ECIF) framework by incorporating a fully connected graph representation and leveraging Graph Convolutional Networks (GCNs) to process molecular interactions. ECIF-GCN was trained and evaluated on LP-PDBbind, a benchmark specifically designed to minimize protein and ligand similarity across dataset splits, providing a rigorous assessment of model generalization. Despite having significantly fewer trainable parameters compared to more complex architectures, ECIF-GCN achieved the lowest RMSE (1.52) in the test set of LP-PDBbind, outperforming models such as InteractionGraphNet and RF-Score, which contain a substantially larger number of parameters. These results demonstrate that high predictive accuracy in binding affinity estimation does not require highly overparameterized deep learning models. These results highlight the potential of minimalist deep learning architectures in protein-ligand binding affinity prediction, providing a balance between predictive power, computational efficiency, and generalization ability, and suggest that a carefully designed low-parameter model can achieve state-of-the-art performance, reinforcing the idea that overparameterization is not a prerequisite for robust molecular modeling.
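For readers unfamiliar with GCNs, the following is a minimal Kipf-and-Welling-style propagation step of the kind ECIF-GCN builds on; the fully connected adjacency mirrors the abstract's description, but the feature sizes and random inputs are placeholders, not the actual ECIF featurization:

```python
# Hypothetical single graph-convolution step: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W).
import numpy as np

def gcn_layer(A: np.ndarray, H: np.ndarray, W: np.ndarray) -> np.ndarray:
    A_hat = A + np.eye(A.shape[0])                  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)          # aggregate neighbors, project, ReLU

n_atoms, in_dim, out_dim = 6, 8, 4
A = np.ones((n_atoms, n_atoms)) - np.eye(n_atoms)   # fully connected graph, as in the abstract
H = np.random.rand(n_atoms, in_dim)                 # placeholder per-atom interaction features
W = np.random.rand(in_dim, out_dim)                 # trainable weights (random here)
print(gcn_layer(A, H, W).shape)                     # (6, 4)
```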

  • Research Article
  • 10.1155/2021/9874724
Deep Learning in Mobile Computing: Architecture, Applications, and Future Challenges
  • Aug 14, 2021
  • Mobile Information Systems
  • Xiaoxian Yang + 2 more

  • Research Article
  • Cited by: 1
  • 10.1360/n112018-00288
Deep learning hardware acceleration based on general vector DSP
  • Mar 1, 2019
  • SCIENTIA SINICA Informationis
  • Huili Wang + 2 more

As deep learning (DL) plays an increasingly significant role in several fields, designing a high-performance, low-power, low-latency hardware accelerator for DL has become a topic of interest in the field of architecture. Based on the structure and optimization methods of DL algorithms, this study analyzes the difficulties and challenges in DL hardware design. In comparison with current mainstream DL hardware acceleration platforms, the advantages of DL hardware acceleration based on a general vector DSP are discussed. In addition, acceleration techniques such as vector broadcasting and matrix conversion are described. In view of the shortcomings of the general vector DSP discussed herein, optimization techniques such as reconfigurable computing arrays, which accommodate general vector calculations as well as specific DL acceleration, are discussed in depth.
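As an illustration of the "matrix conversion" technique mentioned above, the sketch below unrolls a small 2D convolution into an im2col matrix so it runs as a single dense matrix product, the kind of operation a vector DSP handles efficiently; the shapes, stride-1/valid-padding choice, and averaging kernel are illustrative assumptions:

```python
# Hypothetical im2col: convolution rewritten as a matrix multiply (GEMM-friendly form).
import numpy as np

def im2col(x: np.ndarray, k: int) -> np.ndarray:
    """x: (H, W) input; returns a (num_patches, k*k) matrix of unrolled patches."""
    H, W = x.shape
    patches = [x[i:i + k, j:j + k].ravel()
               for i in range(H - k + 1)
               for j in range(W - k + 1)]
    return np.stack(patches)

x = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3)) / 9.0                  # simple averaging filter
cols = im2col(x, 3)                             # (9, 9) patch matrix
out = (cols @ kernel.ravel()).reshape(3, 3)     # convolution computed as one mat-vec
print(out)
```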

  • Research Article
  • Cited by: 15
  • 10.1504/ijics.2008.018515
On security issues in embedded systems: challenges and solutions
  • Jan 1, 2008
  • International Journal of Information and Computer Security
  • Lyes Khelladi + 3 more

Ensuring security in embedded systems translates into several design challenges, imposed by the unique features of these systems. These features make the integration of conventional security mechanisms impractical, and require a better understanding of the whole security problem. This paper provides a unified view on security in embedded systems, by introducing first the implied design and architectural challenges. It then surveys and discusses the currently proposed security solutions that address these challenges, drawing from both current practices and emerging research, and identifies some open research problems that represent the most interesting areas of contribution.

More from: Contemporary Physics
  • Research Article
  • 10.1080/00107514.2025.2562755
The supernova 1987A system and its recent evolution – a review
  • Oct 28, 2025
  • Contemporary Physics
  • Michael J Barlow

  • Research Article
  • 10.1080/00107514.2025.2550105
Flat bands in condensed-matter systems – perspective for magnetism and superconductivity
  • Oct 21, 2025
  • Contemporary Physics
  • Hideo Aoki

  • Research Article
  • 10.1080/00107514.2025.2554578
From stars to life: a quantitative approach to astrobiology
  • Oct 10, 2025
  • Contemporary Physics
  • B Ishak

  • Research Article
  • 10.1080/00107514.2025.2566047
Machine learning for physics and astronomy
  • Oct 10, 2025
  • Contemporary Physics
  • Yow Ai Ping

  • Research Article
  • 10.1080/00107514.2025.2566052
Statistical mechanics of phases and phase transitions
  • Oct 9, 2025
  • Contemporary Physics
  • Prince Sharma

  • Research Article
  • 10.1080/00107514.2025.2566048
Matilda meets the universe
  • Oct 9, 2025
  • Contemporary Physics
  • Gerry Gilmore

  • Research Article
  • 10.1080/00107514.2025.2566050
Biophysics: tools and techniques for the physics of life, 2nd edition
  • Oct 9, 2025
  • Contemporary Physics
  • Rifal Ramadhan + 1 more

  • Research Article
  • 10.1080/00107514.2025.2554582
Studies in theoretical physics, volume 2: advanced mathematical methods
  • Oct 2, 2025
  • Contemporary Physics
  • John D Clayton

  • Research Article
  • 10.1080/00107514.2025.2556579
Single-molecule biophysics
  • Sep 27, 2025
  • Contemporary Physics
  • Mark C Leake

  • Research Article
  • 10.1080/00107514.2025.2554584
Quantum phases of matter
  • Sep 26, 2025
  • Contemporary Physics
  • Tom Lancaster
