Deep learning on edge computing devices: design challenges of algorithm and architecture
- Book Chapter
1
- 10.1007/978-3-030-87059-1_5
- Jan 1, 2022
Artificial intelligence is a common platform in which the concepts of machine learning (ML) and deep learning (DL) appear. DL has become a hot research topic in recent years because it enables various smarter applications and services, including the Internet of Things (IoT). DL discovers characteristics directly from data, including pixels, images, shapes, dimensions, text, and sound. DL is also considered an end-to-end learning approach because the tasks are learned directly from the data. A neural network that consists of several hidden layers is known as a deep neural network (DNN). The convolutional neural network (CNN), which contains a significant number of hidden layers, is a commonly used DNN. This chapter aims to explore DL frameworks for IoT. The chapter begins with a discussion of the development and architecture of the DL framework. We then discuss various DL models associated with deep reinforcement learning approaches for IoT. Potential applications, including smart grid management, road traffic management, the industrial sector, estimation of crop production, and detection of various plant diseases, are discussed. Various design issues and challenges in implementing DL are also discussed. The findings reported in this chapter provide insights into DL frameworks for IoT that can help network researchers and engineers contribute further toward the development of next-generation IoT. Keywords: Artificial intelligence, Deep learning, Deep neural network, Framework, IoT, Machine learning
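A minimal sketch, assuming PyTorch (not part of the chapter), of what "a DNN with several hidden layers, commonly a CNN" looks like in practice: an end-to-end model that maps raw pixels to class scores. The layer sizes and class count are illustrative only.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Hidden convolutional layers learn features directly from raw pixels.
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # A fully connected layer maps the learned features to class scores.
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x):
        x = self.features(x)              # (N, 32, 8, 8) for 32x32 inputs
        return self.classifier(x.flatten(1))

model = SmallCNN()
scores = model(torch.randn(1, 3, 32, 32))  # end to end: pixels in, class scores out
```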
- Book Chapter
1
- 10.1007/978-3-319-94199-8_38
- Jun 27, 2018
Presently we observe a shift of human activity away from traditional methods of manufacturing products toward increasingly specialized and evolving robotic and IT systems. For obvious economic and technological reasons, this change is first strongly visible in industrial production. Along with technological advances comes a dramatic shift in man's place in the production process, from a position at the machine to the back of the process as designer, supervisor, and controller of the information systems that manage production. This seemingly obvious change results in completely new challenges not only for industrial architecture but also for the wider built environment, as it dramatically reduces the number of jobs and imposes completely new requirements on the workplace and its architecture. The author of this article discusses the above issue using the example of the design of a technologically advanced 3D printing plant, from the point of view of the designer.
- Research Article
- 10.17485/ijst/v17i44.2663
- Dec 10, 2024
- Indian Journal Of Science And Technology
Objectives: In VLSI, cell placement is critical in determining the overall performance, area, runtime efficiency, and power consumption of integrated circuits. The objective of this work is to find the best possible locations for the cells to meet these constraints. Methods: The proposed method utilizes a deep reinforcement learning strategy with heuristics to address the complexities of modern VLSI design challenges. A multi-objective deep reinforcement learning approach fused with GPU acceleration is explored to optimize placement metrics such as wirelength, congestion, and runtime and to obtain a globally optimal placement solution. The suggested method dynamically selects appropriate parameters, a process known as parameter tuning, to produce high-quality placement solutions. The strategy's effectiveness is demonstrated using open-source benchmarks, including BlackParrot and MemPool. Findings: The experimental results of the strategy on the benchmark data show considerable improvements in placement quality. It reduces wirelength by up to 4% and congestion by about 10%. Moreover, it is highly scalable and reliable in providing global placement solutions. The reduced aspect ratio indicates lower chip area utilization and reduced power consumption. Novelty: The integration of GPU acceleration and a deep learning methodology as a strategy for VLSI global placement has not been reported in prior placement work; however, similar approaches are available for legalization and detailed placement. Keywords: Placement, VLSI, Deep Learning, GPU Acceleration, Parameter Tuning
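A minimal sketch of the kind of multi-objective reward such a placement agent might receive; the function name, weights, and baseline values are illustrative assumptions, not the paper's code.

```python
def placement_reward(wirelength: float, congestion: float,
                     wl_baseline: float, cong_baseline: float,
                     w_wl: float = 0.7, w_cong: float = 0.3) -> float:
    """Reward is higher when the placement beats the baseline on both metrics."""
    wl_gain = (wl_baseline - wirelength) / wl_baseline        # fraction of wirelength saved
    cong_gain = (cong_baseline - congestion) / cong_baseline  # fraction of congestion removed
    return w_wl * wl_gain + w_cong * cong_gain

# Example: a 4% wirelength and 10% congestion improvement over the baseline placement.
print(placement_reward(wirelength=96.0, congestion=0.90,
                       wl_baseline=100.0, cong_baseline=1.00))  # 0.7*0.04 + 0.3*0.10 = 0.058
```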
- Conference Article
13
- 10.1109/iccad.2017.8203877
- Nov 1, 2017
In this paper, we describe the architectural, software, performance, and implementation challenges and solutions, as well as current research, on the use of programmable logic to enable deep learning applications. First, the characteristics of building a deep learning system are discussed. Next, architectural choices are explained for how an FPGA fabric can efficiently solve deep learning tasks. Finally, specific techniques for how DSPs and memories are used in high-performance applications are described.
- Research Article
4
- 10.3390/electronics10060689
- Mar 15, 2021
- Electronics
As machine learning becomes ubiquitous, the need to deploy models on real-time, embedded systems will become increasingly critical. This is especially true for deep learning solutions, whose large models pose interesting challenges for resource-constrained target architectures at the "edge". The realization of machine learning and deep learning is being driven by the availability of specialized hardware, such as system-on-chip solutions, which provide some alleviation of constraints. Equally important, however, are the operating systems that run on this hardware, and specifically the ability to leverage commercial real-time operating systems which, unlike general purpose operating systems such as Linux, can provide the low-latency, deterministic execution required for embedded, and potentially safety-critical, applications at the edge. Despite this, studies considering the integration of real-time operating systems, specialized hardware, and machine learning/deep learning algorithms remain limited. In particular, better mechanisms for real-time scheduling in the context of machine learning applications will prove to be critical as these technologies move to the edge. In order to address some of these challenges, we present a resource management framework designed to provide a dynamic on-device approach to the allocation and scheduling of limited resources in a real-time processing environment. These types of mechanisms are necessary to support the deterministic behavior required by the control components contained in the edge nodes. To validate the effectiveness of our approach, we applied rigorous schedulability analysis to a large set of randomly generated simulated task sets and then verified the most time-critical applications, such as the control tasks, which maintained low-latency deterministic behavior even during off-nominal conditions. The practicality of our scheduling framework was demonstrated by integrating it into a commercial real-time operating system (VxWorks) and then running a typical deep learning image processing application to perform simple object detection. The results indicate that our proposed resource management framework can be leveraged to facilitate the integration of machine learning algorithms with real-time operating systems and embedded platforms, including widely used, industry-standard real-time operating systems.
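A minimal sketch of a utilization-based schedulability check in the spirit of the analysis described above; the Liu & Layland rate-monotonic bound used here is a standard textbook test, not necessarily the exact criterion applied by the authors' framework, and the task numbers are invented.

```python
from math import inf

def rm_schedulable(tasks: list[tuple[float, float]]) -> bool:
    """tasks: list of (worst_case_execution_time, period) for periodic tasks."""
    n = len(tasks)
    utilization = sum(c / t for c, t in tasks)
    bound = n * (2 ** (1 / n) - 1) if n > 0 else inf   # Liu & Layland utilization bound
    return utilization <= bound

# Example: a control task, a DL inference task, and a logging task (times in ms).
tasks = [(2.0, 10.0),    # control: 2 ms of work every 10 ms
         (15.0, 100.0),  # inference: 15 ms every 100 ms
         (5.0, 200.0)]   # logging: 5 ms every 200 ms
print(rm_schedulable(tasks))  # U = 0.375 <= 3*(2^(1/3)-1) ~= 0.780, so the set is schedulable
```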
- Research Article
62
- 10.1093/bib/bbac102
- Mar 25, 2022
- Briefings in Bioinformatics
Proteins with desired functions and properties are important in fields such as nanotechnology and biomedicine. De novo protein design enables the production of previously unseen proteins from the ground up and is believed to be key to addressing real societal challenges. The recent introduction of deep learning into design methods has had a transformative influence and is expected to represent a promising and exciting future direction. In this review, we revisit the major aspects of current advances in deep-learning-based design procedures and illustrate their novelty in comparison with conventional knowledge-based approaches through notable cases. We not only describe deep learning developments in structure-based protein design and direct sequence design, but also highlight recent applications of deep reinforcement learning in protein design. Future perspectives on design goals, challenges, and opportunities are also comprehensively discussed.
- Research Article
- 10.3390/sym17071109
- Jul 10, 2025
- Symmetry
Modern database systems require autonomous CPU scheduling frameworks that dynamically optimize resource allocation across heterogeneous workloads while maintaining strict performance guarantees. We present a novel hierarchical deep reinforcement learning framework augmented with graph neural networks to address CPU scheduling challenges in mixed database environments comprising Online Transaction Processing (OLTP), Online Analytical Processing (OLAP), vector processing, and background maintenance workloads. Our approach introduces three key innovations: first, a symmetric two-tier control architecture where a meta-controller allocates CPU budgets across workload categories using policy gradient methods while specialized sub-controllers optimize process-level resource allocation through continuous action spaces; second, graph neural network-based dependency modeling that captures complex inter-process relationships and communication patterns while preserving inherent symmetries in database architectures; and third, meta-learning integration with curiosity-driven exploration enabling rapid adaptation to previously unseen workload patterns without extensive retraining. The framework incorporates a multi-objective reward function balancing Service Level Objective (SLO) adherence, resource efficiency, symmetric fairness metrics, and system stability. Experimental evaluation through high-fidelity digital twin simulation and production deployment demonstrates substantial performance improvements: 43.5% reduction in p99 latency violations for OLTP workloads and 27.6% improvement in overall CPU utilization, with successful scaling to 10,000 concurrent processes maintaining sub-3% scheduling overhead. This work represents a significant advancement toward truly autonomous database resource management, establishing a foundation for next-generation self-optimizing database systems with implications extending to broader orchestration challenges in cloud-native architectures.
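A minimal sketch of the kind of multi-objective reward the abstract describes, balancing SLO adherence, CPU efficiency, symmetric fairness, and stability; the term names, weights, and example figures are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def scheduler_reward(slo_violations: int, total_requests: int,
                     cpu_utilization: float,
                     per_class_shares: np.ndarray,
                     allocation_change: float,
                     w=(0.4, 0.3, 0.2, 0.1)) -> float:
    slo_term = 1.0 - slo_violations / max(total_requests, 1)  # fraction of requests within SLO
    efficiency_term = cpu_utilization                          # higher CPU utilization is better
    fairness_term = 1.0 - float(np.std(per_class_shares))      # near-equal shares -> low spread
    stability_term = 1.0 - allocation_change                   # penalize churn in CPU budgets
    return (w[0] * slo_term + w[1] * efficiency_term
            + w[2] * fairness_term + w[3] * stability_term)

# Example step: 2 of 1000 requests miss their SLO, 70% CPU use across OLTP/OLAP/vector/
# maintenance classes with near-equal shares, and a small rebalancing of budgets.
print(scheduler_reward(2, 1000, 0.70, np.array([0.26, 0.25, 0.24, 0.25]), 0.05))
```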
- Conference Article
16
- 10.1109/cvpr42600.2020.00937
- Jun 1, 2020
Deep learning models have achieved great success in supervised shape descriptor learning for 3D shape retrieval, classification, and correspondence. However, unsupervised shape descriptors computed via deep learning are less studied than supervised ones due to the design challenges of unsupervised neural network architectures. This paper proposes a novel probabilistic framework for learning unsupervised deep shape descriptors with point distribution learning. In our approach, we first associate each point with a Gaussian, and the point cloud is modeled as the distribution of these points. We then use deep neural networks (DNNs) to model a maximum likelihood estimation process that is traditionally solved with an iterative Expectation-Maximization (EM) procedure. Our key novelty is that "training" these DNNs with an unsupervised self-correspondence L2 distance loss elegantly reveals a statistically significant deep shape descriptor representation for the distribution of the point clouds. We have conducted experiments over various 3D datasets. Qualitative and quantitative comparisons demonstrate that our proposed method achieves superior classification performance over existing unsupervised 3D shape descriptors. In addition, we verified the following attractive properties of our shape descriptor through experiments: multi-scale shape representation, robustness to shape rotation, and robustness to noise.
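A minimal sketch of the point-distribution idea and the self-correspondence L2 loss described above; the notation, sigma value, and the identity stand-in for the DNN are assumptions for illustration, not the authors' code.

```python
import torch

def perturb(points: torch.Tensor, sigma: float = 0.02) -> torch.Tensor:
    """points: (N, 3). Draw one sample from the Gaussian centred at each point."""
    return points + sigma * torch.randn_like(points)

def self_correspondence_l2(reconstructed: torch.Tensor, original: torch.Tensor) -> torch.Tensor:
    """Mean squared distance between each reconstructed point and its own original."""
    return ((reconstructed - original) ** 2).sum(dim=-1).mean()

cloud = torch.rand(1024, 3)   # a toy point cloud; each point is a Gaussian mean
noisy = perturb(cloud)        # one draw from the point distribution
# In the paper a DNN maps perturbed points back toward their means; here identity is
# used as a stand-in for that network, just to show how the loss is evaluated.
loss = self_correspondence_l2(noisy, cloud)
print(float(loss))            # ~3 * sigma^2 for the identity "network"
```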
- Front Matter
1
- 10.1016/s2589-7500(19)30223-7
- Dec 23, 2019
- The Lancet Digital Health
New beginnings
- Conference Article
2
- 10.1109/icetet-sip58143.2023.10151610
- Apr 28, 2023
Because of their close association with object detection, video analysis and image understanding have drawn a great deal of interest in recent years. Traditional object detection methods are built on handcrafted features and shallow trainable architectures, and their performance tends to plateau as low-level image features are combined with scene classifiers, object detectors, and high-level context into increasingly complex ensembles. As the deep learning field matures, deeper, more semantic, high-level features are developed to address the limitations of these conventional architectures; the resulting models differ in network design, training technique, and optimization function. This paper provides an introduction to deep-learning-based frameworks for object detection. Our review begins with a short history of deep learning and its representative tool, the convolutional neural network (CNN). We then concentrate on a standard generic object detection architecture together with modifications and practical methods that further improve detection performance. Because the characteristics of detection tasks vary, specific tasks such as salient object detection, face detection, and pedestrian detection are also surveyed. An examination of experimental data makes it possible to compare the various methodologies and draw some significant findings. Finally, several promising directions and tasks are presented as guidance for future work, involving both object detection and neural-network-based learning systems. One of the biggest difficulties of object detection is that an object viewed from different angles may look completely different; for example, images of the same cake taken from different sides can differ substantially from one another.
- Conference Article
- 10.1109/iccvw.2019.00496
- Oct 1, 2019
Deep learning has met key applications in image computing, but still lacks processing paradigms for meshes, i.e. collections of elementary geometrical parts such as points, segments or triangles. Meshes are both a powerful representation for geometrical objects, and a challenge for network architectures because of their inherent irregular structure. This work contributes to adapt classical deep learning paradigms to this particular type of data in three ways. First, we introduce the current-splatting layer which embeds meshes in a metric space, allowing the downstream network to process them without any assumption on their topology: they may be composed of varied numbers of elements or connected components, contain holes, or bear high levels of geometrical noise. Second, we adapt to meshes the exponentiation layer which, from an upstream image array, generates shapes with a diffeomorphic control over their topology. Third, we take advantage of those layers to devise a variational auto-encoding architecture, which we interpret as a generative statistical model that learns adapted low-dimensional representations for mesh data sets. An explicit norm-control layer ensures the correspondence between the latent-space Euclidean metric and the shape-space log-Euclidean one. We illustrate this method on simulated and real data sets, and show the practical relevance of the learned representation for visualization, classification and mesh synthesis.
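A minimal sketch of the splatting idea described above: vertices of an irregular mesh are deposited onto a regular grid so that a downstream network needs no assumption about the mesh topology. Only vertex positions and a simple Gaussian kernel are used here, which is an assumption; the paper's current-splatting layer also accounts for oriented mesh elements.

```python
import numpy as np

def splat_to_grid(vertices: np.ndarray, grid_size: int = 32, sigma: float = 0.05) -> np.ndarray:
    """vertices: (N, 2) in [0, 1]^2. Returns a (grid_size, grid_size) density image."""
    xs = np.linspace(0.0, 1.0, grid_size)
    gx, gy = np.meshgrid(xs, xs, indexing="ij")
    grid = np.zeros((grid_size, grid_size))
    for vx, vy in vertices:
        # Each vertex deposits a Gaussian bump; connectivity is never needed, so meshes
        # with holes, noise, or varying numbers of components are handled uniformly.
        grid += np.exp(-((gx - vx) ** 2 + (gy - vy) ** 2) / (2 * sigma ** 2))
    return grid

mesh_vertices = np.random.rand(200, 2)   # any number of vertices or connected components
image = splat_to_grid(mesh_vertices)     # fixed-size input for a downstream network
print(image.shape)                       # (32, 32)
```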
- Preprint Article
- 10.26434/chemrxiv-2025-0wkrd
- Mar 21, 2025
Predicting protein-ligand binding affinity is a fundamental challenge in structure-based drug design. While deep learning models have significantly improved affinity predictions, many state-of-the-art approaches rely on complex architectures with tens or hundreds of thousands of trainable parameters, which may lead to overfitting and reduced generalizability. In this study, we introduce ECIF-GCN, a minimalist deep graph learning model that extends the Extended Connectivity Interaction Features (ECIF) framework by incorporating a fully connected graph representation and leveraging Graph Convolutional Networks (GCNs) to process molecular interactions. ECIF-GCN was trained and evaluated on LP-PDBbind, a benchmark specifically designed to minimize protein and ligand similarity across dataset splits, providing a rigorous assessment of model generalization. Despite having significantly fewer trainable parameters compared to more complex architectures, ECIF-GCN achieved the lowest RMSE (1.52) in the test set of LP-PDBbind, outperforming models such as InteractionGraphNet and RF-Score, which contain a substantially larger number of parameters. These results demonstrate that high predictive accuracy in binding affinity estimation does not require highly overparameterized deep learning models. These results highlight the potential of minimalist deep learning architectures in protein-ligand binding affinity prediction, providing a balance between predictive power, computational efficiency, and generalization ability, and suggest that a carefully designed low-parameter model can achieve state-of-the-art performance, reinforcing the idea that overparameterization is not a prerequisite for robust molecular modeling.
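A minimal sketch of a single graph-convolution step over a fully connected interaction graph, in the standard Kipf & Welling form H' = relu(D^-1/2 (A + I) D^-1/2 H W); the toy features, graph size, and readout are assumptions for illustration and are not the ECIF-GCN implementation.

```python
import torch

def gcn_layer(H: torch.Tensor, A: torch.Tensor, W: torch.Tensor) -> torch.Tensor:
    A_hat = A + torch.eye(A.size(0))                       # add self-loops
    D_inv_sqrt = torch.diag(A_hat.sum(dim=1).pow(-0.5))    # degree normalization
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt               # symmetric normalized adjacency
    return torch.relu(A_norm @ H @ W)

n_atoms, in_dim, out_dim = 8, 16, 32
H = torch.randn(n_atoms, in_dim)                           # per-atom interaction features
A = torch.ones(n_atoms, n_atoms) - torch.eye(n_atoms)      # fully connected graph
W = torch.randn(in_dim, out_dim)
H_next = gcn_layer(H, A, W)
affinity = H_next.mean(dim=0).sum()                        # a crude pooled readout to one scalar
print(H_next.shape, float(affinity))
```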
- Research Article
- 10.1155/2021/9874724
- Aug 14, 2021
- Mobile Information Systems
Deep Learning in Mobile Computing: Architecture, Applications, and Future Challenges
- Research Article
1
- 10.1360/n112018-00288
- Mar 1, 2019
- SCIENTIA SINICA Informationis
As deep learning (DL) plays an increasingly significant role in several fields, designing high-performance, low-power, low-latency hardware accelerators for DL has become a topic of interest in the field of computer architecture. Based on the structure and optimization methods of DL algorithms, this study analyzes the difficulties and challenges in DL hardware design. In comparison with current mainstream DL hardware acceleration platforms, the advantages of DL hardware acceleration based on a general vector DSP are discussed. In addition, acceleration techniques such as vector broadcasting and matrix conversion are described. In view of the shortcomings of the general vector DSP discussed herein, optimization techniques, such as reconfigurable computing arrays that support both general vector calculations and specific DL acceleration, are discussed in depth.
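A minimal sketch of the matrix-conversion idea: the standard im2col rewrite turns a 2D convolution into a single matrix product that a vector DSP can execute efficiently. Assuming this is the kind of "matrix conversion" the abstract refers to; the array sizes are illustrative.

```python
import numpy as np

def im2col(x: np.ndarray, k: int) -> np.ndarray:
    """x: (H, W) input; returns a ((H-k+1)*(W-k+1), k*k) matrix of flattened patches."""
    H, W = x.shape
    patches = [x[i:i + k, j:j + k].ravel()
               for i in range(H - k + 1) for j in range(W - k + 1)]
    return np.stack(patches)

x = np.random.rand(6, 6)
kernel = np.random.rand(3, 3)
cols = im2col(x, 3)              # (16, 9): one row per output position
out = cols @ kernel.ravel()      # one matrix-vector product computes the whole convolution
print(out.reshape(4, 4).shape)   # (4, 4) valid-convolution output
```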
- Research Article
15
- 10.1504/ijics.2008.018515
- Jan 1, 2008
- International Journal of Information and Computer Security
Ensuring security in embedded systems translates into several design challenges, imposed by the unique features of these systems. These features make the integration of conventional security mechanisms impractical, and require a better understanding of the whole security problem. This paper provides a unified view on security in embedded systems, by introducing first the implied design and architectural challenges. It then surveys and discusses the currently proposed security solutions that address these challenges, drawing from both current practices and emerging research, and identifies some open research problems that represent the most interesting areas of contribution.