Coinf: QoS-aware DRL-based Inference Task Scheduling Framework with Batching Processing
The emergence of deploying deep neural network (DNN) services on edge servers has spurred research into efficiently provisioning inference services. However, previous studies have neglected the implications that different types of DNNs and varying quality of service (QoS) requirements have on QoS violation rates. In this paper, we propose a novel framework, named Coinf, for scheduling heterogeneous DNN inference tasks on edge servers. Coinf offers four advantages, covering attribute analysis, performance balancing, parallel execution, and model accuracy: 1) it enables efficient profiling of domain-specific attributes of various DNN tasks during the offline stage by constructing a regression model to predict the end-to-end latency of each task; 2) using the predicted execution time, Coinf strikes a commendable balance among inference latency, system throughput, and QoS violation rate; 3) it employs deep reinforcement learning (DRL) to aggregate individual DNN tasks into batches, enabling concurrent parallel execution; and 4) Coinf preserves the accuracy of the provided DNN models by leaving them unmodified. Numerical experiments validate the reliability and efficiency of Coinf in handling heterogeneous inference tasks.
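A minimal sketch of the offline-profiling idea described above, assuming a simple least-squares fit of latency against batch size; the samples, the affine model, and the QoS check are illustrative stand-ins, not Coinf's actual regression:

```python
import numpy as np

# Hypothetical profiled samples for one DNN: (batch_size, measured_latency_ms).
samples = np.array([(1, 12.0), (2, 15.5), (4, 23.0), (8, 39.0), (16, 72.0)])
b, t = samples[:, 0], samples[:, 1]

# Assume latency grows roughly affinely in batch size: t ≈ a * b + c.
a, c = np.polyfit(b, t, deg=1)

def predict_latency(batch_size: int) -> float:
    """Predicted end-to-end latency (ms) for a given batch size."""
    return a * batch_size + c

def max_batch_under_qos(deadline_ms: float, cap: int = 32) -> int:
    """Largest batch whose predicted latency stays within the QoS deadline."""
    feasible = [k for k in range(1, cap + 1) if predict_latency(k) <= deadline_ms]
    return max(feasible, default=1)

print(max_batch_under_qos(deadline_ms=50.0))  # largest batch predicted to meet 50 ms
```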
- Research Article
3
- 10.1142/s0218194023410085
- Nov 29, 2023
- International Journal of Software Engineering and Knowledge Engineering
In mobile edge computing environments, intelligent inference services driven by DNNs are highly sensitive to latency. Recently, collaborative inference between user devices and edge servers (ESs) based on Deep Neural Network (DNN) partitioning has succeeded in accelerating services. However, most existing collaborative acceleration schemes partition a single DNN inference task, cannot quickly make partition decisions for a set of concurrent inference tasks, and often sacrifice inference accuracy. In addition, because ES resources are limited, concurrent requests compete for them, which prevents partitioned tasks from being offloaded to ESs for timely processing. Designing an efficient offloading scheme therefore becomes essential. Task offloading schemes based on deep reinforcement learning can solve complex decision-making problems in high-dimensional state spaces, but they suffer from insufficient sample diversity and easily fall into local optima. In this paper, a Collaborative Inference Acceleration Scheme integrating DNN Partitioning and Task Offloading (CIAS-PnO) is proposed. First, while ensuring inference accuracy, the Collaborative DNN Layer Partitioning (CDLP) algorithm is designed with the goal of optimal latency. CDLP reduces the problem scale of partitioning concurrent inference tasks through a pruning operation and determines partition decisions in time. Then, the Distributed Soft Actor-Critic (SAC)-based Partition Task Offloading algorithm (DSACO) is designed. DSACO lets SAC agents explore samples in parallel and share learning experiences, and uses an automatic entropy adjustment mechanism to improve the agents' exploration efficiency, thereby avoiding local optima and achieving efficient offloading of partitioned tasks. Experimental results on DNN benchmarks show that, compared with baseline acceleration schemes, CIAS-PnO achieves more than a 19.8% improvement in acceleration performance, along with higher convergence performance and task success rate.
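A minimal sketch of latency-optimal layer partitioning, the core idea behind CDLP (the published algorithm additionally handles concurrent tasks and pruning); all per-layer timings, data sizes, and the uplink rate below are hypothetical:

```python
device_ms = [2.0, 3.0, 20.0, 25.0, 30.0]   # per-layer latency on the user device
edge_ms   = [0.8, 1.2, 1.8, 0.6, 0.4]      # per-layer latency on the edge server
size_kb   = [600, 40, 20, 300, 200, 10]    # size_kb[k] = data crossing a cut at k
uplink_kb_per_ms = 1.25                    # ~10 Mbit/s uplink

def split_latency(k: int) -> float:
    """Latency when layers [0, k) run on-device and [k, n) run on the edge;
    size_kb[n] is the final result shipped back when everything stays local."""
    return sum(device_ms[:k]) + size_kb[k] / uplink_kb_per_ms + sum(edge_ms[k:])

n = len(device_ms)
best = min(range(n + 1), key=split_latency)
print(f"cut after layer {best}: {split_latency(best):.1f} ms end-to-end")
```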
- Conference Article
3
- 10.1109/vtc2021-fall52928.2021.9625281
- Sep 1, 2021
With the development of wireless network technologies such as LTE/5G, Mobile Cloud Computing (MCC) has been proposed as a solution for mobile devices that need to carry out high-complexity computation with limited resources. Technically, with MCC, high-complexity computation tasks are offloaded from mobile devices to cloud servers. However, MCC does not work well for time-sensitive mobile applications because of the relatively long latency between mobile devices and cloud servers. Mobile Edge Computing (MEC) is expected to solve this problem: with MEC, edge servers, instead of cloud servers, are deployed at the edge of the network to provide offloading services to mobile devices. Since edge servers are much closer to mobile devices, the resulting latency is significantly lower. Despite the advantages of MEC over MCC, edge servers are not as resource-abundant as cloud servers. Consequently, when many offloaded tasks arrive at an edge server, admission control is needed to achieve the best performance. In this paper, we propose a Deep Reinforcement Learning (DRL)-based admission control scheme, DAC, to maximize the system throughput of an edge server. Our experimental results indicate that DAC outperforms existing admission control schemes for MEC in terms of system throughput.
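DAC itself trains a DRL policy; as a minimal stand-in, the sketch below uses tabular Q-learning over a coarse state (queue occupancy) to learn an accept/reject rule that favors throughput. The environment dynamics, reward shaping, and all parameters are illustrative assumptions:

```python
import random

CAPACITY, ACTIONS = 10, (0, 1)          # state: queue length; actions: reject/accept
Q = [[0.0, 0.0] for _ in range(CAPACITY + 1)]
alpha, gamma, eps = 0.1, 0.95, 0.1

def step(queue: int, accept: int) -> tuple[int, float]:
    """Toy environment: accepting pays off unless the queue is nearly full."""
    reward = 0.0
    if accept and queue < CAPACITY:
        queue += 1
        reward = 1.0 if queue < 0.8 * CAPACITY else -2.0  # deadline-miss proxy
    if queue and random.random() < 0.5:                   # a task departs
        queue -= 1
    return queue, reward

queue = 0
for _ in range(20_000):
    a = random.choice(ACTIONS) if random.random() < eps \
        else max(ACTIONS, key=lambda x: Q[queue][x])
    nxt, r = step(queue, a)
    Q[queue][a] += alpha * (r + gamma * max(Q[nxt]) - Q[queue][a])
    queue = nxt

# Learned accept(1)/reject(0) decision for each queue length.
print([max(ACTIONS, key=lambda x: Q[s][x]) for s in range(CAPACITY + 1)])
```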
- Research Article
10
- 10.1016/j.ipm.2021.102850
- Jan 15, 2022
- Information Processing & Management
Fast-DRD: Fast decentralized reinforcement distillation for deadline-aware edge computing
- Research Article
36
- 10.1016/j.comcom.2022.02.011
- Feb 22, 2022
- Computer Communications
Deep reinforcement learning-based multi-objective edge server placement in Internet of Vehicles
- Research Article
6
- 10.1049/cmu2.12309
- Nov 27, 2021
- IET Communications
Multi-access edge computing provides computation and network resources in proximity to user applications in mobile environments. Deploying edge servers at the network boundary not only offloads heavy task loads from the cloud but also alleviates the resource-limited capabilities of mobile devices. Rather than many stand-alone edge servers, the concept of multi-server edge computing has recently been advocated to address system scalability and service quality under dynamic task workloads. This study exploits collaborative computing resources and designs a task migration strategy for multiple edge servers in mobile networks. It formulates a queueing optimization problem of minimizing the overall service time in a multi-server system. An intelligent task migration scheme is then developed using deep reinforcement learning and Q-learning techniques. With a variety of numerical attributes derived from the queueing model, this intelligent scheme can arrange the task distribution among edge servers to enhance task processing capability. Simulation-based results show that the proposed task migration scheme sustains service efficiency and resource utilization, which is promising compared with conventional designs without collaborative intelligence in mobile environments.
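A compact queueing sketch in the spirit of the study's model (the M/M/1 assumption and all rates are mine): treat each edge server as an M/M/1 queue with mean response time T = 1/(mu - lambda) when lambda < mu, and migrate incoming tasks to the server with the smallest predicted response time:

```python
def response_time(mu: float, lam: float) -> float:
    """Mean M/M/1 response time; infinite if the server is overloaded."""
    return 1.0 / (mu - lam) if lam < mu else float("inf")

# Hypothetical (service rate mu, arrival rate lambda) per edge server.
servers = {"es1": (10.0, 8.5), "es2": (12.0, 6.0), "es3": (8.0, 7.9)}

for name, (mu, lam) in servers.items():
    print(f"{name}: T = {response_time(mu, lam):.2f} s")

target = min(servers, key=lambda s: response_time(*servers[s]))
print("migrate next task to:", target)   # the server with the most headroom
```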
- Research Article
45
- 10.1109/tii.2022.3192882
- Feb 1, 2023
- IEEE Transactions on Industrial Informatics
Joint task inference, which fully utilizes end-edge-cloud cooperation, can effectively enhance the performance of deep neural network (DNN) inference services in Industrial Internet of Things (IIoT) applications. In this paper, we propose a novel joint resource management scheme for a multi-task, multi-service scenario consisting of multiple sensors, a cloud server, and a base station equipped with an edge server. A time-slotted system model is proposed, incorporating DNN deployment, data size control, task offloading, computing resource allocation, and wireless channel allocation. DNN deployment places the proper DNNs on the edge server under its total resource constraint, and data size control trades off task inference accuracy against task transmission delay by changing the task data size. Our goal is to minimize the total cost, comprising total task processing delay and total error-inference penalty, while guaranteeing long-term task queue stability and all task inference accuracy requirements. Leveraging Lyapunov optimization, we first transform the optimization problem into a deterministic problem for each time slot. Then, a deep deterministic policy gradient (DDPG)-based deep reinforcement learning (DRL) algorithm is designed to provide a near-optimal solution. We further design a fast numerical method for the data size control subproblem to reduce the training complexity of the DRL model, and a penalty mechanism to prevent frequent re-optimization of the DNN deployment. Extensive experiments are conducted by varying different crucial parameters, and the superiority of our scheme is demonstrated in comparison with three other schemes.
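A compact sketch of the drift-plus-penalty rule that Lyapunov optimization yields here (notation and all numbers are mine): with task queue Q(t+1) = max(Q(t) - b(t), 0) + a(t), each slot the controller picks the action minimizing V * cost + Q(t) * (a(t) - b(t)), trading per-slot cost against long-term queue stability:

```python
import random

V, Q = 50.0, 0.0                                  # V weights cost vs. backlog
actions = [(1.0, 2.0), (3.0, 6.0), (5.0, 9.0)]    # (cost, service amount b)

random.seed(0)
for t in range(6):
    a = random.uniform(2.0, 6.0)                  # task arrivals this slot
    cost, b = min(actions, key=lambda x: V * x[0] + Q * (a - x[1]))
    Q = max(Q - b, 0.0) + a                       # Lyapunov queue update
    print(f"t={t}: arrivals={a:.1f}, service={b}, cost={cost}, Q={Q:.1f}")
```

Note how a low backlog favors the cheap action, while a growing Q tilts the minimization toward higher service rates, which is exactly the stability/cost trade-off the per-slot transformation encodes.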
- Research Article
16
- 10.1109/ojcoms.2023.3280359
- Jan 1, 2023
- IEEE Open Journal of the Communications Society
Federated Learning (FL) is a distributed machine learning framework that protects device data privacy: devices' local model parameters are exchanged with a centralized server without revealing the actual data. The Hierarchical Federated Learning (HFL) framework was introduced to improve FL communication efficiency: devices are clustered and seek model consensus with the support of edge servers (e.g., base stations). Devices in a cluster submit their local model updates to their assigned edge server for aggregation at each iteration; the edge servers transmit the aggregated models to a centralized server to establish a global consensus. However, as with FL, adversaries may threaten the security and privacy of HFL. Client devices within a cluster may deliberately provide unreliable local model updates through poisoning attacks, or poor-quality updates due to inconsistent communication channels, increased device mobility, or inadequate device resources. To address these challenges, this paper investigates the client selection problem in the HFL framework to eliminate the impact of unreliable clients while maximizing the global model accuracy of HFL. Each FL edge server is equipped with a Deep Reinforcement Learning (DRL)-based reputation model to optimally measure the reliability and trustworthiness of the FL workers in its cluster. Multi-Agent Deep Deterministic Policy Gradient (MADDPG) is utilized to enhance the accuracy and stability of the HFL global model, given the workers' dynamic behavior in the HFL environment. Experimental results indicate that our proposed MADDPG approach improves the accuracy and stability of HFL compared with a conventional reputation model and a single-agent DDPG-based reputation model.
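A minimal sketch of reputation-weighted aggregation at one HFL edge server (my simplification: the paper learns reputations with MADDPG, whereas here they are given constants): unreliable workers receive low weight, so their updates barely influence the cluster model:

```python
import numpy as np

updates = {                               # hypothetical flattened local updates
    "w1": np.array([0.9, 1.1, 1.0]),
    "w2": np.array([1.0, 0.9, 1.1]),
    "w3": np.array([9.0, -7.0, 8.0]),     # poisoned / poor-quality update
}
reputation = {"w1": 0.95, "w2": 0.90, "w3": 0.05}  # DRL-estimated in the paper

weights = np.array([reputation[w] for w in updates])
weights /= weights.sum()                  # normalize to a convex combination
aggregated = sum(w * u for w, u in zip(weights, updates.values()))
print(aggregated)                         # stays close to the honest updates
```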
- Research Article
5
- 10.32604/cmc.2023.034892
- Jan 1, 2023
- Computers, Materials & Continua
The main aim of future mobile networks is to provide secure, reliable, intelligent, and seamless connectivity, enabling mobile network operators to ensure their customers a better quality of service (QoS). Nowadays, Unmanned Aerial Vehicles (UAVs) are a significant part of the mobile network due to their continuously growing use in various applications. For better coverage and cost-effective, seamless service connectivity and provisioning, UAVs have emerged as the best choice for telco operators: they can be used as flying base stations, edge servers, and relay nodes in mobile networks. Meanwhile, Multi-access Edge Computing (MEC) technology has emerged in the 5G network to provide a better quality of experience (QoE) to users with different QoS requirements. However, UAVs in a mobile network face several challenges in delivering coverage enhancement and better QoS, such as trajectory design, path planning, optimization, QoS assurance, and mobility management. Efficient and proactive path planning and optimization in a highly dynamic environment containing buildings and obstacles is challenging, so an automated, Artificial Intelligence (AI)-enabled, QoS-aware solution is needed for trajectory planning and optimization. This work therefore introduces a well-designed AI- and MEC-enabled architecture for a UAV-assisted future network. It features an efficient Deep Reinforcement Learning (DRL) algorithm for real-time, proactive trajectory planning and optimization, and it fulfills QoS-aware service provisioning. A greedy-policy approach is used to maximize the long-term reward for serving more users with QoS. Simulation results reveal the superiority of the proposed DRL mechanism for energy-efficient and QoS-aware trajectory planning over existing models.
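An illustrative greedy waypoint step (a myopic toy stand-in for the paper's DRL policy, which optimizes long-term reward; the grid, coverage radius, and user positions are mine): from the current cell, the UAV moves to the neighboring cell that serves the most users within range:

```python
import math

users = [(1, 2), (2, 2), (5, 5), (6, 5), (6, 6)]   # user positions on a grid
RANGE = 1.5                                         # coverage radius

def served(pos) -> int:
    """Number of users within coverage range of a candidate position."""
    return sum(math.dist(pos, u) <= RANGE for u in users)

def greedy_step(pos):
    x, y = pos                                      # staying put is also allowed
    neighbors = [(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
    return max(neighbors, key=served)

pos = (0, 0)
for _ in range(6):                                  # fly a short trajectory
    pos = greedy_step(pos)
print(pos, "serves", served(pos), "users")
```

A purely myopic step like this can stall at a local cluster of users, which is one reason the paper optimizes the long-term reward instead.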
- Research Article
2
- 10.1016/j.comnet.2024.110609
- Jun 22, 2024
- Computer Networks
BD-TTS: A blockchain and DRL-based framework for trusted task scheduling in edge computing
- Research Article
20
- 10.1109/mnet.011.2000663
- Jul 1, 2021
- IEEE Network
With the rapid development of smart cities and 5G, user demand for Internet services has increased exponentially. Through collaborative content sharing, the storage limitation of a single edge server (ES) can be overcome. However, when mobile users need to download whole contents across multiple regions, independently deciding the caching content for ESs in different regions may result in redundant caching; furthermore, frequent switching of communication connections during user movement causes retransmission delay. As a revolutionary approach in artificial intelligence, deep reinforcement learning (DRL) has achieved great success in solving high-dimensional problems related to network resource management. We therefore integrate collaborative caching and DRL to build an intelligent edge caching framework that realizes collaborative caching between the cloud and ESs. In this caching framework, a federated-machine-learning-based user behavior prediction model is first designed to characterize the content preferences and movement trajectories of mobile users. Next, to achieve efficient resource aggregation of ESs, a user-behavior-aware dynamic collaborative caching domain (DCCD) construction and management mechanism is devised to divide ESs into clusters, select cluster heads, and set re-clustering rules. Then a DRL-based content caching and delivery algorithm is presented to decide the caching content of ESs in a DCCD from a global perspective and plan the transmission path for users, reducing redundant content and transmission delay. In particular, when a user request cannot be satisfied by the current DCCD, a cross-domain content delivery strategy allows ESs in other DCCDs to provide and forward content to the user, avoiding the traffic pressure and delay of requesting services from the cloud. Simulation results show that the proposed collaborative caching framework improves user satisfaction in terms of content hit rate and service delay.
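A minimal sketch of the collaborative lookup order implied by the framework (my rendering of the idea, not the paper's algorithm): try the local ES, then the DCCD peers, then another DCCD, and only then the cloud:

```python
local_cache = {"video_a"}                 # contents cached at the serving ES
dccd_peers  = [{"video_b"}, {"video_c"}]  # caches of cluster peers in the DCCD
other_dccds = [{"video_d"}]               # caches reachable via cross-domain delivery

def fetch(content: str) -> str:
    if content in local_cache:
        return "local ES hit"
    if any(content in peer for peer in dccd_peers):
        return "intra-DCCD hit (forwarded by a cluster peer)"
    if any(content in es for es in other_dccds):
        return "cross-domain delivery from another DCCD"
    return "fetched from cloud (miss everywhere)"

for c in ("video_a", "video_c", "video_d", "video_z"):
    print(c, "->", fetch(c))
```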
- Research Article
- 10.1002/cpe.70174
- Jul 3, 2025
- Concurrency and Computation: Practice and Experience
Edge cameras are ubiquitous and, together with the recent boom in computer vision technology, a growing variety of video analytics tasks are being processed at the edge. Supporting complex video analytics tasks on edge servers with unpredictable request loads and limited resources is challenging. Most existing works use only a single optimization approach, focusing on improving one performance metric in a single processing stage while ignoring the balance of other metrics, so the space available for optimization is often very limited. Especially for video analytics tasks that must be divided into GPU and CPU stages, this one-sided focus may lead to execution performance imbalance or even negative quality of service (QoS) optimization. In addition, to fully utilize the valuable resources on edge servers, multiple types of video analytics tasks often need to be scheduled on them; yet most existing scheduling strategies only consider how to allocate computational resources to end-to-end tasks, lacking awareness of how tasks execute in different stages and of the mutual interference among tasks. Such stage-insensitive and interference-insensitive strategies may cause performance conflicts when multiple GPU-CPU dual-stage tasks run together, degrading overall QoS. To address these challenges, we first evaluate the impact of batch processing, frame rate control, resolution selection, and CPU concurrency on throughput, latency, and accuracy when running dual-stage tasks on edge platforms. We then propose DualRT, a soft real-time video analytics framework for dual-stage tasks that optimizes their QoS while avoiding request stacking on edge platforms. In DualRT's scheduling module, we design a scheduling method that uses a multi-agent deep reinforcement learning algorithm and a variable time window to schedule multiple dual-stage tasks, jointly controlling the batch size, resolution, frame rate, and CPU concurrency of each task. Our experimental results show that DualRT improves QoS by an average of 13.3% and maximum throughput by an average of 24.6% compared to state-of-the-art solutions.
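A sketch of the joint knob space DualRT searches over, with a toy feasibility check against request stacking (the latency model and QoS scoring below are my illustrative proxies, not the paper's reward):

```python
from itertools import product

batch_sizes = (1, 2, 4, 8)
resolutions = (360, 540, 720)     # input height in pixels
frame_rates = (10, 15, 30)
cpu_workers = (1, 2, 4)

def predicted_latency_ms(b: int, res: int, w: int) -> float:
    """Toy two-stage cost: GPU time grows with batch*pixels, CPU time shrinks with workers."""
    return 0.004 * b * res + 5.0 + 40.0 * b / w

def qos_score(b: int, res: int, fps: int, w: int) -> float:
    # Avoid request stacking: a batch of b frames must finish within the
    # b * 1000/fps ms it takes for the next b frames to arrive.
    if predicted_latency_ms(b, res, w) > b * 1000.0 / fps:
        return -1.0
    return 0.5 * (res / 720) + 0.5 * (fps / 30)   # crude accuracy/QoS proxy

best = max(product(batch_sizes, resolutions, frame_rates, cpu_workers),
           key=lambda knobs: qos_score(*knobs))
print("chosen (batch, resolution, fps, cpu workers):", best)
```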
- Research Article
22
- 10.1109/twc.2022.3192613
- Jan 1, 2023
- IEEE Transactions on Wireless Communications
Graphics processing units (GPUs) can improve deep neural network inference throughput via batch processing, where multiple tasks are processed concurrently. We focus on novel scenarios in which energy-constrained mobile devices offload inference tasks to a GPU-equipped edge server. Each inference task is partitioned into sub-tasks for a finer granularity of offloading and scheduling, and the problem of minimizing user energy consumption under inference latency constraints is investigated. To deal with the coupling of offloading and scheduling introduced by concurrent batch processing, we first consider an offline problem with a constant edge inference latency and a common latency constraint. It is proven that optimizing each user's offloading policy independently and aggregating all identical sub-tasks into one batch is optimal, which inspires the independent partitioning and same sub-task aggregating (IP-SSA) algorithm. Further, the optimal grouping (OG) algorithm is proposed to optimally group tasks when the latency constraints differ. Finally, when future task arrivals cannot be precisely predicted, a deep deterministic policy gradient (DDPG) agent is trained to call OG. Experiments show that IP-SSA reduces user energy consumption by up to 94.9% in the offline setting, while DDPG-OG outperforms DDPG-IP-SSA by up to 8.92% in the online setting.
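A minimal sketch of the "same sub-task aggregating" step (my reading of the abstract, not the paper's algorithm): since every user partitions its task into the same sub-task sequence, the server batches sub-task i of all users together so the GPU executes identical kernels concurrently:

```python
from collections import defaultdict

# (user, sub_task_index) pairs offloaded to the edge server; equal indices
# refer to the same position in a common DNN partition, i.e. identical kernels.
offloaded = [("u1", 0), ("u2", 0), ("u3", 0), ("u1", 1), ("u3", 1), ("u2", 2)]

batches = defaultdict(list)
for user, sub in offloaded:
    batches[sub].append(user)

for sub in sorted(batches):             # one GPU batch per sub-task index
    print(f"sub-task {sub}: batch of {len(batches[sub])} -> {batches[sub]}")
```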
- Book Chapter
- 10.1201/9781003144977-20
- Feb 18, 2021
The advantage of mobile edge computing (MEC) is that computing and storage resources can be distributed across the network, and MEC deployment nodes meet applications' low-latency requirements. However, user mobility may take a user far from the edge server hosting its application task, resulting in inevitable service interruption. In this paper, a new mobility-aware virtual machine (VM) service migration scheme is proposed, realized in three aspects: (1) VMs in the relevant edge servers can host multiuser application tasks; the VM migration strategy properly migrates users' tasks, reduces user-perceived delay, and improves quality of service (QoS); (2) the system dynamically allocates bandwidth and computing resources to users, which affects their perceived delay; (3) we further propose a multiuser service migration scheme based on deep reinforcement learning (DRL), which reduces the large state space and enables fast decision-making. We conduct extensive experiments, which show that the DRL algorithm outperforms the classical RL algorithm and other baseline algorithms.
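A toy decision rule in the spirit of the chapter's migration problem (the threshold form is my simplification of what the DRL agent learns): migrate a user's VM only when the perceived-delay saving outweighs the one-off migration overhead:

```python
def should_migrate(delay_current_ms: float, delay_candidate_ms: float,
                   migration_cost_ms: float) -> bool:
    """Migrate iff the delay saved exceeds the migration disruption."""
    return (delay_current_ms - delay_candidate_ms) > migration_cost_ms

# The user moved away: remote delay ballooned to 80 ms, a nearby ES offers
# 25 ms, and migrating the VM costs ~30 ms of disruption (numbers hypothetical).
print(should_migrate(80.0, 25.0, 30.0))   # True: migrate
print(should_migrate(40.0, 25.0, 30.0))   # False: stay
```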
- Research Article
37
- 10.1109/jiot.2023.3264281
- Sep 1, 2023
- IEEE Internet of Things Journal
Vehicular edge networks involve edge servers close to mobile devices that provide extra computation resources to complete mobile devices' computation tasks with low latency and high reliability. Considerable efforts on computation offloading in vehicular edge networks have been made to reduce energy consumption and computation latency, usually considering roadside units (RSUs) as fixed edge servers. Nonetheless, computation offloading that treats mobile vehicles as mobile edge servers still needs further investigation. To this end, we propose DRL-COMV, a deep reinforcement learning based computation offloading scheme with mobile vehicles in vehicular edge computing, in which some vehicles (such as autonomous vehicles) are deployed as mobile edge servers that move through the network and cooperate with fixed edge servers to provide extra computation resources, helping complete mobile devices' computation tasks with great QoE (i.e., low latency). In particular, a computation offloading model considering both mobile and fixed edge servers achieves task offloading through vehicle-to-vehicle (V2V) communications, and collaborative route planning guides the mobile edge servers' movement to improve offloading efficiency. A deep reinforcement learning approach with a carefully designed reward function then determines effective offloading strategies for multiple mobile devices and multiple edge servers, with the objective of maximizing QoE (i.e., low latency) for mobile devices. Performance evaluations show that DRL-COMV achieves good convergence and stability, and that it delivers better QoE and a higher task offloading request hit ratio for mobile devices than existing approaches (i.e., DDPG, IMOPSOQ, and GABDOS).
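An illustrative reward shaping for this kind of offloading agent (the functional form is my assumption; the paper designs its own reward): lower completion latency yields higher reward, with a penalty for unserved requests so the agent also keeps the hit ratio high:

```python
def reward(latency_ms: float, deadline_ms: float, accepted: bool) -> float:
    """QoE-style reward: 1 at zero latency, 0 at the deadline, -1 if unserved."""
    if not accepted:
        return -1.0                        # missed offloading request
    return max(0.0, 1.0 - latency_ms / deadline_ms)

print(reward(40.0, 100.0, True))    # 0.6: fast completion
print(reward(120.0, 100.0, True))   # 0.0: deadline exceeded
print(reward(50.0, 100.0, False))   # -1.0: request not served
```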
- Research Article
42
- 10.1109/jiot.2021.3073034
- Oct 1, 2021
- IEEE Internet of Things Journal
The emergence of edge computing can effectively tackle the large transmission delays caused by the long distance between user devices and remote cloud servers. Users can offload tasks to nearby edge servers, minimizing the average task response time through effective task dispatching and scheduling. However, 1) in the task dispatching phase, the dynamic nature of network conditions and server loads makes it difficult to select the optimal edge server for offloaded tasks, and 2) in the task scheduling phase, each edge server may face a large number of offloaded tasks to schedule, resulting in long average task response times or even severe task starvation. In this article, we propose OTDS, an online task dispatching and fair scheduling method that tackles these two challenges by combining online learning (OL) and deep reinforcement learning (DRL). Using an OL approach, OTDS estimates network conditions and server loads in real time and dynamically assigns tasks to the optimal edge servers. Meanwhile, at each edge server, by combining a round-robin mechanism with DRL, OTDS allocates appropriate resources to each task according to its time sensitivity, achieving high efficiency and fairness in task scheduling. Evaluation results show that our online method dynamically allocates network and computing resources to offloaded tasks according to their time-sensitive requirements; OTDS thus outperforms existing methods in the efficiency and fairness of task dispatching and scheduling, significantly reducing the average task response time.
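A minimal sketch of the online-estimation half of such a dispatcher (my simplification of OTDS): keep an exponential moving average of each server's observed response time and dispatch each new task to the server with the lowest current estimate, with a little exploration so stale estimates get refreshed:

```python
import random

random.seed(1)
estimate = {"es1": 50.0, "es2": 50.0, "es3": 50.0}   # EMA of response time (ms)
true_mean = {"es1": 30.0, "es2": 70.0, "es3": 45.0}  # hidden network + load truth
BETA, EPS = 0.2, 0.1                                 # EMA rate, exploration prob.

for _ in range(300):
    # Mostly dispatch to the best current estimate; occasionally probe others.
    if random.random() < EPS:
        target = random.choice(list(estimate))
    else:
        target = min(estimate, key=estimate.get)
    observed = random.gauss(true_mean[target], 5.0)  # measured response time
    estimate[target] += BETA * (observed - estimate[target])

print({s: round(v, 1) for s, v in estimate.items()})  # estimates track the truth
```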