Inference Services Research Articles

6G will connect heterogeneous intelligent agents so that they can natively perform complex cooperative tasks. Connecting intelligence raises two main research questions about how artificial intelligence and machine learning models behave depending on (i) the quality of their input data, which is degraded by errors induced by interference and additive noise during wireless communication; and (ii) their contextual effectiveness and resilience in interpreting and exploiting the meaning behind the data. Both questions fall within the realm of semantic and goal-oriented communications. In this paper, we investigate how to effectively share communication spectrum resources between a legacy (i.e., data-oriented) communication system and a new goal-oriented edge intelligence system. Specifically, we address the scenario of an enhanced Mobile Broadband (eMBB) service, i.e., a user uploading a video stream to a radio access point, that interferes with an edge inference system in which a user uploads images to a Mobile Edge Host running a classification task. Our objective is to achieve, through cooperation, the highest eMBB service data rate subject to a target goal effectiveness of the edge inference service, namely the probability of confident inference on time. We first formalize a general definition of a goal in the context of wireless communications. This includes the goal effectiveness (i.e., the goal achievability rate, or the probability of achieving the goal) and the goal cost (i.e., the network resource consumption needed to achieve the goal with the target effectiveness). We argue and show, through numerical evaluations, that communication reliability and goal effectiveness are not straightforwardly linked.
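To make the gap between communication reliability and goal effectiveness concrete, here is a minimal sketch that estimates goal effectiveness as the empirical fraction of inference requests that are both confident and on time, and compares it with a plain delivery-success rate. It assumes that "confident" means the classifier's top-class probability exceeds a threshold; the `InferenceRecord` fields, the 0.8 threshold, and the 0.1 s deadline are hypothetical illustration values, not the paper's definitions.

```python
from dataclasses import dataclass


@dataclass
class InferenceRecord:
    # Hypothetical per-request log entry (illustrative only).
    delivered_without_error: bool  # data-oriented success: image received intact
    confidence: float              # classifier's top-class probability (assumed metric)
    latency_s: float               # end-to-end latency in seconds


def packet_success_rate(records):
    """Data-oriented reliability: fraction of error-free deliveries."""
    return sum(r.delivered_without_error for r in records) / len(records)


def goal_effectiveness(records, conf_threshold=0.8, deadline_s=0.1):
    """Fraction of requests whose inference is confident AND on time."""
    achieved = sum(
        1 for r in records
        if r.confidence >= conf_threshold and r.latency_s <= deadline_s
    )
    return achieved / len(records)


records = [
    InferenceRecord(True, 0.95, 0.05),   # delivered intact, goal achieved
    InferenceRecord(True, 0.55, 0.04),   # delivered intact, but not confident
    InferenceRecord(False, 0.90, 0.06),  # corrupted image, still confident on time
    InferenceRecord(True, 0.92, 0.20),   # confident, but misses the deadline
]
print(packet_success_rate(records))  # 0.75
print(goal_effectiveness(records))   # 0.5
```

The third record shows a corrupted delivery that still achieves the goal, and the second and fourth show intact deliveries that do not, which is why the two metrics can diverge.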
Then, after a performance evaluation that clarifies the difference between communication performance and goal effectiveness, we formulate a long-term optimization problem and solve it with Lyapunov stochastic network optimization tools to guarantee the desired target performance. Finally, our numerical results assess the advantages of the proposed optimization and show the superiority of the goal-oriented strategy over baseline 5G-compliant legacy approaches in both stationary and non-stationary communication (and computation) environments.
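The Lyapunov approach mentioned above typically turns a long-term constraint into a per-slot decision through a virtual queue. The sketch below is a generic drift-plus-penalty loop under simplified, hypothetical models: in each slot, a bandwidth share `b` goes to eMBB, the eMBB rate is `b * log2(1 + snr)`, and the inference service's goal-success probability is `1 - exp(-(1 - b) * snr)`. These channel models, the parameter `V`, and the 0.9 target are assumptions for illustration and do not reproduce the paper's formulation (which involves interference, power, and computation resources).

```python
import math
import random


# Hypothetical per-slot models, chosen only for illustration.
def embb_rate(share, snr):
    """eMBB spectral efficiency when it receives `share` of the band."""
    return share * math.log2(1.0 + snr)


def goal_success_prob(share, snr):
    """Assumed probability of confident, on-time inference with the remaining band."""
    return 1.0 - math.exp(-(1.0 - share) * snr)


def drift_plus_penalty(num_slots=5000, target=0.9, V=1.0, seed=0):
    rng = random.Random(seed)
    shares = [i / 10 for i in range(11)]  # candidate eMBB bandwidth shares
    Z = 0.0                               # virtual queue: accumulated effectiveness deficit
    total_rate = total_p = 0.0
    for _ in range(num_slots):
        snr_embb = rng.uniform(1.0, 10.0)
        snr_inf = rng.uniform(1.0, 10.0)
        # Per-slot rule: maximise V * rate + Z * success probability.
        b = max(shares, key=lambda s: V * embb_rate(s, snr_embb)
                + Z * goal_success_prob(s, snr_inf))
        p = goal_success_prob(b, snr_inf)
        total_rate += embb_rate(b, snr_embb)
        total_p += p
        # The queue grows whenever this slot falls short of the target.
        Z = max(Z + target - p, 0.0)
    return total_rate / num_slots, total_p / num_slots, Z


avg_rate, avg_p, backlog = drift_plus_penalty()
print(f"avg eMBB rate {avg_rate:.2f}, avg goal effectiveness {avg_p:.3f}")
```

Because the queue satisfies `Z(t+1) >= Z(t) + target - p(t)`, the time-averaged effectiveness is at least `target - Z(T)/T`, so a bounded backlog implies the long-term target is met. A larger `V` weights the eMBB rate more heavily, at the cost of a larger backlog and slower convergence to the target.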

In mobile edge computing environments, intelligent inference services driven by Deep Neural Networks (DNNs) are highly sensitive to latency. Recently, collaborative inference between User Devices and Edge Servers (ESs) based on DNN partitioning has succeeded in accelerating such services. However, most existing collaborative acceleration schemes partition a single DNN inference task: they cannot quickly make partition decisions for a set of concurrent inference tasks, and they often sacrifice inference accuracy. In addition, because ES resources are limited, concurrent requests compete for them, so partitioned tasks cannot be offloaded to ESs for processing in time. Designing an efficient offloading scheme is therefore essential. Task offloading schemes based on deep reinforcement learning can solve complex decision-making problems in high-dimensional state spaces, but they suffer from problems such as insufficient sample diversity and a tendency to fall into local optima. In this paper, a Collaborative Inference Acceleration Scheme integrating DNN Partitioning and Task Offloading (CIAS-PnO) is proposed. First, the Collaborative DNN Layer Partitioning (CDLP) algorithm is designed to minimize latency while preserving inference accuracy. CDLP reduces the scale of the partitioning problem for concurrent inference tasks through a pruning operation and makes partition decisions in time. Then, the Distributed Soft Actor-Critic (SAC)-based Partition Task Offloading algorithm (DSACO) is designed. DSACO lets SAC agents explore samples in parallel and share learning experiences, and it uses an automatic entropy adjustment mechanism to improve the agents' exploration efficiency, thereby avoiding local optima and achieving efficient offloading of partitioned tasks.
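As background for the layer-partitioning step, the sketch below shows the basic latency trade-off for a single DNN split between a device and an edge server: run layers `[0, k)` locally, upload the intermediate output, and run layers `[k, n)` at the edge. This is the generic exhaustive search over split points, not CDLP itself, which handles concurrent tasks with pruning and accuracy constraints. The function name `best_split`, the per-layer timings, and the data sizes are hypothetical, and the model ignores result download and edge queuing.

```python
def best_split(device_ms, edge_ms, input_kbit, out_kbit, bw_kbit_per_ms):
    """Return (k, latency_ms) minimizing device compute + upload + edge compute.

    Layers [0, k) run on the device and layers [k, n) run on the edge server.
    """
    n = len(device_ms)
    best = None
    for k in range(n + 1):
        local = sum(device_ms[:k])
        remote = sum(edge_ms[k:])
        if k == n:
            tx = 0.0  # everything runs locally; nothing is uploaded
        else:
            # Upload the raw input (k == 0) or the output of layer k - 1.
            sent = input_kbit if k == 0 else out_kbit[k - 1]
            tx = sent / bw_kbit_per_ms
        total = local + tx + remote
        if best is None or total < best[1]:
            best = (k, total)
    return best


print(best_split([10, 20, 30, 40], [1, 2, 3, 4], 800, [600, 100, 50, 10], 10))
# (2, 47.0)
```

With these numbers, splitting after the second layer wins because its output (100 kbit) is much smaller than the raw input (800 kbit), while the slow device only has to run two layers.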
Experimental results on DNN benchmarks show that, compared with baseline acceleration schemes, CIAS-PnO improves acceleration performance by more than 19.8% and achieves better convergence performance and a higher task success rate.
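The abstract does not detail DSACO's automatic entropy adjustment. A common form in SAC implementations, assumed here, tunes the temperature α by gradient descent on the loss `-log_alpha * mean(log_pi + target_entropy)`. When the policy's entropy falls below the target, α grows and encourages more exploration. The function name, learning rate, and target-entropy value below are hypothetical.

```python
import math


def update_log_alpha(log_alpha, log_probs, target_entropy, lr=0.01):
    """One gradient step on the SAC temperature loss
    L(log_alpha) = -log_alpha * mean(log_pi + target_entropy).

    log_probs are log pi(a|s) for actions sampled from the current policy.
    """
    grad = -sum(lp + target_entropy for lp in log_probs) / len(log_probs)
    return log_alpha - lr * grad


target = 0.5 * math.log(4)  # hypothetical target for a 4-action offloading choice
# Near-deterministic policy (entropy below target): alpha increases.
la_up = update_log_alpha(0.0, [-0.05, -0.1, -0.02], target)
# Very random policy (entropy above target): alpha decreases.
la_down = update_log_alpha(0.0, [-2.0, -2.0], target)
print(math.exp(la_up), math.exp(la_down))
```

Optimizing `log_alpha` rather than α directly keeps the temperature positive without an explicit constraint.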

Articles published on Inference Services

FastPTM: Fast weights loading of pre-trained models for parallel inference service provisioning

CollectNET: a web server for integrated inference of cell-cell communication network

Internet of Conscious Things: Ontology-Based Social Capabilities for Smart Objects

BCEdge: SLO-Aware DNN Inference Services With Adaptive Batch-Concurrent Scheduling on Edge Devices

ESEN: Efficient GPU sharing of Ensemble Neural Networks

Flexible Deployment of Machine Learning Inference Pipelines in the Cloud–Edge–IoT Continuum

SecureTLM: Private inference for transformer-based large model with MPC

Proposed Fuzzy-Stranded-Neural Network Model That Utilizes IoT Plant-Level Sensory Monitoring and Distributed Services for the Early Detection of Downy Mildew in Viticulture

6G Goal-Oriented Communications: How to Coexist with Legacy Systems?

A novel open-source CADs platform for 3D CT pulmonary analysis

QAVAN: Query-answering approach for actionable numerical relationships over Knowledge Graphs

Collaborative Inference Acceleration Integrating DNN Partitioning and Task Offloading in Mobile Edge Computing

FPCNN: A fast privacy-preserving outsourced convolutional neural network with low-bandwidth

InferFair: Towards QoS-aware scheduling for performance isolation guarantee in heterogeneous model serving systems

Manto: A Practical and Secure Inference Service of Convolutional Neural Networks for IoT

FEBench: A Benchmark for Real-Time Relational Data Feature Extraction

SecGNN: Privacy-Preserving Graph Neural Network Training and Inference as a Cloud Service

Optimizing Secure Decision Tree Inference Outsourcing

Energy-Aware, Device-to-Device Assisted Federated Learning in Edge Computing

MPDM: A Multi-Paradigm Deployment Model for Large-Scale Edge-Cloud Intelligence
