Abstract

Collecting and monitoring data with low latency from numerous sensing devices is a key foundation of networked cyber-physical applications such as industrial process control, intelligent traffic control, and networked robots. As delays in data updates can degrade the quality of networked monitoring, it is desirable to continuously maintain an optimal configuration of the sensing devices in terms of transmission rates and bandwidth allocation, taking into account application requirements as well as the time-varying conditions of the underlying network. In this paper, we use deep reinforcement learning (RL) to learn a bandwidth allocation policy for networked monitoring. We present Repot, a transferable RL model in which a policy trained in an easy-to-learn network environment can be readily adapted to various target network environments. Specifically, Repot employs flow embedding and action shaping schemes that enable the systematic adaptation of a bandwidth allocation policy to the conditions of a target environment. Through experiments with the NS-3 network simulator, we show that Repot achieves stable and high monitoring performance across different network conditions, outperforming other heuristic and learning-based solutions by 14.5–20.8% in quality-of-experience (QoE) in a target network environment. We also demonstrate Repot's sample-efficient adaptation, which requires only 6.25% of the samples needed to train a model from scratch. Finally, we present a case study with the SUMO mobility simulator and verify the benefits of Repot in practical scenarios, showing performance gains over the alternatives, e.g., 6.5% in an urban-scale and 12.6% in a suburb-scale scenario.

Highlights

  • In cyber-physical applications, sensing devices operate as data sources of a distributed database in that each continuously sends its status information to a centralized node that evaluates application-specific queries on the aggregated information.

  • In the same vein as those reinforcement learning (RL)-based approaches, we address the optimization problem of status updates in networked monitoring by formulating it as successive decisions on bandwidth allocation for multiple sensing devices, which can be modeled as a Markov decision process (MDP) and learned through RL (see the sketch after this list).

  • In the evaluation, we describe the implementation of Repot and evaluate its performance against other algorithms, including a learning model based on a state-of-the-art age of information (AoI) algorithm [3], under various network conditions.
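
The MDP formulation referenced above can be illustrated with a minimal sketch: per-flow statistics form the state, the bandwidth shares assigned to the flows form the action, and an aggregate monitoring-quality (QoE-style) score forms the reward. The class below and its placeholder QoE function are illustrative assumptions, not the paper's actual formulation or implementation.

```python
# Minimal sketch of a bandwidth-allocation MDP (illustrative only).
import numpy as np

class BandwidthAllocationEnv:
    """State: per-flow statistics (rate, delay, age of the latest update);
    action: a bandwidth share for each sensing device's flow;
    reward: an aggregate monitoring-quality (QoE-style) score."""

    def __init__(self, num_flows: int, total_bandwidth: float):
        self.num_flows = num_flows
        self.total_bandwidth = total_bandwidth

    def observe(self) -> np.ndarray:
        # In a real system these statistics would come from the network
        # (e.g., an NS-3 simulation); random values stand in here.
        return np.random.rand(self.num_flows, 3)  # columns: [rate, delay, age]

    def step(self, action: np.ndarray):
        # Interpret the action as non-negative weights and normalize them
        # so the allocated shares sum to the link capacity.
        shares = self.total_bandwidth * action / (action.sum() + 1e-8)
        next_state = self.observe()
        reward = self._qoe(next_state, shares)
        return next_state, reward

    def _qoe(self, state: np.ndarray, shares: np.ndarray) -> float:
        # Placeholder reward: favor larger shares, penalize stale updates.
        ages = state[:, 2]
        return float(np.mean(np.log1p(shares)) - np.mean(ages))

# One decision step: observe flow statistics, allocate bandwidth, get a reward.
env = BandwidthAllocationEnv(num_flows=8, total_bandwidth=10.0)
state, reward = env.step(np.ones(8))
```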


Summary

INTRODUCTION

In cyber-physical applications, sensing devices operate as data sources of a distributed database in that each continuously sends its status information to a centralized node that evaluates application-specific queries on the aggregated information. To make bandwidth allocation across such devices adaptive, Repot employs two schemes: flow embedding and action shaping. The action shaping scheme decomposes the action inference on bandwidth allocation into a two-stage procedure with several modules, by which a latent action is generated from the flow embeddings and its representation is transformed according to different conditions (e.g., underlying network limitations or spatial characteristics of monitored objects). Together, these two schemes enable the rapid adaptation of a policy optimized in an easy-to-learn source environment to a target environment, and alleviate the difficulty of achieving an optimal policy across a variety of network scales and environment conditions.
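
As a rough illustration of this two-stage structure, the sketch below embeds per-flow observations into a shared latent space and then shapes a latent action into a bandwidth allocation that respects a target environment's capacity. The module names, layer sizes, and the softmax-based shaping are assumptions for illustration only, not the paper's actual architecture.

```python
# Illustrative two-stage policy: flow embedding followed by action shaping.
import torch
import torch.nn as nn

class FlowEmbedding(nn.Module):
    """Stage 1: map each flow's raw features to a shared latent embedding."""
    def __init__(self, feat_dim: int = 3, emb_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, emb_dim), nn.ReLU(),
                                 nn.Linear(emb_dim, emb_dim))

    def forward(self, flows: torch.Tensor) -> torch.Tensor:  # (N, feat_dim)
        return self.net(flows)                                # (N, emb_dim)

class ActionShaping(nn.Module):
    """Stage 2: turn a latent action into per-flow bandwidth shares,
    rescaled to the target environment's capacity constraint."""
    def __init__(self, emb_dim: int = 32):
        super().__init__()
        self.score = nn.Linear(emb_dim, 1)

    def forward(self, latent: torch.Tensor, capacity: float) -> torch.Tensor:
        weights = torch.softmax(self.score(latent).squeeze(-1), dim=0)
        return capacity * weights                              # sums to capacity

class TwoStagePolicy(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = FlowEmbedding()
        self.shape = ActionShaping()

    def forward(self, flows: torch.Tensor, capacity: float) -> torch.Tensor:
        latent = self.embed(flows)           # stage 1: flow embeddings
        return self.shape(latent, capacity)  # stage 2: shaped allocation

# Usage: allocate 10 Mbps among 8 flows with 3 features each.
policy = TwoStagePolicy()
alloc = policy(torch.rand(8, 3), capacity=10.0)
```

Keeping the embedding stage environment-agnostic and isolating environment-specific constraints in the shaping stage is one way such a policy could be carried from a source environment to targets of different scales with little retraining.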

OVERALL SYSTEM
RL-BASED ORCHESTRATION
POLICY TRANSFERABLE RL STRUCTURE
FLOW EMBEDDING
BANDWIDTH ALLOCATION
IMPLEMENTATION
EVALUATION
LEARNING ENVIRONMENTS
COMPARISON METHODS
ADAPTATION PERFORMANCE
CASE STUDY
RELATED WORK
CONCLUSION