Abstract

Collecting and monitoring data with low latency from numerous sensing devices is a key foundation of networked cyber-physical applications such as industrial process control, intelligent traffic control, and networked robots. As delays in data updates can degrade the quality of networked monitoring, it is desirable to continuously maintain an optimal configuration of the sensing devices in terms of transmission rates and bandwidth allocation, taking into account application requirements as well as the time-varying conditions of the underlying network. In this paper, we use deep reinforcement learning (RL) to learn a bandwidth allocation policy for networked monitoring. We present Repot, a transferable RL model in which a policy trained in an easy-to-learn network environment can be readily adapted to various target network environments. Specifically, Repot employs flow embedding and action shaping schemes that enable the systematic adaptation of a bandwidth allocation policy to the conditions of a target environment. Through experiments with the NS-3 network simulator, we show that Repot achieves stable and high monitoring performance across different network conditions, outperforming other heuristic and learning-based solutions by 14.5–20.8% in quality-of-experience (QoE) in a target network environment. We also demonstrate Repot's sample-efficient adaptation, which requires only 6.25% of the samples needed to train a model from scratch. Finally, we present a case study with the SUMO mobility simulator and verify the benefits of Repot in practical scenarios, showing performance gains over the alternatives, e.g., 6.5% in an urban-scale and 12.6% in a suburb-scale scenario.

Highlights

  • In cyber-physical applications, sensing devices operate as data sources of a distributed database in that each continuously sends its status information to a centralized node that evaluates application-specific queries on the aggregated information.

  • In the same vein as those reinforcement learning (RL)-based approaches, we address the optimization problem of status updates in networked monitoring by formulating it as successive decisions on bandwidth allocation for multiple sensing devices, which can be modeled as a Markov decision process (MDP) and learned through RL (see the sketch after this list).

  • In the evaluation, we describe the implementation of Repot and evaluate its performance against other algorithms, including a learning model based on a state-of-the-art age of information (AoI) algorithm [3], under various network conditions.
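
The MDP formulation referenced above can be illustrated with a minimal sketch: per-flow statistics form the state, the bandwidth shares assigned to the flows form the action, and an aggregate monitoring-quality (QoE-style) score forms the reward. The class below and its placeholder QoE function are illustrative assumptions, not the paper's actual formulation or implementation.

```python
# Minimal sketch of a bandwidth-allocation MDP (illustrative only).
import numpy as np

class BandwidthAllocationEnv:
    """State: per-flow statistics (rate, delay, age of the latest update);
    action: a bandwidth share for each sensing device's flow;
    reward: an aggregate monitoring-quality (QoE-style) score."""

    def __init__(self, num_flows: int, total_bandwidth: float):
        self.num_flows = num_flows
        self.total_bandwidth = total_bandwidth

    def observe(self) -> np.ndarray:
        # In a real system these statistics would come from the network
        # (e.g., an NS-3 simulation); random values stand in here.
        return np.random.rand(self.num_flows, 3)  # columns: [rate, delay, age]

    def step(self, action: np.ndarray):
        # Interpret the action as non-negative weights and normalize them
        # so the allocated shares sum to the link capacity.
        shares = self.total_bandwidth * action / (action.sum() + 1e-8)
        next_state = self.observe()
        reward = self._qoe(next_state, shares)
        return next_state, reward

    def _qoe(self, state: np.ndarray, shares: np.ndarray) -> float:
        # Placeholder reward: favor larger shares, penalize stale updates.
        ages = state[:, 2]
        return float(np.mean(np.log1p(shares)) - np.mean(ages))

# One decision step: observe flow statistics, allocate bandwidth, get a reward.
env = BandwidthAllocationEnv(num_flows=8, total_bandwidth=10.0)
state, reward = env.step(np.ones(8))
```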


Summary

INTRODUCTION

In cyber-physical applications, sensing devices operate as data sources of a distributed database in that each continuously sends its status information to a centralized node that evaluates application-specific queries on the aggregated information. To make bandwidth allocation across such devices adaptive, Repot employs two schemes: flow embedding and action shaping. The action shaping scheme decomposes the action inference on bandwidth allocation into a two-stage procedure with several modules, by which a latent action is generated from the flow embeddings and its representation is transformed according to different conditions (e.g., underlying network limitations or spatial characteristics of monitored objects). Together, these two schemes enable the rapid adaptation of a policy optimized in an easy-to-learn source environment to a target environment, and alleviate the difficulty of achieving an optimal policy across a variety of network scales and environment conditions.
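
As a rough illustration of this two-stage structure, the sketch below embeds per-flow observations into a shared latent space and then shapes a latent action into a bandwidth allocation that respects a target environment's capacity. The module names, layer sizes, and the softmax-based shaping are assumptions for illustration only, not the paper's actual architecture.

```python
# Illustrative two-stage policy: flow embedding followed by action shaping.
import torch
import torch.nn as nn

class FlowEmbedding(nn.Module):
    """Stage 1: map each flow's raw features to a shared latent embedding."""
    def __init__(self, feat_dim: int = 3, emb_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, emb_dim), nn.ReLU(),
                                 nn.Linear(emb_dim, emb_dim))

    def forward(self, flows: torch.Tensor) -> torch.Tensor:  # (N, feat_dim)
        return self.net(flows)                                # (N, emb_dim)

class ActionShaping(nn.Module):
    """Stage 2: turn a latent action into per-flow bandwidth shares,
    rescaled to the target environment's capacity constraint."""
    def __init__(self, emb_dim: int = 32):
        super().__init__()
        self.score = nn.Linear(emb_dim, 1)

    def forward(self, latent: torch.Tensor, capacity: float) -> torch.Tensor:
        weights = torch.softmax(self.score(latent).squeeze(-1), dim=0)
        return capacity * weights                              # sums to capacity

class TwoStagePolicy(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = FlowEmbedding()
        self.shape = ActionShaping()

    def forward(self, flows: torch.Tensor, capacity: float) -> torch.Tensor:
        latent = self.embed(flows)           # stage 1: flow embeddings
        return self.shape(latent, capacity)  # stage 2: shaped allocation

# Usage: allocate 10 Mbps among 8 flows with 3 features each.
policy = TwoStagePolicy()
alloc = policy(torch.rand(8, 3), capacity=10.0)
```

Keeping the embedding stage environment-agnostic and isolating environment-specific constraints in the shaping stage is one way such a policy could be carried from a source environment to targets of different scales with little retraining.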

OVERALL SYSTEM
RL-BASED ORCHESTRATION
POLICY TRANSFERABLE RL STRUCTURE
FLOW EMBEDDING
BANDWIDTH ALLOCATION
IMPLEMENTATION
EVALUATION
LEARNING ENVIRONMENTS
COMPARISON METHODS
ADAPTATION PERFORMANCE
CASE STUDY
RELATED WORK
CONCLUSION