Abstract

This paper considers a transmission control problem in network-coded two-way relay channels (NC-TWRC), where the relay buffers randomly arriving packets from two users and the channels are assumed to be fading. The problem is modeled by a discounted infinite-horizon Markov decision process (MDP). The objective is to find an adaptive transmission control policy that jointly minimizes the packet delay, buffer overflow, transmission power consumption, and downlink error rate in the long run. Using the concepts of submodularity, multimodularity, and L♮-convexity, we study the structure of the optimal policy obtained by the dynamic programming (DP) algorithm. We show that the optimal transmission policy is nondecreasing in queue occupancies and/or channel states under certain conditions, such as the chosen values of parameters in the MDP model, the channel modeling method, and the preservation of stochastic dominance in the transitions of system states. Based on these results, we propose two low-complexity algorithms for searching the optimal monotonic policy: monotonic policy iteration (MPI) and discrete simultaneous perturbation stochastic approximation (DSPSA). We show that MPI reduces the time complexity of DP, and that DSPSA is able to adaptively track the optimal policy when the statistics of the packet arrival processes change with time.
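The DSPSA idea mentioned in the abstract can be illustrated on a toy problem. The sketch below is a minimal, hypothetical example of simultaneous-perturbation search over an integer lattice (in the style of Wang and Spall's discrete SPSA); the quadratic objective and all step-size parameters are made up for illustration and are not the paper's policy-tracking setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def dspsa_minimize(f, theta0, iters=1000, a=0.5, A=10, alpha=0.602):
    """Minimize f over the integer lattice via a DSPSA-style iteration:
    perturb the midpoint pi(theta) = floor(theta) + 1/2 by a random
    +-1/2 vector so that both evaluation points are integer-valued."""
    theta = np.asarray(theta0, dtype=float)
    for k in range(iters):
        ak = a / (k + 1 + A) ** alpha               # decaying step size
        delta = rng.choice([-1.0, 1.0], size=theta.shape)
        pi = np.floor(theta) + 0.5                  # componentwise midpoint
        # Two-sided gradient estimate from only two function evaluations
        g = (f(pi + delta / 2) - f(pi - delta / 2)) / delta
        theta = theta - ak * g
    return np.round(theta).astype(int)

# Hypothetical separable objective with integer minimizer at (3, -2)
f = lambda x: (x[0] - 3) ** 2 + (x[1] + 2) ** 2
theta_star = dspsa_minimize(f, [0.0, 0.0])
```

Each iteration needs only two objective evaluations regardless of the dimension, which is what makes the method attractive for tracking a changing optimum online.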

Highlights

  • Network coding (NC) was proposed in [1] to maximize the information flow in a wired network

  • By thinking of each iteration in dynamic programming (DP) as a one-stage pure coordination supermodular game, we show that equiprobable traffic rates and certain conditions on unit costs guarantee that each tuple in the optimal policy is monotonic in the queue state controlled by that tuple and in the queue state associated with the information flow in the opposite direction, i.e., the one that is not under that tuple's control

  • By observing the submodularity of the DP, we derive sufficient conditions for an optimal policy to be nondecreasing in both queue and channel states, stated in terms of unit costs, channel statistics, and finite-state Markov chain (FSMC) models
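The submodularity condition underlying these structural results can be checked numerically: on a 2D integer lattice, submodularity is equivalent to decreasing differences, f(x+1, y+1) − f(x, y+1) ≤ f(x+1, y) − f(x, y). A minimal sketch with hypothetical test functions (not the paper's cost functions):

```python
import numpy as np

def is_submodular_2d(f_grid):
    """Check decreasing differences on a 2D grid of function values:
    f(x+1, y+1) - f(x, y+1) <= f(x+1, y) - f(x, y) for all (x, y),
    which is equivalent to submodularity on the integer lattice."""
    dx = np.diff(f_grid, axis=0)      # forward difference in x
    cross = np.diff(dx, axis=1)       # cross (mixed) difference
    return bool(np.all(cross <= 0))

# Hypothetical examples: f(x, y) = -x*y is submodular (cross difference -1),
# while f(x, y) = x*y is supermodular (cross difference +1).
X, Y = np.meshgrid(np.arange(5), np.arange(5), indexing="ij")
```

The same mixed-difference test extends coordinate-pair by coordinate-pair to higher-dimensional state spaces.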


Summary

Introduction

Network coding (NC) was proposed in [1] to maximize the information flow in a wired network. By observing the L♮-convexity and submodularity of the DP function, we derive sufficient conditions for the optimal policy to be nondecreasing in queue and/or channel states. These structural results are used to derive two low-complexity algorithms: monotonic policy iteration (MPI) and discrete simultaneous perturbation stochastic approximation (DSPSA).

3.4 Immediate cost

The immediate cost C : X × A → R+ is the cost incurred immediately after action a is taken in state x at the current decision epoch. It reflects three optimization concerns: the packet delay and queue overflow, the transmission power, and the downlink transmission error rate.

3.5 Objective and dynamic programming

Let x(t) and a(t) denote the state and action at decision epoch t, respectively, and consider an infinite-horizon MDP model where the discrete decision-making process is assumed to be infinitely long.
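The DP formulation above can be illustrated on a toy version of the model. The sketch below is a hypothetical single-queue transmission MDP with Bernoulli arrivals and made-up unit costs (the paper's model has two queues and fading channel states); value iteration computes the discounted optimal policy, which here exhibits the monotone (threshold) structure the paper establishes:

```python
import numpy as np

# Hypothetical parameters for a toy single-queue transmission MDP
B = 10       # buffer size
p = 0.4      # packet arrival probability per epoch
h = 1.0      # holding (delay) cost per queued packet
w = 2.0      # power cost per transmission
beta = 0.9   # discount factor

def next_state(s, a, arrival):
    """Queue evolution: serve one packet if a == 1, then add the arrival."""
    return min(B, max(0, s - a) + arrival)

def value_iteration(tol=1e-8):
    V = np.zeros(B + 1)
    while True:
        Q = np.empty((B + 1, 2))
        for s in range(B + 1):
            for a in (0, 1):
                cost = h * s + w * a                       # immediate cost
                ev = sum(pr * V[next_state(s, a, arr)]     # expected future
                         for arr, pr in ((0, 1 - p), (1, p)))
                Q[s, a] = cost + beta * ev
        V_new = Q.min(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmin(axis=1)
        V = V_new

V, policy = value_iteration()
# Monotone structure: the optimal action is nondecreasing in the queue
# state, so a single threshold characterizes the whole policy.
assert all(policy[s] <= policy[s + 1] for s in range(B))
```

Monotonicity is what MPI exploits: once the action at a lower queue state is known, the search at higher states can be restricted to actions at least as large.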

Monotonic policies in queue states
Discrete simultaneous perturbation stochastic approximation
Conclusion