Relative Value Iteration Research Articles

The relative value iteration (RVI) scheme for Markov decision processes (MDPs) dates back to the seminal work of White (1963), which introduced an algorithm for solving the ergodic dynamic programming equation in the finite state, finite action case. Its ramifications have given rise to popular learning algorithms such as Q-learning. More recently, the algorithm has gained prominence because of its implications for model predictive control (MPC). For stochastic control problems on an infinite time horizon, especially problems that seek to optimize the average performance (ergodic control), the optimal policy can be obtained in explicit form only for a few classes of well-structured models. What is often used in practice is a heuristic method known as the rolling horizon, receding horizon, or MPC. It works as follows: one solves the finite horizon problem for a given number of steps N, or over an interval [0,T] in the case of a continuous time problem. The result is a nonstationary Markov policy, which is optimal for the finite horizon problem. We fix the initial action (the action determined at the Nth step of the value iteration (VI) algorithm) and apply it as a stationary Markov control. We refer to this Markov control as the rolling horizon control. It depends, of course, on the horizon length N. One expects that for well-structured problems the rolling horizon control is near optimal if N is sufficiently large. This is only a heuristic, however, and the rolling horizon control might not even be stable. For a good discussion of this problem, we refer the reader to Della Vecchia et al. (2012). Obtaining such solutions is further complicated by the fact that the value of the ergodic cost required in the successive iteration scheme is not known. This is the reason for the RVI.
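
As an illustration of the two schemes described in this abstract, the following is a minimal sketch for a finite state, finite action MDP with average cost. It is not taken from any of the listed articles: the function names, the two-state transition matrices, and the costs are all made up for the example, and the sketch assumes a unichain, aperiodic model so that the iteration converges. `relative_value_iteration` subtracts the value at a reference state after each Bellman update, which removes the need to know the ergodic cost in advance, while `rolling_horizon_policy` runs N steps of plain value iteration and keeps the minimizing action of the last step as a stationary control.

```python
import numpy as np


def relative_value_iteration(P, c, ref_state=0, tol=1e-10, max_iter=10_000):
    """Sketch of RVI for an average-cost MDP (assumed unichain and aperiodic).

    P: array of shape (A, S, S), P[a, s, j] = probability of s -> j under action a.
    c: array of shape (S, A), one-step cost of action a in state s.
    Returns (estimate of the ergodic cost, relative value function, greedy policy).
    """
    h = np.zeros(c.shape[0])
    for _ in range(max_iter):
        # Q[s, a] = c(s, a) + sum_j P(j | s, a) * h(j)
        Q = c + np.einsum("asj,j->sa", P, h)
        Th = Q.min(axis=1)
        gain = Th[ref_state]   # running estimate of the ergodic cost
        h_new = Th - gain      # renormalize so that h[ref_state] == 0
        converged = np.max(np.abs(h_new - h)) < tol
        h = h_new
        if converged:
            break
    return gain, h, Q.argmin(axis=1)


def rolling_horizon_policy(P, c, horizon):
    """Run `horizon` steps of plain value iteration (zero terminal cost) and
    return the minimizing action of the last step as a stationary policy."""
    v = np.zeros(c.shape[0])
    for _ in range(horizon):
        Q = c + np.einsum("asj,j->sa", P, v)
        v = Q.min(axis=1)
    return Q.argmin(axis=1)


# Hypothetical two-state, two-action model used only for illustration.
P = np.array([
    [[0.9, 0.1], [0.5, 0.5]],   # transitions under action 0
    [[0.2, 0.8], [0.1, 0.9]],   # transitions under action 1
])
c = np.array([
    [2.0, 0.5],                 # costs in state 0 for actions 0 and 1
    [1.0, 3.0],                 # costs in state 1 for actions 0 and 1
])

gain, h, policy = relative_value_iteration(P, c)
print("ergodic cost estimate:", gain)
print("relative values:", h)
print("RVI greedy policy:", policy)
print("rolling horizon policy (N=50):", rolling_horizon_policy(P, c, 50))
```

In this toy model the rolling horizon control coincides with the RVI policy even for short horizons; as the abstract points out, that agreement is a property of well-structured problems and is not guaranteed in general.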

The general problem of queue-aware radio resource management and scheduling design is investigated for wireless communications under quasi-static fading channel conditions. Based on an analysis of the source buffer queuing system, the problem is formulated as a constrained nonlinear discrete programming problem. The state transition matrix of the queuing system determined by the queue-aware scheduler is shown to have a highly dynamic structure, so that conventional matrix analysis and optimization tools are not applicable. By reformulating the problem as a nonlinear integer programming problem on an integer convex set, a direct search approach is considered. Two types of search algorithms, gradient-based and gradient-free, are investigated. An integer steepest-descent search with a sub-sequential interval search algorithm and a constrained discrete Rosenbrock search (CDRS) algorithm are proposed to solve the nonlinear integer problem. Both algorithms are shown to have low complexity and good convergence. Numerical results for single-user resource allocation are presented, which show that both algorithms outperform equal partitioning and random partitioning queue-aware scheduling. The dynamic programming (DP) solution given by the relative value iteration algorithm, which provides the true optima but has high complexity, is used as a benchmark. In the majority of the numerical examples, the performance of the CDRS algorithm is almost identical to that of the DP approach in terms of both average queue length minimization and average packet blocking plus packet retransmission minimization, but it is less complex and thus scales better.

Articles published on Relative Value Iteration

Robust Average-Reward Markov Decision Processes

A Reinforcement Learning Approach for Optimizing the Age-of-Computing-Enabled IoT

Optimal Sampling and Scheduling for Timely Status Updates in Multi-Source Networks

User association and power allocation for UAV-assisted networks: A distributed reinforcement learning approach

Optimal Energy Cooperation Policy in Fusion Center-Based Sustainable Wireless Sensor Networks

Low Complexity Online Radio Access Technology Selection Algorithm in LTE-WiFi HetNet

On the relative value iteration with a risk-sensitive criterion

Open Problem—Convergence and Asymptotic Optimality of the Relative Value Iteration in Ergodic Control

CMDP-based intelligent transmission for wireless body area network in remote health monitoring

Strong convergence and dynamic economic models

Markov Decision Processes with Exogenous Variables

Structural Estimation Using Parametric Mathematical Programming with Equilibrium Constraints and Homotopy Path Continuation

A Continuous-Time Markov decision process-based resource allocation scheme in vehicular cloud for mobile video services

Optimal Dynamic Multicast Scheduling for Cache-Enabled Content-Centric Wireless Networks

Markov Decision Processes with Exogenous Variables

A Correction to “A Relative Value Iteration Algorithm for Nondegenerate Controlled Diffusions”

SMDP-Based Radio Resource Allocation Scheme in Software-Defined Internet of Things Networks

Stochastic Content-Centric Multicast Scheduling for Cache-Enabled Heterogeneous Cellular Networks

Stochastic Throughput Optimization for Two-Hop Systems With Finite Relay Buffers

Generalized Queue-Aware Resource Management and Scheduling for Wireless Communications
