Bellman Optimality Research Articles

The object of this study is an approach to solving the problems of designing service-oriented networks that warn about emergencies using dynamic programming. The main issue is the complexity of algorithmization of processes that describe the achievement of an optimal solution in multi-stage nonlinear problems. The possibilities of applying the Bellman optimality principle for solving the set tasks for the purpose of their application in the field of engineering and technology are determined. Based on the Bellman functional equation, a model of the optimal number of sensors in the monitoring system for warning of emergencies was built. A feature of the design is that using the classical Bellman equation, it is proposed to solve problems of various technical directions, provided that the resource determines what exactly makes it possible to optimize work in any way. Important with this approach is the planning of the action as an element of some problem with the augmented state. After that, the proposed structure in formal form extends to other objects. A problem was proposed and considered, which confirmed the mathematical calculations, as a result of which an optimal plan for replacing the sensors of the system was obtained; and the possibilities of significant cost reduction were identified. In the considered example, an optimal plan for replacing the system sensors was compiled and the possibility of reducing costs by 31.9 % was proved. The proposed option was used in the development of information technology for modeling a service-oriented network based on energy-efficient long-range protocols; some of the identified features were further developed in the design of a recommendation system for issuing loans and developing an interactive personnel training system

Read full abstract

In reinforcement learning, the objective is almost always defined as a <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">cumulative</i> function over the rewards along the process. However, there are many optimal control and reinforcement learning problems in various application fields, especially in communications and networking, where the objectives are not naturally expressed as summations of the rewards. In this paper, we recognize the prevalence of non-cumulative objectives in various problems, and propose a modification to existing algorithms for optimizing such objectives. Specifically, we dive into the fundamental building block for many optimal control and reinforcement learning algorithms: the Bellman optimality equation. To optimize a non-cumulative objective, we replace the original <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">summation operation</i> in the Bellman update rule with a generalized operation corresponding to the objective. Furthermore, we provide sufficient conditions on the form of the generalized operation as well as assumptions on the Markov decision process under which the globally optimal convergence of the generalized Bellman updates can be guaranteed. We demonstrate the idea experimentally with the <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">bottleneck</i> objective, i.e., the objectives determined by the <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">minimum</i> reward along the process, on classical optimal control and reinforcement learning tasks, as well as on two network routing problems on maximizing the flow rates.

Read full abstract

Bellman Optimality Research Articles

Related Topics

Articles published on Bellman Optimality

Optimal Model-Free Mean-Square Consensus for Multi-Agents with Markov Switching Topology

Dual-timescale resource management for multi-type caching placement and multi-user computation offloading in Internet of Vehicle

Optimized Backstepping Cooperative Control for Output-Constrained Stochastic Nonlinear Network Systems via a Multibridge-Hole Function.

Research and application of the flatness target curve discrete dynamic programming based on two-dimensional decision making

A review of dynamic optimization in aquaculture production economics

Differential graphical game‐based multi‐agent tracking control using integral reinforcement learning

Optimal control for both forward and backward discrete-time systems

Boundary Optimal Control for Parabolic Distributed Parameter Systems With Value Iteration.

Barrier Lyapunov Function-Based Safe Reinforcement Learning for Autonomous Vehicles With Optimized Backstepping.

Kernel-based diffusion approximated Markov decision processes for autonomous navigation and control on unstructured terrains

Multiagent, Multitarget Path Planning in Markov Decision Processes

Real-Time Leak Location of Long-Distance Pipeline Using Adaptive Dynamic Programming.

Traffic efficiency and fairness optimisation for autonomous intersection management based on reinforcement learning

Learning-Based Event-Triggered Tracking Control for Nonlinear Networked Control Systems With Unmatched Disturbance

Me, myself and I: A general theory of non-Markovian time-inconsistent stochastic control for sophisticated agents

A Minimum Principle for Stochastic Optimal Control Problem with Interval Cost Function

Features in solving individual tasks to develop service-oriented networks using dynamic programming

Integer optimal control problems with total variation regularization: Optimality conditions and fast solution of subproblems

Reinforcement Learning With Non-Cumulative Objective

Optimal Control and Adaptive Learning for Stabilization of a Quadrotor-type Unmanned Aerial Vehicle via Approximate Dynamic Programming

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Bellman Optimality Research Articles

Related Topics

Articles published on Bellman Optimality

Optimal Model-Free Mean-Square Consensus for Multi-Agents with Markov Switching Topology

Dual-timescale resource management for multi-type caching placement and multi-user computation offloading in Internet of Vehicle

Optimized Backstepping Cooperative Control for Output-Constrained Stochastic Nonlinear Network Systems via a Multibridge-Hole Function.

Research and application of the flatness target curve discrete dynamic programming based on two-dimensional decision making

A review of dynamic optimization in aquaculture production economics

Differential graphical game‐based multi‐agent tracking control using integral reinforcement learning

Optimal control for both forward and backward discrete-time systems

Boundary Optimal Control for Parabolic Distributed Parameter Systems With Value Iteration.

Barrier Lyapunov Function-Based Safe Reinforcement Learning for Autonomous Vehicles With Optimized Backstepping.

Kernel-based diffusion approximated Markov decision processes for autonomous navigation and control on unstructured terrains

Multiagent, Multitarget Path Planning in Markov Decision Processes

Real-Time Leak Location of Long-Distance Pipeline Using Adaptive Dynamic Programming.

Traffic efficiency and fairness optimisation for autonomous intersection management based on reinforcement learning

Learning-Based Event-Triggered Tracking Control for Nonlinear Networked Control Systems With Unmatched Disturbance

Me, myself and I: A general theory of non-Markovian time-inconsistent stochastic control for sophisticated agents

A Minimum Principle for Stochastic Optimal Control Problem with Interval Cost Function

Features in solving individual tasks to develop service-oriented networks using dynamic programming

Integer optimal control problems with total variation regularization: Optimality conditions and fast solution of subproblems

Reinforcement Learning With Non-Cumulative Objective

Optimal Control and Adaptive Learning for Stabilization of a Quadrotor-type Unmanned Aerial Vehicle via Approximate Dynamic Programming