13 - Direct Learning by Reinforcement

Jennie Si

doi:10.1016/b978-012170960-0/50090-6

Abstract

This chapter introduces a learning control scheme that can be implemented in real time and can be viewed as a model-independent approach to adaptive critic designs. It gives a systematic treatment of how to develop a generic online learning control system based on the fundamental principle of reinforcement learning or, more specifically, neural dynamic programming, and it demonstrates the implementation details and learning results on two illustrative examples. The online learning system improves its performance over time in two ways. First, it learns from its own mistakes through the reinforcement signal from the external environment and tries to reinforce its actions to improve future performance. Second, system states associated with positive reinforcement are memorized through a network learning process, so that similar states encountered in the future are associated with a control action that leads to positive reinforcement. The discussion also introduces a successful candidate for online learning control design.

Full Text