Abstract

This paper presents a systematic treatment of the development of a generic online learning control system based on the fundamental principle of reinforcement learning or, more specifically, neuro-dynamic programming. This real-time learning system improves its performance over time in two ways. First, it learns from its own mistakes through the reinforcement signal from the external environment and tries to reinforce its actions to improve future performance. Second, system states associated with positive reinforcement are memorized through a network learning process, so that in the future similar states will be more positively associated with a control action that leads to positive reinforcement. Two successful candidate designs for online learning control are introduced. Real-time learning algorithms can be derived for the individual components of the learning system. Some analytical insights are provided as guidelines for the entire online learning control system.
