Abstract

In this article we compare the performance of a learning controller with and without a flexible and incrementally expandable distribution scheme. The learning method is modeled after Holland's learning classifier system, but it uses Q-learning as the reinforcement method instead of the "bucket-brigade." The method for distributing the learning among sub-processes is modular, so that learning can be accomplished monolithically, as a redundant system of peers, or as a conceptual hierarchy of competence levels with varying degrees of abstraction. These modules can also communicate using short messages transmitted quickly between separate processes, between separate machines, or across a network connection. The system is applied to a mobile agent whose task is to defend itself against other mobile agents trying to attack it. As a simulation example, we consider a naval combat scenario in which a surface vessel serves as a mobile robot equipped with various radar and communications sensors. Its task is to detect other agents such as ships and planes, identify and classify them, and devise optimal tactical decision policies. The results suggest that the learning method employed is useful in general for any sensor-based autonomous robot operating in dynamic, unstructured environments. © 1997 John Wiley & Sons, Inc.
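The abstract states only that Q-learning replaces the bucket-brigade as the reinforcement step; it does not give the update rule. For orientation, the standard one-step Q-learning update, which we assume is the form underlying the reinforcement of classifier strengths here, is

\[
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right],
\]

where \(\alpha\) is the learning rate, \(\gamma\) the discount factor, and \(r_{t+1}\) the reward received after taking action \(a_t\) in state \(s_t\); in a classifier-system setting the Q-values would play the role of classifier strengths (an assumption based on the stated substitution, not a detail given in the abstract).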


