In this article we compare the performance of a learning controller with and without a flexible, incrementally expandable distribution scheme. The learning method is modeled after Holland's learning classifier system, but it uses Q-learning as the reinforcement method instead of the "bucket brigade." The method for distributing the learning among sub-processes is modular, so that learning can be carried out monolithically, as a redundant system of peers, or as a conceptual hierarchy of competence levels with varying degrees of abstraction. The modules can also communicate via short messages transmitted quickly between modules running in separate processes, on separate machines, or across a network connection. The system is applied to a mobile agent whose task is to defend itself against other mobile agents attempting to attack it. As a simulation example, we consider a naval combat scenario in which a surface vessel is modeled as a mobile robot equipped with various radar and communications sensors. Its task is to detect other agents such as ships and aircraft, identify and classify them, and derive optimal tactical decision policies. The results suggest that the learning method employed is useful in general for any sensor-based autonomous robot operating in dynamic, unstructured environments. © 1997 John Wiley & Sons, Inc.
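To make the substitution of Q-learning for the bucket brigade concrete, the sketch below shows a one-step Q-learning update applied to a rule's value in place of bucket-brigade strength passing. It is a minimal illustration only, not the authors' implementation: the state/action encoding, parameter values, and function names are all assumptions, and classifier-style wildcard matching of condition strings is omitted.

```python
# Minimal sketch (not the authors' implementation): rule values updated with a
# one-step Q-learning rule instead of bucket-brigade credit assignment.
# All names and parameter values here are illustrative assumptions.
import random
from collections import defaultdict

ALPHA = 0.1    # learning rate (assumed)
GAMMA = 0.9    # discount factor (assumed)
EPSILON = 0.1  # exploration rate (assumed)

# Q-table keyed by (state message, action message); a full classifier system
# would match condition strings with wildcards, which is omitted here.
q_table = defaultdict(float)

def choose_action(state, actions):
    """Epsilon-greedy choice among the actions available in the current state."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: q_table[(state, a)])

def q_update(state, action, reward, next_state, next_actions):
    """One-step Q-learning update used in place of the bucket-brigade pass."""
    best_next = max((q_table[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + GAMMA * best_next
    q_table[(state, action)] += ALPHA * (target - q_table[(state, action)])
```

Because the update depends only on locally held values and the short (state, action, reward, next state) messages, it fits the modular scheme described above, where modules exchange such messages across processes or machines.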