Abstract

This dissertation studies how to build a multiagent system that learns to execute high-level strategies in complex, dynamic, and uncertain domains. We assume that agents do not communicate explicitly and operate autonomously, each with only a local view of the world. Designing multiagent systems for real-world applications is challenging because of the prohibitively large state and action spaces. Dynamic changes in an environment demand reactive responses, while the complexity and uncertainty inherent in real settings demand that individual agents remain committed to common goals despite adversity. A balance between reaction and reasoning is therefore necessary to accomplish goals in real-world environments. Most work in multiagent systems approaches this problem with bottom-up methodologies. Bottom-up methodologies, however, are severely limited: they cannot learn alternative strategies, which are essential in highly dynamic, complex, and uncertain environments where convergence to a single-strategy behavior is virtually impossible. Our methodology is knowledge-based and combines top-down and bottom-up approaches to problem solving to exploit the strengths of both. We use symbolic plans that define what individual agents in a collaborative group must do to achieve multi-step goals that extend over time, but that do not initially specify how to implement those goals in any given situation. During training, agents acquire application knowledge through case-based learning; they then use this knowledge to apply plans in realistic settings. During application, a naive form of reinforcement learning lets agents make increasingly better decisions about which specific implementation to select in each situation. Experimentally, we show that, as plan complexity increases, the version of our system with naive reinforcement learning outperforms, by a growing margin, both the version that retrieves and applies unreinforced training knowledge and the version that reacts to dynamic changes using search.
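The abstract gives no code, but the select-and-reinforce loop it describes can be illustrated concretely. The following is a minimal, hypothetical Python sketch of one plausible reading: cases acquired during training are retrieved by situation similarity, an epsilon-greedy rule picks among candidates by reinforcement value, and a running-average update stands in for the "naive" reinforcement step. All names (`Case`, `select_case`, `reinforce`), the feature-overlap similarity metric, and the retrieval threshold are illustrative assumptions, not the dissertation's actual design.

```python
import random
from dataclasses import dataclass


@dataclass
class Case:
    """A stored implementation of a plan step, acquired during training.
    (Hypothetical structure; the dissertation does not specify a format.)"""
    situation: tuple      # features of the situation the case was learned in
    actions: list         # concrete action sequence implementing the plan step
    value: float = 0.0    # reinforcement value, updated after each application
    uses: int = 0         # number of times this case has been applied


def similarity(a: tuple, b: tuple) -> float:
    """Fraction of matching features: a stand-in for a real case-matching metric."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), 1)


def select_case(library: list[Case], situation: tuple, epsilon: float = 0.1) -> Case:
    """Retrieve cases applicable to the situation and pick one to execute.

    With probability epsilon, explore a random candidate; otherwise exploit
    the candidate with the highest reinforcement value.
    """
    candidates = [c for c in library if similarity(c.situation, situation) >= 0.5]
    if not candidates:
        raise LookupError("no applicable case; fall back to reactive search")
    if random.random() < epsilon:
        return random.choice(candidates)
    return max(candidates, key=lambda c: c.value)


def reinforce(case: Case, reward: float) -> None:
    """Update the case's value with a running average of observed rewards."""
    case.uses += 1
    case.value += (reward - case.value) / case.uses


# Usage: retrieve an implementation for the current situation, execute it,
# then reinforce it according to whether the plan step succeeded.
library = [Case(situation=(1, 0, 1), actions=["pass", "move"]),
           Case(situation=(1, 1, 1), actions=["dribble", "shoot"])]
chosen = select_case(library, situation=(1, 0, 1))
reinforce(chosen, reward=1.0)  # plan step succeeded
```

Under this reading, unreinforced retrieval corresponds to skipping `reinforce` entirely, so all cases keep their initial values; the experiments summarized above compare exactly that baseline against the reinforced variant.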
