Abstract

Swarm robotics is the study of how large numbers of relatively simple, physically embodied agents can be designed so that a desired collective behavior emerges from the local interactions among the agents and between the agents and their environment. It is a novel approach to the coordination of large numbers of robots, inspired by the observation of social insects such as ants, termites, wasps and bees, which stand as fascinating examples of how many simple individuals can interact to create collectively intelligent systems. Social insects are known to coordinate their actions to accomplish tasks that are beyond the capabilities of a single individual: termites build large and complex mounds, army ants organize impressive foraging raids, and ants can collectively carry large prey. Such coordination capabilities are still beyond the reach of current multi-robot systems. In this research, the most recent developments in swarm robot systems are addressed by classifying the primary research axes in terms of the principal topic areas that have generated significant levels of research. Specific research scopes within each primary research axis are also classified, and the key open issues in these research scopes are identified. With the goal of bringing some objective grounding to the important areas of the mentioned research scopes, this dissertation presents an empirical analysis of a multi-agent foraging task using reinforcement learning algorithms. Foraging is one of the most commonly used test applications in multi-agent systems. The aim of the agents in a foraging task is to find pucks (prey) and bring them to a home (nest) location in an environment while satisfying the given constraints; this task is also known as the prey retrieval task. Reinforcement learning is used to tackle the modeled multi-agent foraging task.
Reinforcement learning has been used extensively in many applications, such as industrial control, time-series prediction, and robot soccer competitions. In this thesis, a multi-agent foraging task is modeled using the Webots (1996) simulation software, and reinforcement learning algorithms and policies are tested. One of the challenges that arises in reinforcement learning is the exploration-exploitation dilemma. A novel learning policy (the FIFO-list learning policy) is proposed and compared against the learning policies reported in the literature to tackle this dilemma. An improved reinforcement learning algorithm (the Cautious-Q learning algorithm) is also proposed, and its performance is compared with existing learning algorithms. The proposed algorithm is a combined strategy of on-policy and off-policy learning. The improved learning algorithm and learning policy are implemented in a real environment with Khepera 2 mobile robots, and the results are presented.
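To make the baseline concrete, the following is a minimal sketch of standard tabular Q-learning with an epsilon-greedy policy, the off-policy setting in which the exploration-exploitation dilemma arises. The one-dimensional "corridor" world (an agent walking toward a puck) is a hypothetical toy for illustration only, not the Webots or Khepera 2 setup used in the thesis, and the FIFO-list and Cautious-Q methods proposed here are not reproduced.

```python
import random

N_STATES = 5          # linear corridor; the puck (goal) sits at the rightmost cell
ACTIONS = [-1, +1]    # step left / step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

def step(state, action):
    """Move the agent; reward 1.0 only when the puck cell is reached."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    done = nxt == N_STATES - 1
    return nxt, reward, done

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy: explore with probability EPSILON, else exploit
            if rng.random() < EPSILON:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: q[(s, x)])
            s2, r, done = step(s, a)
            # off-policy Q-learning update: bootstrap on the greedy next action
            best_next = max(q[(s2, x)] for x in ACTIONS)
            q[(s, a)] += ALPHA * (r + GAMMA * best_next - q[(s, a)])
            s = s2
    return q

if __name__ == "__main__":
    q = train()
    # after training, the greedy policy should point right in every non-goal state
    print(all(q[(s, +1)] > q[(s, -1)] for s in range(N_STATES - 1)))
```

An on-policy variant (SARSA) would instead bootstrap on the action actually chosen by the epsilon-greedy policy; combining the two update styles is the general idea behind hybrid on-/off-policy schemes such as the one proposed in this thesis.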
