Abstract

Transfer learning is the process of using knowledge gained while solving one problem to solve a new, previously unencountered problem. Current research has concentrated on analogical transfer: a mechanic can fix a type of car he has never seen before by comparing it to cars he has fixed before. This approach is typical of case-based reasoning systems and has been successful on a wide variety of problems [Watson, 1997]. When a new problem is encountered, a database of previously solved problems is searched for a problem with similar features; the solution to the most similar problem is selected, adapted, and then applied to the new problem. Similar methods exist for adapting reinforcement learning policies [Taylor and Stone, 2009]. We refer to these approaches as solution adaptation algorithms: a pair of problems is matched on similarity, and the solution to the first problem, after some adaptation, is applied to the second.

The solution adaptation approach requires three things. First, the two problems must be substantially similar in surface or structural features. Second, there must exist a clear method of adapting one solution to another; this is typically done through manually authored feature mappings or adaptation rules. Third, problem similarity must imply solution similarity. The "similar feature, similar solution" assumption does not hold for all domains.

One such domain is tower defense. Tower defense (TD) is a broad category of spatial reasoning puzzles that share a common theme: a set of agents follows a path through a maze (the map), and the player must prevent them from reaching the exit by placing "defense towers" at strategic locations. There are many different tower defense games, each with its own set of towers. Most of these games have pieces that fit one of five archetypes: slow but strong, fast but weak, area of effect (AoE), damage over time (DoT), and slowing.
The specific traits of each tower and their relative strengths vary from game to game. There are hundreds of TD games, most with multiple maps, making it a good domain for investigating multiple levels of transfer learning. It is not, however, a good candidate for solution-adaptation algorithms. Problem (maze) similarity does not imply solution (piece placement) similarity: small differences in mazes can lead to qualitatively different solutions, and there is no clear way to adapt one maze's solution to another.

Figure 1: Training map in GopherTD. Yellow objects are agents trying to move from Start to End. Circles show range of three towers in three qualitatively different positions.
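The retrieve-and-adapt loop of a solution adaptation algorithm can be sketched as a nearest-neighbor lookup over problem features. The sketch below is illustrative only: the feature names, the case base, and the `adapt` hook are hypothetical, not taken from any particular system described here.

```python
import math

# Hypothetical case base: each case pairs a problem's feature vector
# with the solution that was stored for it.
case_base = [
    ({"path_length": 12, "turns": 3, "choke_points": 2}, "solution_A"),
    ({"path_length": 30, "turns": 8, "choke_points": 1}, "solution_B"),
]

def distance(a, b):
    # Euclidean distance over the shared numeric features.
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))

def retrieve(problem):
    # Select the previously solved problem most similar to the new one.
    return min(case_base, key=lambda case: distance(case[0], problem))

def solve(problem, adapt):
    _, solution = retrieve(problem)
    # Adaptation is domain-specific: map the old solution onto the new problem.
    return adapt(solution, problem)

# A new problem close in feature space to the first stored case:
new_problem = {"path_length": 14, "turns": 4, "choke_points": 2}
nearest_solution = retrieve(new_problem)[1]  # "solution_A"
```

The tower defense argument above is precisely that this loop breaks down: a small `distance` between mazes gives no guarantee that the retrieved placement, however adapted, remains a good solution.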
