Algorithm for Base Action Set Generation Focusing on Undiscovered Sensor Values

Keiji Suzuki, Sho Yamauchi

doi:10.3390/app9010161

Abstract

Previous machine learning algorithms either use a given base action set designed by hand or acquire locomotion for a complicated task through trial-and-error processes with a sophisticated reward function. The actions generated this way are designed for a specific task, which makes it difficult to apply them to other tasks. This paper proposes an algorithm to obtain a base action set that does not depend on specific tasks and is universally usable. The proposed algorithm aims for as much interoperability among multiple tasks and machine learning methods as possible. A base action set that effectively changes the external environment was chosen as a candidate. The algorithm obtains this base action set on the basis of the hypothesis that an action that effectively changes the external environment can be found by observing events to find undiscovered sensor values. The process of obtaining a base action set was validated through a simulation experiment with a differential wheeled robot.

Highlights

Previous machine learning algorithms [1,2] such as Q-learning use a given base action set and choose an action from the set repeatedly [3–7]
We found that our proposed algorithm, which features mechanisms for extracting and combining the action fragments that contributed to discoveries, can find more undiscovered sensor values than a random action generation algorithm, which is equivalent to the initial state of the proposed algorithm
The action discarding mechanism is effective for removing action fragments that do not contribute to finding undiscovered sensor values and helps stabilize the performance
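
The three mechanisms named in these highlights (extraction, combination, and discarding of action fragments) can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not the authors' algorithm: the grid-cell "sensor value", the wheel-speed set `WHEEL_SPEEDS`, the length cap `MAX_LEN`, the miss threshold `max_misses`, and all function names are hypothetical choices made only for illustration.

```python
import math
import random

WHEEL_SPEEDS = (-1.0, 0.0, 1.0)  # assumed discrete wheel commands
MAX_LEN = 6                      # assumed cap on fragment length

def random_fragment(rng, length=3):
    """A fragment is a short tuple of (left, right) wheel-speed commands."""
    return tuple((rng.choice(WHEEL_SPEEDS), rng.choice(WHEEL_SPEEDS))
                 for _ in range(length))

def run_fragment(pose, fragment, axle=1.0):
    """Toy differential-drive update; the 'sensor value' is the occupied grid cell."""
    x, y, theta = pose
    readings = []
    for left, right in fragment:
        theta += (right - left) / axle
        speed = (left + right) / 2.0
        x += speed * math.cos(theta)
        y += speed * math.sin(theta)
        readings.append((round(x), round(y)))
    return (x, y, theta), readings

def explore(steps=200, pool_size=8, max_misses=3, seed=0):
    rng = random.Random(seed)
    pool = {}            # fragment -> consecutive runs without a new value
    seen = {(0, 0)}      # sensor values discovered so far
    pose = (0.0, 0.0, 0.0)
    for _ in range(steps):
        while len(pool) < pool_size:   # refill with random fragments
            pool.setdefault(random_fragment(rng), 0)
        fragment = rng.choice(list(pool))
        pose, readings = run_fragment(pose, fragment)
        new_values = set(readings) - seen
        seen |= new_values
        if new_values:
            # Extraction and combination: keep the contributor and splice it
            # with another stored fragment to form a new candidate.
            pool[fragment] = 0
            partner = rng.choice(list(pool))
            if len(pool) < 2 * pool_size:
                pool.setdefault((fragment + partner)[:MAX_LEN], 0)
        else:
            # Discarding: drop fragments that repeatedly find nothing new.
            pool[fragment] += 1
            if pool[fragment] >= max_misses:
                del pool[fragment]
    return seen, pool
```

Calling `explore(seed=1)` returns the set of discovered toy sensor values and the surviving fragment pool. Setting `max_misses` very high and skipping the combination branch would reduce the loop to plain random fragment generation, which is the kind of baseline the highlights compare against.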

Summary

Introduction

Previous machine learning algorithms [1,2] such as Q-learning use a given base action set and repeatedly choose an action from that set [3–7]. Pre-defined action sets are commonly used in reinforcement learning, where a robot selects one action from the set at each step and executes it. Conventional reinforcement learning methods either use such a pre-defined action set or acquire task-specific actions through trial-and-error processes. Neural networks have been used to obtain actions for achieving specific tasks [10–12]. With this approach, neural networks are given an evaluation function and decide actions in accordance with this function when a correct teacher signal is unknown. A base action set is given in such cases.
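
To make concrete what "choosing repeatedly from a given base action set" means, the following is a minimal sketch of tabular Q-learning on a toy one-dimensional corridor. The corridor, the two-element `BASE_ACTIONS` set, the reward, and the hyperparameters are assumptions chosen for illustration and do not come from the paper.

```python
import random

# Hypothetical hand-designed base action set for a 1-D corridor.
BASE_ACTIONS = {"left": -1, "right": +1}
N_STATES = 5               # positions 0..4
GOAL = N_STATES - 1        # reaching position 4 ends the episode

def step(state, action):
    """Apply one base action; reward 1.0 only on reaching the goal."""
    nxt = min(max(state + BASE_ACTIONS[action], 0), GOAL)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    actions = list(BASE_ACTIONS)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in actions}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: the agent only ever picks from the given set.
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                # Greedy choice with random tie-breaking.
                action = max(actions, key=lambda a: (q[(state, a)], rng.random()))
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in actions)
            # Standard Q-learning update toward the bootstrapped target.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
    return q

def greedy_policy(q):
    """Best base action for every non-goal state."""
    return [max(BASE_ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
```

The learner can only rank the actions it was handed: if `BASE_ACTIONS` lacked `"right"`, no amount of training would reach the goal. That dependence on a hand-designed set is the limitation the introduction points to.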

Objectives

Methods

Results

Discussion

Conclusion