Performance in Multi-Armed Bandit Tasks in Relation to Ambiguity-Preference Within a Learning Algorithm

Song-Ju Kim, Taiki Takahashi

doi:10.3389/fams.2018.00027

Open Access

https://doi.org/10.3389/fams.2018.00027

Abstract

The Ellsberg paradox in decision theory posits that people will inevitably choose a known probability of winning over an unknown probability of winning, even if the known probability is low. One of the prevailing theories that addresses the Ellsberg paradox is known as "ambiguity-aversion". In this study, we investigate the properties of ambiguity-aversion in four distinct types of reinforcement learning algorithms: UCB1-tuned, modified UCB1-tuned, softmax, and tug-of-war. As a test case, we consider a scenario with two slot machines, each of which dispenses a coin according to a probability that is generated by its own probability density function (PDF). We then investigate the choices made by each learning algorithm in such multi-armed bandit tasks. The algorithms respond differently in these tasks depending on their ambiguity-preference. Notably, we found a clear performance enhancement related to ambiguity-preference in a learning algorithm. Although this study does not directly address the ambiguity-aversion theory highlighted by the Ellsberg paradox, the differences between the learning algorithms suggest that there is room for further study of the Ellsberg paradox and decision theory.

Highlights

Neuroeconomics has been developing into an increasingly important academic discipline that helps to explain human behavior
The Ellsberg paradox is a crucial topic in neuroeconomics, and researchers have employed various theories to approach and resolve the paradox
The UCB1-tuned rule uses the term $\sqrt{2\ln(t)/s}$, where $\bar{x}_j(t)$ is the average reward obtained from machine $j$, $n_j$ is the number of times machine $j$ has been played so far, and $n$ is the overall number of plays done so far
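
For reference, the symbols in the last highlight match the standard UCB1-tuned rule of Auer et al. (2002). The following is a sketch of that standard form, not necessarily the exact notation used in this paper. The label $j^{*}$ for the selected machine and the per-play reward symbol $x_{j,\tau}$ are introduced here for illustration only.

```latex
% Standard UCB1-tuned selection rule (assumed form; notation may differ from the paper).
\[
  j^{*} = \arg\max_{j}\left[\,\bar{x}_j
        + \sqrt{\frac{\ln n}{n_j}\,\min\!\left(\tfrac{1}{4},\; V_j(n_j)\right)}\,\right],
\]
% Upper estimate of the reward variance of machine j after s plays of that machine,
% where t is the total number of plays so far.
\[
  V_j(s) = \frac{1}{s}\sum_{\tau=1}^{s} x_{j,\tau}^{2} \;-\; \bar{x}_{j,s}^{2}
         \;+\; \sqrt{\frac{2\ln t}{s}} .
\]
```

Here $x_{j,\tau}$ is the reward from the $\tau$-th play of machine $j$ and $\bar{x}_{j,s}$ is the average over its first $s$ plays. The factor $1/4$ is the largest possible variance of a reward bounded in $[0, 1]$.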

Summary

INTRODUCTION

Neuroeconomics has been developing into an increasingly important academic discipline that helps to explain human behavior. In the classic formulation of the Ellsberg paradox, one ball is drawn from an urn containing 30 red balls and 60 balls that are either black or yellow in an unknown proportion, and a person chooses within two pairs of gambles. [Gamble A] You receive $100 if you draw a red ball; [Gamble B] You receive $100 if you draw a black ball. [Gamble C] You receive $100 if you draw a red or yellow ball; [Gamble D] You receive $100 if you draw a black or yellow ball.

There is tremendous potential for neuroeconomic studies to investigate the properties of decision-making through the use of AI (learning) algorithms. This study is the first attempt to investigate the properties of learning algorithms from the point of view of ambiguity-preference. In our two-machine bandit task, each machine gave rewards according to its own probability density function (PDF), whose mean and standard deviation were μA (μB) and σA (σB), respectively. We hypothesize that the total rewards from probabilities generated by a PDF are the same as the total rewards drawn directly from the same PDF.
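
The task described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: it assumes each machine's PDF is a normal distribution clipped to [0, 1], it uses the standard UCB1-tuned index from Auer et al. (2002), and the names `draw_reward`, `ucb1_tuned_index`, and `run_bandit` are hypothetical.

```python
import math
import random


def draw_reward(mu, sigma, rng):
    """One play: sample a win probability from the machine's PDF, then flip a coin.

    Assumption: the PDF is a normal distribution clipped to [0, 1].
    """
    p = min(1.0, max(0.0, rng.gauss(mu, sigma)))
    return 1 if rng.random() < p else 0


def ucb1_tuned_index(sum_x, sum_x2, n_j, n):
    """Standard UCB1-tuned index for a machine played n_j times out of n plays."""
    mean = sum_x / n_j
    # Upper estimate of the reward variance of this machine.
    variance_bound = sum_x2 / n_j - mean ** 2 + math.sqrt(2.0 * math.log(n) / n_j)
    return mean + math.sqrt(math.log(n) / n_j * min(0.25, variance_bound))


def run_bandit(machines, steps, seed=0):
    """Play `steps` rounds on machines given as (mu, sigma) pairs.

    Returns the play count per machine and the total reward collected.
    """
    rng = random.Random(seed)
    k = len(machines)
    sum_x = [0.0] * k    # sum of rewards per machine
    sum_x2 = [0.0] * k   # sum of squared rewards per machine
    counts = [0] * k     # n_j: plays per machine
    total = 0
    for t in range(1, steps + 1):
        if t <= k:
            j = t - 1    # play every machine once to initialise the estimates
        else:
            n = t - 1    # overall number of plays done so far
            j = max(
                range(k),
                key=lambda i: ucb1_tuned_index(sum_x[i], sum_x2[i], counts[i], n),
            )
        mu, sigma = machines[j]
        r = draw_reward(mu, sigma, rng)
        sum_x[j] += r
        sum_x2[j] += r * r
        counts[j] += 1
        total += r
    return counts, total


if __name__ == "__main__":
    # Machine A: fixed (known) probability; machine B: ambiguous probability.
    # These parameter values are illustrative only.
    counts, total = run_bandit([(0.6, 0.0), (0.4, 0.3)], steps=1000)
    print("plays per machine:", counts, "total reward:", total)
```

Setting σ to zero gives a machine with a fixed, known reward probability, while a larger σ gives a machine whose probability changes from play to play, so the two parameters can be varied to probe how an algorithm's choices depend on ambiguity.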

Ambiguity-Neutral

Ambiguity-Preference

Ambiguity-Aversion

RESULTS

CONCLUSION AND DISCUSSION