Abstract
The Ellsberg paradox in decision theory posits that people will inevitably choose a known probability of winning over an unknown probability of winning, even if the known probability is low. One of the prevailing theories that addresses the Ellsberg paradox is known as 'ambiguity-aversion'. In this study, we investigate the properties of ambiguity-aversion in four distinct types of reinforcement learning algorithms: ucb1-tuned, modified ucb1-tuned, softmax, and tug-of-war. As our test case, we take a scenario with two slot machines, each of which dispenses a coin with a probability generated by its own probability density function (PDF). We then investigate the choices a learning algorithm makes in such multi-armed bandit tasks. The algorithms react differently in these tasks depending on their ambiguity-preference. Notably, we found a clear performance enhancement related to ambiguity-preference in a learning algorithm. Although this study does not directly address the ambiguity-aversion theory highlighted by the Ellsberg paradox, the differences among the learning algorithms suggest that there is room for further study regarding the Ellsberg paradox and decision theory.
Highlights
Neuroeconomics has been developing into an increasingly important academic discipline that helps to explain human behavior
The Ellsberg paradox is a crucial topic in neuroeconomics, and researchers have employed various theories to approach and resolve the paradox
... 2 ln(t) ..., where xj(t) is the average reward obtained from machine j, nj is the number of times machine j has been played so far, and n is the overall number of plays done so far
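The symbol definitions above match those of the standard UCB1 rule, which plays the machine with the largest value of average reward plus sqrt(2 ln n / nj). The following is a minimal sketch of that plain UCB1 rule only, offered as an assumption about what the truncated formula refers to; it is not the ucb1-tuned or modified ucb1-tuned variant studied in the paper. The function name `ucb1_select` and its arguments are hypothetical.

```python
import math


def ucb1_select(avg_rewards, play_counts):
    """Return the index of the machine with the largest plain UCB1 index.

    avg_rewards[j] is the average reward from machine j (xj in the text),
    play_counts[j] is how often machine j has been played (nj in the text).
    Assumption: standard UCB1, not the tuned variants named in the abstract.
    """
    # The index is undefined for nj == 0, so play each machine once first.
    for j, n_j in enumerate(play_counts):
        if n_j == 0:
            return j

    n = sum(play_counts)  # overall number of plays so far
    scores = [
        x_j + math.sqrt(2.0 * math.log(n) / n_j)
        for x_j, n_j in zip(avg_rewards, play_counts)
    ]
    # Pick the machine whose score is largest (first one wins on ties).
    return max(range(len(scores)), key=scores.__getitem__)
```

For example, `ucb1_select([0.5, 0.5], [100, 1])` prefers the second machine: both averages are equal, but the rarely played machine receives the larger exploration bonus.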
Summary
Neuroeconomics has been developing into an increasingly important academic discipline that helps to explain human behavior.
[Gamble A] You receive $100 if you draw a red ball.
[Gamble B] You receive $100 if you draw a black ball.
[Gamble C] You receive $100 if you draw a red or yellow ball.
[Gamble D] You receive $100 if you draw a black or yellow ball.
There is tremendous potential for neuroeconomic studies to investigate the properties of decision-making through the use of AI (learning) algorithms. This study is the first attempt to investigate the properties of learning algorithms from the ambiguity-preference point of view. Each machine gave rewards according to its own probability density function (PDF), whose mean and standard deviation were μA and σA for machine A and μB and σB for machine B. We hypothesize that the total rewards from probabilities generated by a PDF are the same as the total rewards obtained directly from the same PDF.
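The slot-machine setup described above can be sketched as a two-stage draw: each play first samples a win probability from the machine's PDF, then dispenses a coin with that probability. The summary does not state the form of the PDF, so this sketch assumes a normal distribution clipped to [0, 1]. The function names and the example values of μ and σ are hypothetical, chosen only for illustration.

```python
import random


def draw_probability(mu, sigma, rng):
    """Sample a win probability from the machine's PDF.

    Assumption: a normal PDF with mean mu and standard deviation sigma,
    clipped to the valid probability range [0, 1].
    """
    return min(1.0, max(0.0, rng.gauss(mu, sigma)))


def pull(mu, sigma, rng):
    """One play: draw a probability from the PDF, then pay 1 coin with it."""
    p = draw_probability(mu, sigma, rng)
    return 1 if rng.random() < p else 0


def average_reward(mu, sigma, plays, seed=0):
    """Average coin payout over a number of plays of one machine."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    return sum(pull(mu, sigma, rng) for _ in range(plays)) / plays


if __name__ == "__main__":
    # Hypothetical machine with mu = 0.6 and sigma = 0.05.
    print(average_reward(0.6, 0.05, plays=20000))
```

Under these assumptions, and as long as clipping is rare, the long-run average payout stays close to μ, because the expected payout of each play equals the mean of the sampled probability.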