Abstract

Many psychiatric disorders are marked by impaired decision-making during approach-avoidance conflict. Current experiments elicit approach-avoidance conflicts in bandit tasks by pairing an individual's actions with consequences that are simultaneously desirable (reward) and undesirable (harm). We frame approach-avoidance conflict tasks as multi-objective multi-armed bandits. By defining a general decision-maker as a limiting sequence of actions, we disentangle the decision process from learning. Each decision-maker can then be identified with a multi-dimensional point representing its long-term average expected outcomes, and different decision-making models can be compared through the geometry of their 'feasible region', the set of all long-term performances attainable on a fixed task. We introduce three example decision-makers based on popular reinforcement learning models and characterize their feasible regions, including whether they can be Pareto optimal. From this perspective, we find that existing tasks are unable to distinguish between the three example decision-makers. We show how to design new tasks whose geometric structure better distinguishes between decision-makers. These findings are expected to guide the design of approach-avoidance conflict tasks and the modeling of the resulting decision-making behavior.
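The framing above can be made concrete with a small numerical sketch. The Python code below is a hypothetical illustration only: the arm values, policy names, and helper functions (`pull`, `long_run_average`, `pareto_dominates`) are assumptions chosen to mirror the abstract's setup, not the authors' actual task or models. Each arm of a two-objective bandit yields a (reward, harm) pair, each fixed policy is reduced to the point of its long-run average outcomes, and Pareto dominance between those points is then checked.

```python
# Illustrative sketch only; all values and policies here are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

# A two-armed, two-objective bandit: each arm yields a (reward, harm) pair.
# Arm 0 is safe but low-reward; arm 1 is rewarding but harmful.
ARM_MEANS = np.array([
    [0.3, 0.1],   # arm 0: E[reward], E[harm]
    [0.9, 0.7],   # arm 1: E[reward], E[harm]
])

def pull(arm):
    """Sample a noisy (reward, harm) outcome for the chosen arm."""
    return ARM_MEANS[arm] + rng.normal(scale=0.05, size=2)

def long_run_average(policy, horizon=10_000):
    """Approximate a decision-maker's point: the long-term average
    (reward, harm) of its sequence of actions."""
    outcomes = np.array([pull(policy(t)) for t in range(horizon)])
    return outcomes.mean(axis=0)

# Three toy decision-makers, given as fixed action sequences rather than
# learners, so the decision process is disentangled from learning.
policies = {
    "always_safe":  lambda t: 0,
    "always_risky": lambda t: 1,
    "alternate":    lambda t: t % 2,   # mixes the two arms 50/50
}

points = {name: long_run_average(p) for name, p in policies.items()}

def pareto_dominates(a, b):
    """a dominates b if reward is no lower and harm no higher,
    with strict improvement in at least one objective."""
    return (a[0] >= b[0] and a[1] <= b[1]) and (a[0] > b[0] or a[1] < b[1])

for name, pt in points.items():
    dominated = any(pareto_dominates(q, pt)
                    for other, q in points.items() if other != name)
    print(f"{name:12s} (reward, harm) = ({pt[0]:.2f}, {pt[1]:.2f}) "
          f"{'dominated' if dominated else 'Pareto optimal among these'}")
```

In this toy task all three policies land on the reward-harm trade-off curve, so none dominates another; this echoes the abstract's observation that a task's geometry determines whether different decision-makers can be told apart by their long-run outcomes.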
