In everyday life, people must make choices without full information about their environment. This poses an explore-exploit dilemma: one must balance the need to learn about the world against the need to obtain rewards from it. The dilemma is often studied using the multi-armed restless bandit task, in which people repeatedly select from multiple options, and human behaviour is modelled as a form of reinforcement learning via Kalman filters. Inspired by work in the judgment and decision-making literature, we present two experiments using multi-armed bandit tasks in both static and dynamic environments, in which options can become unviable and vanish if they are not pursued. A Kalman filter model using Thompson sampling provides an excellent account of human learning in a standard restless bandit task, but behaviour departs systematically from the model in the vanishing bandit task, in a way that suggests an aversion to losing options. We estimate the structure of this loss aversion signal and consider theoretical explanations for the results.
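For readers unfamiliar with the baseline model named above, the following is a minimal Python sketch of a Kalman filter learner that chooses by Thompson sampling in a standard restless bandit. It is an illustration under stated assumptions, not the authors' implementation: the function names (`thompson_choice`, `kalman_update`, `simulate`) and all parameter values (`drift_var`, `noise_var`, the initial variance of 100) are hypothetical, and the vanishing-option manipulation is not modelled.

```python
import numpy as np


def thompson_choice(means, variances, rng):
    """Draw one sample per arm from its Gaussian belief and pick the largest."""
    samples = rng.normal(means, np.sqrt(variances))
    return int(np.argmax(samples))


def kalman_update(means, variances, arm, reward, drift_var, noise_var):
    """One Kalman filter step for a random-walk bandit.

    Every arm's uncertainty grows by drift_var (the world may have changed);
    only the chosen arm's belief is corrected towards the observed reward.
    """
    means = means.copy()
    prior_var = variances + drift_var
    gain = prior_var[arm] / (prior_var[arm] + noise_var)
    means[arm] += gain * (reward - means[arm])
    post_var = prior_var.copy()
    post_var[arm] = (1.0 - gain) * prior_var[arm]
    return means, post_var


def simulate(n_arms=4, n_trials=200, drift_var=1.0, noise_var=4.0, seed=0):
    """Run the learner on a simulated restless bandit (illustrative values only)."""
    rng = np.random.default_rng(seed)
    true_means = np.zeros(n_arms)          # latent payoffs, drifting over time
    means = np.zeros(n_arms)               # learner's belief means
    variances = np.full(n_arms, 100.0)     # vague initial beliefs (assumed)
    choices = np.empty(n_trials, dtype=int)
    rewards = np.empty(n_trials)
    for t in range(n_trials):
        arm = thompson_choice(means, variances, rng)
        reward = rng.normal(true_means[arm], np.sqrt(noise_var))
        means, variances = kalman_update(
            means, variances, arm, reward, drift_var, noise_var
        )
        # Payoffs follow an independent Gaussian random walk.
        true_means = true_means + rng.normal(0.0, np.sqrt(drift_var), n_arms)
        choices[t] = arm
        rewards[t] = reward
    return choices, rewards, means, variances
```

Because unchosen arms accumulate uncertainty on every trial, their Thompson samples become more variable over time, which is what drives the learner back to explore neglected options.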
CC-By Attribution 4.0 International