Reinforcement learning is a fundamental cognitive process that operates pervasively, from birth to death. The core idea is that past experience allows us to learn to improve our future choices, so as to maximize the occurrence of pleasant events (rewards) and minimize the occurrence of unpleasant events (punishments). Within the reinforcement learning framework, one of the most fundamental and timely questions is whether values are learned and represented on an absolute or a relative (i.e., context-dependent) scale. The answer to this question is not only central at the theoretical level, but also necessary to understand and predict why and how human decision-making often deviates from normative models, leading to sub-optimal behaviors such as those observed in several psychiatric disorders, including addiction.
To address this question, throughout this PhD we extended existing models and paradigms to probe context-dependence in human reinforcement learning. Across two experiments using probabilistic selection tasks, we showed that the choices of healthy volunteers displayed clear evidence of relative valuation, at the cost of sub-optimal decisions when options are presented outside their original learning context. This suggests that economic values are rescaled as a function of the range of the available options. Moreover, the results confirmed that this range adaptation induces systematic extrapolation errors and is stronger when task difficulty decreases. Behavioral analyses, model fitting and model simulations converged on the validation of a dynamically range-adapting model, and showed that it parsimoniously captures all the behavioral results. Our results clearly indicate that values are not encoded on an absolute scale in human reinforcement learning, and that this computational process has both positive and negative behavioral effects. To explore whether impairments of this process are linked to reward-related psychiatric disorders, we performed a meta-analysis of the valence bias observed across several pathologies. Preliminary results suggest that healthy volunteers learn similarly from rewards and punishments, whereas this is not the case in pathologies such as Parkinson’s disease or substance-related disorders. In a large-scale experiment, combined with the transdiagnostic approach used in computational psychiatry, we found that the parameters of our model could not be directly linked to different dimensions of psychiatric symptoms, including obsessive-compulsive disorder, social anxiety, and addiction. Further work is needed to improve our modeling tools so that they better account for behavioral variance.
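The abstract does not give the model's equations. Purely as an illustration of what "rescaling values as a function of the range of the available options" can mean, the Python sketch below assumes a simple scheme: each learning context keeps running estimates of its best and worst outcomes, and every outcome is rescaled to that range before a standard delta-rule update. The class name `RangeAdaptingLearner`, the learning rates, and the update rules are hypothetical choices made for this example, not the thesis model. Early rescaled outcomes can exceed 1 while the range bounds are still converging.

```python
from dataclasses import dataclass, field


@dataclass
class RangeAdaptingLearner:
    """Hypothetical sketch of a range-adapting value learner (not the thesis model)."""
    alpha_q: float = 0.3  # learning rate for option values (assumed value)
    alpha_r: float = 0.3  # learning rate for the context range bounds (assumed value)
    q: dict = field(default_factory=dict)      # (context, option) -> learned relative value
    r_max: dict = field(default_factory=dict)  # context -> running estimate of the best outcome
    r_min: dict = field(default_factory=dict)  # context -> running estimate of the worst outcome

    def update(self, context: str, option: str, reward: float) -> float:
        hi = self.r_max.get(context, 0.0)
        lo = self.r_min.get(context, 0.0)
        # Widen the context range only when the outcome falls outside it.
        if reward > hi:
            hi += self.alpha_r * (reward - hi)
        if reward < lo:
            lo += self.alpha_r * (reward - lo)
        self.r_max[context], self.r_min[context] = hi, lo
        # Rescale the outcome relative to the current range of its own context.
        span = hi - lo
        relative = (reward - lo) / span if span > 0 else 0.0
        # Standard delta-rule update, applied to the rescaled outcome.
        key = (context, option)
        value = self.q.get(key, 0.0)
        self.q[key] = value + self.alpha_q * (relative - value)
        return self.q[key]


learner = RangeAdaptingLearner()
for _ in range(200):
    learner.update("A", "A1", 10.0)  # context A: outcomes 10 vs 0
    learner.update("A", "A2", 0.0)
    learner.update("B", "B1", 1.0)   # context B: outcomes 1 vs 0
    learner.update("B", "B2", 0.0)

# Both best options converge to the same relative value (about 1), even though
# A1 pays ten times more than B1 in absolute terms.
print(round(learner.q[("A", "A1")], 3), round(learner.q[("B", "B1")], 3))  # prints: 1.0 1.0
```

In this toy setting, an absolute learner would prefer A1 over B1 if the two were paired outside their learning contexts, whereas the rescaled values make them look equivalent: a simple instance of the kind of extrapolation error described above.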
In the long term, these analyses could help develop new tools to characterize the phenotypes of several pathologies and behavioral disorders, and to improve patient treatment at the individual level.
Jury composition
President: Mathias PESSIGLIONE, Sorbonne University
Reviewer: Claire GILLAN, University of Dublin
Reviewer: Sebastian GLUTH, University of Hamburg
Thesis supervisor: Stefano PALMINTERI, ENS - PSL Research University
Connection details
PhD Defense - Sophie BAVARD
Fri. 9 Apr. 2021, 14:00
Access code: 340-681-133