Computational Model-Based Functional Magnetic Resonance Imaging of Reinforcement Learning in Humans
Abstract
The aim of this thesis is to determine the changes in the blood-oxygen-level-dependent
(BOLD) signal of the human brain during various stages of reinforcement learning.
To accomplish that goal, two probabilistic reinforcement-learning tasks were
developed and assessed with healthy participants using functional magnetic
resonance imaging (fMRI). For both experiments the brain imaging data of the
participants were analysed using a combination of univariate and model-based
techniques.
In Experiment 1 there were three types of stimulus-response pairs, each of which
predicted a reward, a neutral outcome or a monetary loss with a certain
probability. Experiment 1 tested the following research questions: Where in the
brain does activity occur when expecting and receiving a monetary reward and a
punishment? Does avoiding a loss outcome activate similar brain regions as gain
outcomes and, vice versa, does avoiding a reward outcome activate similar brain
regions as loss outcomes? Where in the brain are prediction errors and predictions
for rewards and losses calculated? What are the neural correlates of reward and
loss predictions during the early and late phases of learning? The
results of Experiment 1 have shown that the expectation of rewards and losses
activates overlapping brain areas, mainly in the anterior cingulate cortex and basal
ganglia, but that reward and loss outcomes activate separate brain regions:
loss outcomes mainly activate the insula and amygdala, whereas reward outcomes
activate the bilateral medial frontal gyrus. The model-based analysis also revealed
changes between early and late learning. It was found that the predicted value in
early trials is coded in the ventromedial orbitofrontal cortex, but later in learning
the activation for the predicted value was found in the putamen.
The second experiment was designed to identify differences in the
processing of novel versus familiar reward-predictive stimuli. The results revealed
that the dorsolateral prefrontal cortex and several regions in the parietal cortex
showed greater activation for novel stimuli than for familiar stimuli. As an
extension to the fourth research question of Experiment 1, the reward predicted
values of the conditional stimuli (CS) and the prediction errors for the
unconditional stimuli (US) were also assessed in Experiment 2. The results revealed
that during learning there is significant prediction-error activation, mainly in
the ventral striatum and extending to various cortical regions, but for familiar
stimuli no prediction error
activity was observed. Moreover, predicted values for novel stimuli mainly activate
the ventromedial orbitofrontal cortex and precuneus, whereas the predicted value of
familiar stimuli activates the putamen. Taken together with the early versus late
predicted values in Experiment 1, the predicted-value results of Experiment 2
suggest that during learning of CS-US pairs activation in the brain shifts from
ventromedial orbitofrontal structures to sensorimotor parts of the striatum.
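
As an illustration of the two model-derived quantities discussed above, the following minimal sketch computes trial-by-trial predicted values and prediction errors for a single stimulus. It assumes a simple Rescorla-Wagner (delta-rule) learner; the abstract does not state which learning model was used, so the model choice, the function name `rescorla_wagner`, the learning rate `alpha`, the initial value `v0` and the example outcome sequence are all hypothetical.

```python
def rescorla_wagner(outcomes, alpha=0.2, v0=0.0):
    """Per-trial predicted values and prediction errors for one stimulus.

    Assumed delta-rule learner (not necessarily the model used in the thesis):
        delta_t = r_t - V_t
        V_{t+1} = V_t + alpha * delta_t
    """
    values, errors = [], []
    v = v0
    for r in outcomes:
        delta = r - v            # prediction error when the outcome arrives
        values.append(v)         # predicted value held at cue time, before updating
        errors.append(delta)
        v = v + alpha * delta    # update the prediction for the next trial
    return values, errors


# Hypothetical outcome sequence for a probabilistically rewarded stimulus
# (1 = reward delivered, 0 = reward omitted).
outcomes = [1, 1, 0, 1, 1, 1, 0, 1]
values, errors = rescorla_wagner(outcomes)

for t, (v, d) in enumerate(zip(values, errors), start=1):
    print(f"trial {t}: predicted value = {v:.3f}, prediction error = {d:+.3f}")
```

In model-based fMRI analyses, per-trial series of this kind are typically entered as parametric regressors, with predicted values aligned to cue onsets and prediction errors aligned to outcome onsets. Under this assumed learner the prediction errors for consistently rewarded trials shrink as the predicted value grows, which is in line with the absence of prediction-error activity for familiar stimuli reported above.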