
SINGLE NEURON CORRELATES OF MODEL-BASED PAVLOVIAN CONDITIONING IN THE HUMAN BRAIN

3.2 Introduction

Chapter 3


humans, the role of Pavlovian conditioning has been described in processes such as phobias (Davey, 1992) and addiction (Poulos, Hinson, and Siegel, 1981), while learning theories derived from Pavlovian conditioning, such as the Rescorla-Wagner model (R. Rescorla and Wagner, 1972), have been successfully adapted to describe a variety of associative learning processes (Siegel and Allan, 1996).

Additionally, Pavlovian conditioning behavior exhibits a number of core properties, all of which have been documented in human studies: latent inhibition, in which conditioning is impaired when a stimulus-outcome pairing is attempted after the stimulus has previously been presented without the outcome (Siddle, Remington, and Churchill, 1985; Lubow and Gewirtz, 1995); blocking, which inhibits the association between a new stimulus and an outcome when the new stimulus is presented alongside another cue which is already fully predictive of the outcome (Arcediano, Matute, and R. R. Miller, 1997); sensory pre-conditioning, in which stimulus A can start eliciting a conditioned response if it had previously been paired with an initially neutral stimulus B, when stimulus B itself is subsequently associated with the outcome (Brogden, 1947; White and Davey, 1989); and higher order conditioning, which allows stimuli to elicit a conditioned response when they are presented in a sequence with other stimuli which are themselves predictive of the outcome (Seymour et al., 2004; Pauli, Larsen, et al., 2015; Pauli, Gentile, et al., 2019). Ultimately, a unifying framework for these properties is that Pavlovian associative learning hinges on how informative a stimulus is about the probability of a subsequent outcome.
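As an illustration of how an error-driven learning rule can produce blocking, the following minimal sketch simulates the Rescorla-Wagner update on two hypothetical groups. The cue names (A, X), the single learning rate `alpha` (standing in for the product of the model's salience parameters), the asymptote `lam`, and the trial counts are assumptions chosen for illustration and are not taken from this study.

```python
def rescorla_wagner(trials, alpha=0.3, lam=1.0):
    """Apply Rescorla-Wagner updates over a sequence of trials.

    Each trial is a tuple (cues_present, outcome_present).
    Returns a dict with the final associative strength of each cue.
    """
    V = {}
    for cues, outcome in trials:
        # The prediction is the summed strength of all cues present on the trial.
        prediction = sum(V.get(c, 0.0) for c in cues)
        # Prediction error: asymptote (lam if the outcome occurred, else 0) minus prediction.
        error = (lam if outcome else 0.0) - prediction
        # Every cue present on the trial shares the same error term.
        for c in cues:
            V[c] = V.get(c, 0.0) + alpha * error
    return V


# Blocking group: A is first trained alone, then the compound AX is paired with the outcome.
blocking = rescorla_wagner([(("A",), True)] * 50 + [(("A", "X"), True)] * 50)

# Control group: only the compound AX is paired with the outcome.
control = rescorla_wagner([(("A", "X"), True)] * 50)

print("X after blocking:", blocking["X"])
print("X in control:    ", control["X"])
```

In the blocking group, A already predicts the outcome by the time X is introduced, so the prediction error is close to zero and X acquires almost no associative strength. In the control group, A and X split the available strength between them.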

Computational theories of Pavlovian conditioning have been derived from models of instrumental conditioning, most prominently through the class of model-free (MF) reinforcement learning models (Daw, Niv, and Dayan, 2005; O’Doherty, Cockburn, and Pauli, 2017). In this framework, stimulus values are learned based on stimulus-outcome or action-outcome associations alone, without depending on a cognitive map for the transition structure between stimuli. This class of models can be thought of as developments from Thorndike’s law of effect, stating that rewarded actions are more likely to be repeated, while punished actions are more likely to be avoided (Thorndike, 1898). However, a learning framework which solely employs stimulus-outcome associations is insufficient to explain Pavlovian phenomena such as sensory pre-conditioning, which requires stimulus-stimulus associations to occur, independent of outcome presentation, as well as sensitivity to revaluation or devaluation, which may occur regardless of new outcome pairings (Dayan and Berridge, 2014; Pool et al., 2019). Such phenomena can be modeled more accurately by a model-based (MB) account of learning, in which an internal cognitive map (Tolman, 1948) is developed to provide the agent with an internalized transition structure describing the probabilities of moving between states, requiring that the identities of stimuli and outcomes are tracked. In this framework, an agent can seek out stimuli which lead to newly valued outcome states even before pairing these stimuli with rewards, as long as the transition probabilities between states are known. In a purely model-free learning framework, on the other hand, an agent would need to be re-exposed to newly valued outcomes, and only then would it be possible to update the values of the cues associated with these outcomes.
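The contrast between the two accounts under revaluation can be sketched in a few lines. In this toy example, the cue names, outcome identities, transition probabilities, and outcome values are all hypothetical, and the model-free cache is simply initialized to the values it would have converged to before revaluation, which is an assumption made to keep the sketch short.

```python
# Hypothetical cognitive map: probability of each outcome identity given a cue.
transitions = {
    "cue_salt": {"salt": 1.0},
    "cue_sugar": {"sugar": 0.8, "nothing": 0.2},
}

# Current value of each outcome identity (assumed; salt starts out aversive).
outcome_value = {"salt": -1.0, "sugar": 1.0, "nothing": 0.0}


def model_based_value(cue, transitions, outcome_value):
    """Expected value computed on the fly from the transition structure."""
    return sum(p * outcome_value[o] for o, p in transitions[cue].items())


# A model-free agent caches one scalar per cue, learned from past pairings.
# Here the cache is set to the pre-revaluation expected values (assumed converged).
cached_value = {
    cue: model_based_value(cue, transitions, outcome_value) for cue in transitions
}

# Revaluation: salt becomes valuable, with no new cue-outcome pairings.
outcome_value["salt"] = 1.0

mb_after = model_based_value("cue_salt", transitions, outcome_value)
mf_after = cached_value["cue_salt"]

print("model-based value after revaluation:", mb_after)
print("model-free cached value:            ", mf_after)
```

The model-based estimate changes as soon as the outcome value changes, because it is recomputed from the map. The cached model-free value stays at its old level until the cue is experienced with the newly valued outcome.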

Even though behavioral evidence suggests the model-based framework can provide an accurate account of Pavlovian conditioning, most previous work has focused on model-free mechanisms such as the Rescorla-Wagner model (R. Rescorla and Wagner, 1972) and temporal difference (TD) learning (Sutton, 1988), which have already yielded valuable insight into the neural implementation of learning processes.

For instance, in TD models, the learning signal which updates stimulus values following an outcome is a reward prediction error (RPE) signal, which has been found to correlate with the activity of dopaminergic neurons in the ventral tegmental area (VTA) and the substantia nigra of monkeys (Schultz, Dayan, and Montague, 1997), and with BOLD signal in the human ventral striatum (O’Doherty, Dayan, et al., 2003), as obtained with fMRI.
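A minimal sketch of the TD(0) prediction error on a single cue-outcome trial is given below. The two-state trial structure (a cue state followed by a delay state, then the outcome), the learning rate `alpha`, the discount factor `gamma`, and the number of trials are illustrative assumptions rather than parameters of any study cited here.

```python
def td_trial(V, reward, alpha=0.2, gamma=1.0):
    """Run one cue -> delay -> outcome trial under TD(0).

    V maps state names to learned values and is updated in place.
    Returns the reward prediction error computed at outcome time.
    """
    # Transition cue -> delay: no reward is delivered at this step.
    delta_cue = 0.0 + gamma * V["delay"] - V["cue"]
    V["cue"] += alpha * delta_cue

    # Transition delay -> end of trial: the reward arrives and the
    # terminal state has value 0.
    delta_outcome = reward + gamma * 0.0 - V["delay"]
    V["delay"] += alpha * delta_outcome
    return delta_outcome


V = {"cue": 0.0, "delay": 0.0}

# Thirty rewarded trials: the RPE at outcome shrinks as the reward becomes predicted.
rpes = [td_trial(V, reward=1.0) for _ in range(30)]

# Omitting an expected reward produces a negative RPE.
omission_rpe = td_trial(V, reward=0.0)

print("first RPE:", rpes[0], "last RPE:", rpes[-1])
print("RPE on omitted reward:", omission_rpe)
```

Over trials the prediction error at outcome decays toward zero while value propagates back to the cue, and an unexpectedly omitted reward yields a negative error.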

Still, more recent studies have begun to map how model-based Pavlovian conditioning occurs in the brain. In rats, a Pavlovian paradigm revealed that intact activity in both ventral striatum and orbitofrontal cortex (OFC) was necessary for model-based learning to take place (McDannald et al., 2011). Another rat study found that previously unpleasant Pavlovian cues associated with a salty stimulus could instantly become appetitive when the animal encountered them in a state of sodium depletion (M. J. Robinson and Berridge, 2013). This behavioral change, which is consistent with model-based Pavlovian conditioning, was accompanied by an increase in Fos activation in a mesocorticolimbic circuit including VTA, nucleus accumbens and OFC.

In humans, an fMRI study found correlations between amygdala activity and components of a model-based Pavlovian inference model (Prévost, McNamee, et al., 2013). Specifically, model-based estimates of a cue’s expected value (EV) correlated with activity in basolateral amygdala (BLA) during an appetitive session, and correlated with activity in the centromedial complex of the amygdala during an aversive session. Another human study investigated the neural representation of stimulus-stimulus associations which could be a substrate for model-based Pavlovian conditioning (Pauli, Gentile, et al., 2019). This study used a sequence of two cues (distal and proximal) which had a probabilistic transition structure, followed by an appetitive or neutral outcome. The authors found that the decoding accuracy for stimulus identity in caudate nucleus correlated with the explicit knowledge that participants had about stimulus-stimulus associations. Crucially, a classifier trained using OFC activity to decode the identity of proximal cues during proximal cue presentation performed better than chance when tested during distal cue presentation, indicating that OFC already encoded predictive information about the identity of the proximal cue at the time of the distal cue, which suggests a neural substrate for stimulus-stimulus associations. Other studies in humans also found a link between OFC activity and outcome identity representation during reward learning (Klein-Flügge et al., 2013; Howard et al., 2015).

In this study, we leveraged single neuron recordings in patients undergoing treatment for refractory epilepsy to investigate a number of open questions on how model-based Pavlovian conditioning occurs in the human brain. Specifically, is there evidence for encoding of stimulus-stimulus associations and stimulus identities in vmPFC neurons, which are fundamental in the construction of cognitive maps? Can we map how amygdala neurons act in tandem with prefrontal neurons in predictive value coding, which is a key feature of Pavlovian conditioning? Additionally, how is outcome feedback encoded, alongside the surprise signals which are required to update cognitive maps during learning?
