CONCLUSION
5.3 Discussion and future research directions
Our results suggest a number of possible future research directions in human electrophysiology and value-based decision making. As I previously discussed, dlPFC is a brain area that provides a cortical bridge between the value-coding prefrontal regions and preSMA (Luppino et al., 1993), which we found to encode an integrated utility signal built from value-based variables such as uncertainty and novelty, as well as the decisions themselves. Together with our finding that vmPFC predominantly encoded integrated utilities conditioned on decisions, unlike preSMA, which encoded pre-decision components of utility for individual stimuli, this raises a number of questions about the functional organization of prefrontal cortex.
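As a purely illustrative sketch of what such an integrated utility signal could look like, consider a linear combination of an option's expected value with uncertainty and novelty bonuses. The linear form and the weighting parameters below are assumptions for illustration only; in practice they would be fit to behavior or neural data rather than fixed by hand.

    def integrated_utility(expected_value, uncertainty, novelty,
                           w_uncertainty=0.5, w_novelty=0.3):
        """Illustrative pre-decision utility: value plus uncertainty and novelty bonuses.

        The linear form and the weights are assumptions, not the fitted model.
        """
        return expected_value + w_uncertainty * uncertainty + w_novelty * novelty

    # Two options with the same expected value can differ in integrated utility
    # once uncertainty and novelty bonuses are taken into account.
    familiar = integrated_utility(expected_value=1.0, uncertainty=0.1, novelty=0.0)
    novel = integrated_utility(expected_value=1.0, uncertainty=0.8, novelty=1.0)
    print(familiar, novel)  # 1.05 vs 1.7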
Previous neuroimaging evidence implicated vmPFC/medial OFC in encoding integrated utility values and suggested that this integration can leverage components encoded in areas such as lateral OFC and posterior parietal cortex (Suzuki, Cross, and O'Doherty, 2017; Iigaya et al., 2020). Is it possible that this discrepancy arises from a difference in the nature of the tasks utilized? In the aforementioned examples, subjects performed behavioral paradigms focused mostly on valuation alone (e.g., willingness-to-pay for food; subjective art appraisals), while our explore-exploit paradigm encouraged subjects to accumulate value-based evidence to perform reward-maximizing decisions under uncertainty. Therefore, one hypothesis is that exploratory decision making induces pre-decision value integration to occur more prominently in either dlPFC or preSMA than in vmPFC/OFC, which would then receive utility feedback from these more posterior value regions. Note, however, that dlPFC connects recurrently with preSMA, vmPFC, and OFC, so a one-directional cascade of events across these brain regions, while appealing, is likely a functional oversimplification of this system. Another possibility is that these results are actually congruent, and that the previously described role of vmPFC in encoding integrated utility concerns selected stimuli rather than individual options under consideration prior to a decision, as those paradigms did not include a multi-option selection component that could disambiguate the two possibilities. One key testable hypothesis is that utility integration prior to a value-based decision first occurs in dlPFC rather than in preSMA, which we could not probe due to limitations in choosing recording sites in human patients. Admittedly, it is also possible that pre-decision utility integration during exploration occurs in a different subset of vmPFC neurons, which we simply did not sample.
While the present thesis focused predominantly on a number of cortical areas and cortico-cortical relationships, it is crucial to note that value learning and decision making take place in a much broader circuitry involving subcortical areas such as the substantia nigra, ventral tegmental area, subthalamic nucleus, striatum, and thalamus (O'Doherty, Cockburn, and Pauli, 2017). For instance, the striatum receives topographically organized cortical inputs from motor and premotor areas, OFC, and ACC, and is also one of the main targets of dopaminergic neurons from SN/VTA, which are known to encode reward prediction errors (Haber and Knutson, 2010). Activity in the human ventral striatum has been shown to reflect model-free encoding of expected rewards (Tobler et al., 2006), while the dorsomedial striatum in rodents is necessary for goal-directed learning to take place (Yin et al., 2005).
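To make explicit in computational terms what "model-free encoding of expected rewards" and "reward prediction errors" refer to, a minimal Rescorla-Wagner-style update is sketched below. The learning rate and the outcome sequence are arbitrary choices for illustration, not parameters estimated in this thesis.

    def model_free_update(value, reward, learning_rate=0.1):
        """One trial of a simple model-free value update driven by a reward prediction error."""
        prediction_error = reward - value   # the quantity attributed to SN/VTA dopamine neurons
        return value + learning_rate * prediction_error, prediction_error

    value = 0.0
    for reward in [1.0, 1.0, 0.0, 1.0]:     # hypothetical outcome sequence
        value, rpe = model_free_update(value, reward)
        print(f"value={value:.3f}, RPE={rpe:+.3f}")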
Additionally, evidence from monkey electrophysiology suggests that spatial representations in striatal activity are at least partially modulated by uncertainty (Yanike and Ferrera, 2014). The striatum receives projections from the thalamus but also indirectly projects back to it, mediated either by the internal segment of the globus pallidus (GPi), in the direct pathway, or by the external segment of the globus pallidus (GPe), followed by the subthalamic nucleus (STN) and then GPi, in the indirect pathway (Haber, 2016). The thalamus then projects back to cortex, thus completing the cortical-striatal-thalamic loop (Haber and Knutson, 2010). Specifically, the medial-dorsal nucleus projects substantially to OFC and dlPFC, while the ventral-anterior nucleus projects to preSMA (McFarland and Haber, 2002). The STN also receives topographically organized projections of special interest from value-coding areas of cortex (including vmPFC/OFC, dlPFC, and dACC), which define the hyperdirect pathway (Haynes and Haber, 2013). Mapping the role of the human hyperdirect pathway in value learning and decision making will be a crucial endeavor for the field. Recent deep-brain stimulation work in Parkinson's disease patients has allowed investigators to probe these subcortical targets in the human brain directly (Pouratian et al., 2012; C. P. Mosher et al., 2021), creating an exciting new avenue for reward learning studies to take place.
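As a schematic and deliberately simplified summary of the pathways just described, the relay structure can be written down as follows; the region groupings are coarse and omit many known connections, and the sketch is descriptive rather than a quantitative circuit model.

    # Coarse, non-quantitative summary of the basal ganglia pathways discussed above.
    basal_ganglia_pathways = {
        "direct":      ["cortex", "striatum", "GPi", "thalamus", "cortex"],
        "indirect":    ["cortex", "striatum", "GPe", "STN", "GPi", "thalamus", "cortex"],
        "hyperdirect": ["cortex", "STN", "GPi", "thalamus", "cortex"],
    }

    for name, relays in basal_ganglia_pathways.items():
        print(f"{name:>11}: " + " -> ".join(relays))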
Additionally, while our findings support a model-based account of Pavlovian conditioning and the use of learning heuristics that go beyond model-free learning in the context of social cognition, the present work did not sufficiently explore the neural implementation of arbitration between multiple controllers, such as model-free vs. model-based learning algorithms for value learning, or emulation vs. imitation strategies for observational learning. On the one hand, we were able to measure evidence for both reward prediction error and state prediction error signals in preSMA, which could serve as a substrate for computing reliability signals to mediate the arbitration between different learning systems. On the other hand, we did not record from key areas suggested by neuroimaging studies to be directly involved in arbitrating between multiple controllers, such as inferior lateral PFC (S. W. Lee, Shimojo, and O'Doherty, 2014), which could be a potential future target for local field potential (LFP) recordings.
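A minimal sketch of what reliability-based arbitration could look like is given below, loosely inspired by S. W. Lee, Shimojo, and O'Doherty (2014). The reliability measure, the sigmoid weighting, and all parameter values are simplified assumptions for illustration, not the published arbitration model.

    import numpy as np

    def reliability(prediction_errors):
        """Illustrative reliability: high when recent unsigned prediction errors are small."""
        return 1.0 - np.mean(np.abs(prediction_errors))

    def model_based_weight(reliability_mb, reliability_mf, temperature=5.0):
        """Weight on the model-based controller via a sigmoid of the reliability difference."""
        return 1.0 / (1.0 + np.exp(-temperature * (reliability_mb - reliability_mf)))

    # Hypothetical recent state prediction errors (model-based system) and
    # reward prediction errors (model-free system).
    w_mb = model_based_weight(reliability([0.1, 0.2, 0.1]), reliability([0.6, 0.5, 0.7]))
    print(f"weight on model-based controller: {w_mb:.2f}")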
Finally, the field of reinforcement learning itself has developed rapidly in recent years, providing a theoretical basis to explain more complex phenomena and opening new possibilities for neuroscience to explore. For instance, distributional RL proposes a framework for learning entire distributions of probabilistic variables, beyond expected values alone (Bellemare, Dabney, and Munos, 2017). This framework can be used to make neuroscientific predictions, some of which already have tentative evidence in their support. For example, one way to support distributional learning would be to have a diversity of optimistic and pessimistic dopaminergic neurons, which collectively generate a distribution of reward prediction errors for the same outcome; this prediction has found initial support from VTA recordings in rodents (Dabney, Kurth-Nelson, et al., 2020). Other implications of this theory remain to be tested, especially as far as the human brain is concerned, including how the brain leverages knowledge of entire reward distributions, beyond their expected value, to make decisions, or how distributional learning interfaces with model-based reward learning and goal-directed planning (Lowet et al., 2020).
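The core intuition behind this prediction can be illustrated with asymmetric learning rates: channels that weight positive prediction errors more heavily converge to optimistic (higher) value estimates, while channels that weight negative errors more heavily converge to pessimistic ones, so the population jointly encodes the reward distribution. The outcome distribution, learning rate, and number of channels below are arbitrary illustrative choices.

    import numpy as np

    rng = np.random.default_rng(0)
    optimism = np.linspace(0.1, 0.9, 9)   # one asymmetry level per hypothetical "neuron"
    values = np.zeros_like(optimism)
    lr = 0.02

    # Rewards drawn from an arbitrary two-outcome distribution.
    for _ in range(5000):
        reward = rng.choice([0.0, 1.0], p=[0.3, 0.7])
        errors = reward - values
        # Positive errors are scaled by the optimism level, negative errors by its
        # complement, so each channel converges to a different expectile of the
        # reward distribution.
        values += lr * np.where(errors > 0, optimism, 1.0 - optimism) * errors

    print(np.round(values, 2))            # spans pessimistic to optimistic estimates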
Additionally, deep reinforcement learning has developed as a method that provides new solutions for complex problem solving in high-dimensional environments (Mnih et al., 2013; Silver et al., 2016). While deep neural networks have been successfully used as structural and functional models of visual information processing (Yamins and DiCarlo, 2016), whether deep reinforcement learning models can efficiently predict decision-making activity in the human brain is an active research area, in which encouraging results have started to emerge, linking artificial value representations to neural activity across cortex (Cross et al., 2021).
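One concrete way to link artificial value representations to recorded activity is a cross-validated encoding model that linearly maps features from an agent's value-coding layer onto neural responses. The sketch below uses random placeholder data and scikit-learn, and is meant only to illustrate the general style of analysis, not the specific pipeline of Cross et al. (2021).

    import numpy as np
    from sklearn.linear_model import RidgeCV
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(1)

    # Placeholder data: per-trial features from a deep RL agent's value layer and
    # the response of one recorded unit (both synthetic here).
    agent_features = rng.normal(size=(200, 50))
    neural_activity = agent_features @ rng.normal(size=50) + rng.normal(size=200)

    encoder = RidgeCV(alphas=np.logspace(-2, 3, 10))
    scores = cross_val_score(encoder, agent_features, neural_activity, cv=5, scoring="r2")
    print(f"cross-validated R^2: {scores.mean():.2f}")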