The Neural Mechanisms of Value Construction

I am also incredibly grateful for all the friends I made in the O'Doherty lab and the guidance I received from them at every step of the process. I also want to thank Dalton Combs and Glenn Fox, who were incredible mentors to me at USC and motivated me to go to school in the first place.

However, much less is known about how this value code is constructed by the brain in the first place.

A linear attribute integration model succinctly illustrates how subjective values can be calculated based on a weighted combination of the food's constituent nutritional attributes. When valuing bundles of items, the constituent items themselves become the attributes and their values are integrated with a subadditive function to construct the value of the bundle. These features arise from nonlinear transformations of the sensory input that connect perception to action and reward.
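As a rough illustration of the linear attribute integration model described above, the following sketch computes a subjective value as a weighted sum of attribute ratings. The attribute names follow the four nutrient factors discussed later in the text; the weights, ratings, and function name are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def linear_value(attributes, weights, intercept=0.0):
    """Subjective value as a weighted sum of attribute ratings."""
    return intercept + sum(weights[name] * rating
                           for name, rating in attributes.items())

# Hypothetical weights and ratings, for illustration only.
weights = {"fat": 0.5, "carbohydrate": 0.25, "protein": 0.75, "vitamin": 1.0}
snack = {"fat": 2.0, "carbohydrate": 4.0, "protein": 1.0, "vitamin": 0.0}

# 0.5*2 + 0.25*4 + 0.75*1 + 1.0*0
snack_value = linear_value(snack, weights)
```

In practice the weights would be estimated from behavior (for example by regressing value ratings on attribute ratings) rather than set by hand.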

INTRODUCTION

Furthermore, before making a decision, the brain must construct a representation of the decision problem (Rangel, Camerer, & Montague, 2008). Therefore, the value of each state is equal to the reward received in that state plus the value of the successor state $s_{t+1}$. Because of this recursion, the value of $s_{t+1}$ in turn includes the sum of all future rewards received in $s_{t+1}$, $s_{t+2}$, and so on.
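The recursion described above can be sketched for a fixed sequence of states. This is a minimal example that assumes no discounting and a value of zero after the final state; the reward sequence and function name are made up for illustration.

```python
def state_values(rewards):
    """Return V(s_t) = r_t + V(s_{t+1}) for each state in a fixed sequence."""
    values = [0.0] * len(rewards)
    next_value = 0.0  # assumption: nothing is earned after the last state
    for t in reversed(range(len(rewards))):
        values[t] = rewards[t] + next_value
        next_value = values[t]
    return values

rewards = [0.0, 1.0, 0.0, 2.0]  # hypothetical reward in each state
values = state_values(rewards)
```

Each entry of `values` equals the sum of the rewards from that state onward, which is what unrolling the recursion gives.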

By starting from the evaluation of the value of the next state, learning can efficiently propagate backward from rewarding states to the states before them. BOLD signals increase the brightness of volumetric pixels (voxels) in an MRI image, allowing cognitive neuroscientists to record from the entire brain simultaneously in the scanner, with spatial resolution far superior to that of other neuroimaging methods such as electroencephalography (EEG) and magnetoencephalography (MEG). For example, this method identified neural correlates of prediction errors from a temporal difference learning model in the ventral striatum and OFC (O'Doherty, Dayan, Friston, et al., 2003).
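The temporal difference prediction error and the backward propagation of value mentioned above can be illustrated with a toy chain of three states. This is a sketch of a basic TD(0) update, not the specific model fitted in the cited study; the learning rate `alpha`, the absence of discounting, and the reward sequence are assumptions.

```python
def td_episode(values, rewards, alpha=0.5):
    """One pass through a fixed chain of states; returns the prediction errors."""
    errors = []
    for t, reward in enumerate(rewards):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        delta = reward + next_value - values[t]  # prediction error
        values[t] += alpha * delta               # move estimate toward target
        errors.append(delta)
    return errors

values = [0.0, 0.0, 0.0]
rewards = [0.0, 0.0, 1.0]  # hypothetical: reward only in the final state
first_errors = td_episode(values, rewards)
second_errors = td_episode(values, rewards)
```

On the first pass only the rewarded state produces a prediction error; on the second pass the error also appears one state earlier, because that state's estimate is now compared against the updated value of its successor.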

ELUCIDATING THE UNDERLYING COMPONENTS OF FOOD VALUATION IN THE HUMAN ORBITOFRONTAL CORTEX

Additionally, we examined whether distinct patterns of voxel activity in the lateral OFC represent each of the four subjective nutrient factors. Supplementary Figure 4l), although a subset of individual target factors could be significantly decoded in the lateral OFC. Second, we compared the decoding accuracy of low-level visual features with that of subjective nutrient factors.

A behavioral RDM was created from the correlation distance between each pair of items across the four subjective nutrient factors (fat, carbohydrate, protein, and vitamins; for each nutrient factor, rating values were z-normalized across items). Weights of voxels in the classifiers for each of the subjective nutrient factors obtained from the ROI analyses (see Figure 2.3a). Weights of voxels in the classifiers for each of the subjective nutrient factors obtained from the searchlight analyses (see Figure 2.3c).
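A behavioral RDM of the kind described above could be computed along the following lines. This is a minimal sketch using only the standard library: it z-normalizes each factor across items and then takes one minus the Pearson correlation between each pair of items. The ratings are invented, and the choice of the population standard deviation for z-scoring is an assumption.

```python
from statistics import mean, pstdev

def zscore_columns(ratings):
    """z-normalize each factor (column) across items (rows)."""
    columns = list(zip(*ratings))
    stats = [(mean(col), pstdev(col)) for col in columns]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in ratings]

def correlation_distance(u, v):
    """1 - Pearson correlation between two equal-length vectors."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    norm_u = sum((a - mu) ** 2 for a in u) ** 0.5
    norm_v = sum((b - mv) ** 2 for b in v) ** 0.5
    return 1.0 - cov / (norm_u * norm_v)

def behavioral_rdm(ratings):
    """Item-by-item matrix of correlation distances."""
    z = zscore_columns(ratings)
    return [[correlation_distance(a, b) for b in z] for a in z]

# Rows: items; columns: fat, carbohydrate, protein, vitamin (made-up ratings).
ratings = [
    [5.0, 2.0, 1.0, 1.0],
    [4.0, 3.0, 1.0, 2.0],
    [1.0, 2.0, 4.0, 5.0],
]
rdm = behavioral_rdm(ratings)
```

Items with similar nutrient profiles (the first two rows) end up closer together in the RDM than items with opposite profiles (the first and third rows).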

The behavioral RDM is created from the correlation across the four subjective nutrient factors for each pair of items. The decoding accuracy of the low-level visual features (average of the eight features) and the subjective nutrient factors (average of the four factors identified as value predictors) are plotted for the lOFC, PPC, and V1 (BA17) anatomical ROIs (n = 23 participants).

Figure 2.1: Experimental task and behavior.

INVESTIGATING THE NEURAL MECHANISMS OF BUNDLE VALUATION

Recent evidence suggests that decision-making regions in the brain implement a relative value code by adapting to the temporal and spatial context of the choice set (Louie, Glimcher, & Webb, 2015). Instead, there is more density below the Y=X diagonal, suggesting that bundle value is a subadditive function of the individual item values. Altogether, these behavioral results show that the value of a bundle is calculated as a subadditive combination of the individual item values.

A central question about value computations in the brain is whether the same neural resources encode the value of a stimulus independent of the category of the stimulus or the broader context. It was found that bundle value is calculated with a subadditive function of the individual item values. Linear and nonlinear regression analyses were performed to model how bundle value is calculated as a function of the values of the constituent elements of the bundle ($\text{BundleValue} = f(\text{ItemValues})$).

Differences in prediction accuracy between absolute value and relative value were tested against chance with a nonparametric version of the paired t-test, the two-sided Wilcoxon signed-rank test (P < 0.05), FDR-corrected for multiple comparisons across ROIs (q = 0.05). Differences in the correlations between absolute value and relative value were tested against chance with a nonparametric version of the paired t-test, the two-sided Wilcoxon signed-rank test (P < 0.05), FDR-corrected for multiple comparisons across ROIs (q = 0.05). Three models were constructed to predict the value of a bundle as a function of the individual item values: a linear model, a nonlinear power model, and a nonlinear logarithmic model.
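To make the three model families concrete, here is one plausible set of functional forms. The text does not give the exact parameterizations, so the equations below (each applied to the sum of the two item values) and all parameter values are assumptions made purely for illustration.

```python
import math

def linear_model(v1, v2, b0, b1):
    """Assumed form: intercept plus a scaled sum of item values."""
    return b0 + b1 * (v1 + v2)

def power_model(v1, v2, a, b):
    """Assumed form: scaled power of the summed item values."""
    return a * (v1 + v2) ** b

def log_model(v1, v2, a):
    """Assumed form: scaled logarithm of one plus the summed item values."""
    return a * math.log(1.0 + v1 + v2)

item_values = (3.0, 5.0)   # hypothetical WTP bids for two items
total = sum(item_values)   # what a purely additive rule would predict

# Hypothetical parameter values, not fitted estimates.
predictions = {
    "linear": linear_model(*item_values, b0=0.5, b1=0.8),
    "power": power_model(*item_values, a=1.2, b=0.8),
    "log": log_model(*item_values, a=3.0),
}
subadditive = {name: pred < total for name, pred in predictions.items()}
```

With these example parameters every model predicts a bundle value below the sum of the item values, which corresponds to predictions falling below the Y=X line.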

All three models show that the bundle value is a subadditive function of the individual item values as they extend below the Y=X line. A depiction of the distribution of value by state with absolute value and relative value codes. An absolute value code (left) represents value according to participants' WTP bids, an incentive-compatible measure of the subjective value of an item or bundle.

A relative code adapts to the context of the task and puts values from different distributions on the same scale.
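One simple way to picture a relative code is to rescale values by the range of their own context. The sketch below uses range normalization as an assumed stand-in for whatever adaptation the brain actually implements; the bids are invented.

```python
def range_normalize(values):
    """Rescale values to [0, 1] using the range of their own context."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Absolute code: raw WTP bids (hypothetical), on different scales.
item_bids = [1.0, 2.0, 3.0, 5.0]
bundle_bids = [4.0, 6.0, 8.0, 12.0]

# Relative code: each context is mapped onto the same scale.
relative_items = range_normalize(item_bids)
relative_bundles = range_normalize(bundle_bids)
```

Although the bundle bids are larger in absolute terms, the two contexts occupy the same range once each is normalized within its own distribution.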

USING DEEP REINFORCEMENT LEARNING TO REVEAL HOW THE BRAIN ENCODES ABSTRACT STATE-SPACE REPRESENTATIONS IN HIGH-DIMENSIONAL ENVIRONMENTS

We next investigated the relationship between the features encoded in the hidden layers of the DQN and activity patterns in the human brain while human participants played the Atari games. In addition, VAE and PCA models explain significant variance after controlling for the effects of the DQN layers. An example dissimilarity matrix (DSM; see Methods) for these hand-drawn features is illustrated in Figure 4.5A next to the DSM of the last convolutional layer in DQN (layer 3) for the same game frames.

This suggests that, similarly to DQN, the brain's state-space representation in Pong involves the encoding of high-level features that track the spatial positions of relevant objects. The hidden layers of the DQN encode a state space used to compute the Q values at the output of the action-value network. Through guided backpropagation (Springenberg et al., 2014), we can see that the filter detects cars and the sides of the road, which are useful features to act on in the game (Figure 4.7E).

Thus, it often has to project inputs that are far apart in pixel space into similar regions of latent state space if an agent is to act similarly across them (illustrated in Figure 4.8A). This leads to a lower-dimensional, compact, abstract representation that projects similar game situations into the same part of the state space as depicted in Figure 4.8A. All regions of interest consistently preferred DQN layers 3 and 4 over the first two network layers.
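The idea that frames far apart in pixel space can map to the same point in an abstract state space can be shown with a toy example. The "frames" below are one-row images and the feature extractor is hand-written; it is not a DQN and only stands in for the kind of object-position features discussed above. All names and values are hypothetical.

```python
def render(ball_x, paddle_x, background, width=16):
    """Render a one-row frame: background intensity plus two bright objects."""
    frame = [background] * width
    frame[ball_x] = 1.0     # ball is the brightest pixel
    frame[paddle_x] = 0.9   # paddle is the second brightest
    return frame

def object_positions(frame):
    """Toy abstract state: positions of the two brightest pixels."""
    order = sorted(range(len(frame)), key=lambda i: frame[i], reverse=True)
    return (order[0], order[1])

def pixel_distance(a, b):
    """Euclidean distance between two frames in pixel space."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

dark = render(ball_x=2, paddle_x=6, background=0.0)
bright = render(ball_x=2, paddle_x=6, background=0.5)  # lighting change only
moved = render(ball_x=3, paddle_x=6, background=0.0)   # ball moves one pixel

same_state = object_positions(dark) == object_positions(bright)
lighting_gap = pixel_distance(dark, bright)
movement_gap = pixel_distance(dark, moved)
```

The lighting change moves the frame further in pixel space than the ball movement does, yet only the ball movement changes the abstract state.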

For Pong, the filter analysis provided more evidence that this common state space represents high-level features, such as the spatial positions of the relevant objects. The accuracy of each model in each participant exceeded the maximum accuracy in the null distributions. The absolute value of the coefficients in the logistic regression model for decoding human actions averaged across layers.

The absolute value of the coefficients by layer in the encoding model analysis averaged across participants. Models were trained to maximize the evidence lower bound (ELBO) on the log-likelihood of the data. Boxplots show distributions of the upper 20th percentile prediction accuracies for each model.
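For readers unfamiliar with the ELBO objective mentioned above, the following sketch evaluates a single-sample estimate of it. It assumes a diagonal Gaussian approximate posterior, a standard normal prior, and a unit-variance Gaussian likelihood; none of these choices are stated in the text, and the function names are illustrative.

```python
import math

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in closed form."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))

def gaussian_log_likelihood(x, x_hat, sigma=1.0):
    """log N(x | x_hat, sigma^2 I): the reconstruction term."""
    n = len(x)
    squared_error = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return (-0.5 * squared_error / sigma ** 2
            - 0.5 * n * math.log(2 * math.pi * sigma ** 2))

def elbo(x, x_hat, mu, log_var):
    """Reconstruction log-likelihood minus KL from posterior to prior."""
    return gaussian_log_likelihood(x, x_hat) - kl_to_standard_normal(mu, log_var)

x = [0.2, -0.1, 0.4]        # hypothetical input
x_hat = [0.2, -0.1, 0.4]    # hypothetical decoder output for one latent sample
baseline = elbo(x, x_hat, mu=[0.0, 0.0], log_var=[0.0, 0.0])
shifted = elbo(x, x_hat, mu=[1.0, 0.0], log_var=[0.0, 0.0])
```

Moving the posterior mean away from the prior lowers the bound even when the reconstruction is unchanged, which is the regularizing role of the KL term.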

DISCUSSION

Similarly, the broader implications of each of the projects presented in this thesis raise exciting new research questions. In Chapter 3, we describe how sets of items are evaluated as a subadditive function of the values of the constituent items. Our results also show that value representation in the PFC rescales the distribution of values in a context when one moves between levels of the valuation hierarchy.

Therefore, this canonical neural computation can be applied to multiple steps of the decision-making process (Carandini and Heeger, 2012). This process is repeated over and over to build a hierarchical set of features that are useful for the task objectives. For Pong, this shared space represents high-level features such as the spatial positions of relevant objects in the game.

These characteristics are representative of the factors that generate the data and the elements of the environment that can be influenced or controlled. Important factors for generating the data encoded in Pong's abstract state space include the spatial positions of the ball and the paddle. This information must be carefully extracted from the input, as the geometry of the pixel space reflects low-level visual properties more than these object features.

In contrast, if it has an internal representation of higher-level concepts such as the positions of cars on the road in Enduro and the balls and paddles in Pong, it is more robust to dramatic changes in low-level visual properties (which people encounter all the time, even from something as simple as a change in ambient light levels). This concept ties in with the idea of factorized representations, which isolate structural representations of the world from the raw sensory information they are associated with. Our findings highlight the contribution of the dorsal visual stream, particularly the parietal cortex, in encoding abstract representations of state space during Atari game play.

Thus, the involvement of the parietal cortex may be particularly important in conditions where rapid visuomotor integration is required to perform the task.
