[Figure 6.3 panels. (a) Legend: input/output features; select layer based on Gumbel Softmax (d x k); FC layer + LeakyReLU. (b) Plain network: input features, FC blocks of 256 x 128, 128 x 64, 64 x 32, 32 x 16, 16 x 8, and 8 x output, then output. (c) Proposed network: input features, select layer (select features), the same FC blocks, then output.]
Figure 6.3: The model used to characterize the relationship between brain and body. (a) shows the legend for each block used in the figure. (b) shows the plain deep neural network, which consists of 6 blocks; the blocks contain fully connected layers with 256, 128, 64, 32, 16, and 8 input features, respectively. (c) shows the proposed model architecture: it uses the same blocks, but a select layer is inserted between the input features and the first block to select the associated features. d is the number of input features per observation, and k is the number of selected features, chosen by the user.
are divided into training, validation, and test cohorts. The tested epoch is selected based on the performance of the validation cohort. To reduce the effect of feature scaling, each feature and predicted response within the training cohort is normalized by using Eq. 6.3.
\[
\mathbf{F}_{\mathrm{norm}} = \frac{\mathbf{F} - \bar{f}}{\sigma} \tag{6.3}
\]
where $\bar{f}$ is the average value of $\mathbf{F}$ and $\sigma$ is the standard deviation of $\mathbf{F}$. After normalization, each feature has a mean of 0 and a standard deviation of 1 within the training cohort. The same mean $\bar{f}$ and standard deviation $\sigma$ are then applied to the validation and test cohorts to normalize the corresponding feature, which alleviates the data distribution shift between the training cohort and the other cohorts.
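As a minimal sketch of Eq. 6.3, the snippet below estimates the mean and standard deviation on the training cohort only and reuses them for another cohort. The function names, the random toy arrays, and the guard for constant columns are assumptions made for illustration; they are not taken from the original implementation.

```python
import numpy as np

def fit_standardizer(train):
    """Return the per-column mean and standard deviation of the training data."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    # Assumption: constant columns are only centred, to avoid division by zero.
    std = np.where(std == 0, 1.0, std)
    return mean, std

def apply_standardizer(x, mean, std):
    """Apply Eq. 6.3 with statistics estimated on the training cohort."""
    return (x - mean) / std

# Hypothetical toy cohorts standing in for real features.
rng = np.random.default_rng(0)
train = rng.normal(5.0, 2.0, size=(200, 3))
test = rng.normal(5.0, 2.0, size=(50, 3))

mean, std = fit_standardizer(train)
train_norm = apply_standardizer(train, mean, std)
test_norm = apply_standardizer(test, mean, std)  # reuses the training statistics
```

Only the training cohort is guaranteed to end up with exactly zero mean and unit standard deviation; the other cohorts are close to that only insofar as their distributions match the training data.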
6.3.2 Characterize linear relationship
For linear characterization, the linear regression model from the statsmodels package [140] is adopted. The control variables are demographic information, such as sex, age, and mild cognitive impairment (MCI) status. Additionally, when estimating the volume of the hippocampus, whole brain volume is included as a control variable. The input features are potentially high-dimensional, with up to 133 dimensions. To address this, Principal Component Analysis (PCA) is applied to perform feature reduction by extracting the component with the largest variance. PCA is fitted on the training dataset, and the resulting transformation matrix is applied to the validation and test cohorts to ensure that consistent features are obtained. The linear regression formula for predicting body features from brain features takes the form:
\[
\text{output feature} = \text{Intercept} + \beta_0\,\text{age} + \beta_1\,\text{sex} + \beta_2\,\text{MCI} + \beta_3\,\text{input feature} + \varepsilon \tag{6.4}
\]
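The linear pipeline (PCA fitted on the training cohort, followed by the regression of Eq. 6.4) can be sketched as follows. This sketch deliberately uses plain NumPy (an SVD for the first principal component and `np.linalg.lstsq` for the fit) rather than the statsmodels model used in this work, so it yields coefficients but no p-values. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_pca_first_component(x_train):
    """Return the training mean and the direction of largest variance."""
    mean = x_train.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(x_train - mean, full_matrices=False)
    return mean, vt[0]

def project(x, mean, direction):
    """Reduce features to one score using the training mean and direction."""
    return (x - mean) @ direction

def fit_ols(covariates, feature, target):
    """Least-squares fit of target ~ intercept + covariates + feature."""
    design = np.column_stack([np.ones(len(target)), covariates, feature])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef

def predict_ols(coef, covariates, feature):
    design = np.column_stack([np.ones(len(feature)), covariates, feature])
    return design @ coef

# Hypothetical data: age, sex, MCI as controls and 20 stand-in brain features.
rng = np.random.default_rng(0)
n = 300
covariates = np.column_stack([
    rng.uniform(50, 90, n),   # age
    rng.integers(0, 2, n),    # sex
    rng.integers(0, 2, n),    # MCI
])
brain = rng.standard_normal((n, 20))
brain[:, 0] *= 5.0            # make one direction dominate the variance
target = 0.02 * covariates[:, 0] + 0.5 * brain[:, 0] + rng.standard_normal(n)

mean, direction = fit_pca_first_component(brain)
component = project(brain, mean, direction)
coef = fit_ols(covariates, component, target)   # intercept, beta_0..beta_3
fitted = predict_ols(coef, covariates, component)
```

For a validation or test cohort, `project` would be called with the `mean` and `direction` obtained from the training cohort, mirroring the way the transformation matrix is reused in the text.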
6.3.3 Metrics
The explained variance score (EVS) measures the dispersion of the prediction errors relative to the dispersion of the ground truth for a given dataset, which makes it suitable for regression tasks. EVS has an upper bound of 1, and an EVS of 1 means perfect prediction. An EVS smaller than 0 means the prediction is worse than using the average of the ground truth as the prediction. $\hat{\mathbf{y}}$ represents the prediction, and $\mathbf{y}$ represents the ground truth. The explained variance score is written as:
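\[
\mathrm{EVS}(\mathbf{y}, \hat{\mathbf{y}}) = 1 - \frac{\mathrm{Var}(\mathbf{y} - \hat{\mathbf{y}})}{\mathrm{Var}(\mathbf{y})}, \qquad \mathrm{Var}(\mathbf{y}) = \sum_{i} (y_i - \bar{y})^2 \tag{6.5}
\]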
where $\bar{y}$ is the average of $\mathbf{y}$ and $y_i$ is the $i$th element of $\mathbf{y}$.
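A minimal NumPy sketch of Eq. 6.5 is given below; the function name and the toy vectors are chosen for illustration. `np.var` divides by the number of samples, but that factor cancels in the ratio, so the result matches the unnormalized sum in Eq. 6.5.

```python
import numpy as np

def explained_variance_score(y_true, y_pred):
    """EVS = 1 - Var(y - y_hat) / Var(y), as in Eq. 6.5."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 1.0 - np.var(y_true - y_pred) / np.var(y_true)

y = np.array([1.0, 2.0, 3.0, 4.0])
perfect = explained_variance_score(y, y)                          # 1.0
mean_only = explained_variance_score(y, np.full(4, y.mean()))     # 0.0
shifted = explained_variance_score(y, y + 10.0)                   # 1.0
```

The last case shows a property worth keeping in mind when reading the tables: a constant offset between prediction and ground truth does not lower the EVS, because only the variance of the error enters the formula.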
6.3.4 Validation of the Gumbel softmax
The purpose of this section is to investigate the ability of Gumbel softmax to separate meaningful signals from noise, as well as the potential to enhance the regularization of neural networks and improve the understanding of the input-output relationship. To test the hypothesis, we followed the approach proposed in [26] and created synthetic data with an explicit non-linear mapping from input to output. Specifically, we randomly generated 133 input features from a standard Gaussian distribution and selected 5 out of the 133 features that are related to 5 output responses. We then used the following formula to express the non-linear relationship between the input and output.
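\[
\begin{aligned}
y_1 &= \sin(x_1) + 3\exp(x_4) + \eta \\
y_2 &= 4\cos(x_1) + 3|x_2| + \eta \\
y_3 &= \exp(x_1) + 5\sin(x_2) + 6|x_3| + \eta \\
y_4 &= \exp(x_1) + 10\sin(x_4) + 6\exp(x_5) + \eta \\
y_5 &= 4|x_5| + 4\exp(x_3) + \eta
\end{aligned}
\tag{6.6}
\]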
where $\eta \sim \mathcal{N}(0, 1)$ is the noise added to the non-linear relationship.
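The synthetic data of Eq. 6.6 could be generated along the following lines. Two details are assumptions of this sketch and are not stated in the text: the five informative features are taken to be the first five columns, and the noise is drawn independently for each response.

```python
import numpy as np

def make_toy_data(n_samples, n_features=133, seed=0):
    """Generate inputs and the five non-linear responses of Eq. 6.6."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_samples, n_features))
    # Assumption: the informative features x1..x5 are the first five columns.
    x1, x2, x3, x4, x5 = (x[:, i] for i in range(5))
    # Assumption: independent N(0, 1) noise for every response.
    eta = rng.standard_normal((n_samples, 5))
    y = np.column_stack([
        np.sin(x1) + 3 * np.exp(x4),
        4 * np.cos(x1) + 3 * np.abs(x2),
        np.exp(x1) + 5 * np.sin(x2) + 6 * np.abs(x3),
        np.exp(x1) + 10 * np.sin(x4) + 6 * np.exp(x5),
        4 * np.abs(x5) + 4 * np.exp(x3),
    ]) + eta
    return x, y

x_toy, y_toy = make_toy_data(1000)
```

The remaining 128 columns never enter any response, so a feature selector that works as intended should ignore them.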
To train our model, we generated 15,000 samples for the training cohort, 1,000 samples for the validation cohort, and 9,000 samples for the test cohort. Eq. 6.3 is used to normalize the features and responses. We varied the percentage of training data used to train the model and analyzed its performance. Figure 6.4 displays the results. As depicted in Figure 6.4(a), our proposed model consistently outperformed the plain MLP model in terms of EVS, except when using only 1 percent of the training data. Gumbel Softmax is employed to select 10 relevant features, and after removing duplicates the selected features are always the real signals, again except when using only 1 percent of the training data. In conclusion, the Gumbel Softmax method improved our model's prediction ability by accurately selecting the real signals, as evidenced by the results in Figure 6.4(b).
[Figure 6.4 panels. (a) Explained variance score versus data percentage. (b) Success rate of extracting real signals versus data percentage.]
Figure 6.4: Results for the toy example and the success rate of feature selection in each run with different percentages of toy data. (a) shows the mean explained variance score across the 5 output variables. Compared with the MLP model, the only difference is that the Gumbel model has a select layer to subset the real signals. (b) shows the success rate of the real signals being chosen by Gumbel Softmax.
6.3.5 Choice of the number of selected features
The number of selected features, k, is an important hyperparameter that might influence the performance of neural networks. To choose a suitable k, we performed the experiment with k values from 10 to 130 in steps of 10 to predict body region features (muscle area, cortical bone area, internal bone area, subcutaneous fat area, and intermuscular fat area) from brain volumes. The result can be found in Fig. 6.5. It can be observed that even when we request a subset of 130 brain features, only 12, 8, 15, 20, and 30 unique features are selected for predicting muscle, cortical bone, internal bone, subcutaneous fat, and intermuscular fat, respectively. Most of the selected features are repeated. In such a scenario, the input features are over-trained and do not reflect the effect of real signals. Based on this observation, we choose 10 as the upper bound of k when we use brain features to predict body features.
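To illustrate how duplicates can arise, the sketch below draws Gumbel-softmax weights for k selectors over d inputs and counts the unique inputs they pick. It is a generic NumPy rendering of Gumbel-softmax sampling, not the trained select layer of this work: the logits are hypothetical (every selector is made to prefer the same 12 inputs), and the temperature value and function names are likewise assumptions.

```python
import numpy as np

def gumbel_softmax(logits, tau, rng):
    """Draw relaxed one-hot samples, one per row of `logits`."""
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))               # Gumbel(0, 1) noise
    z = (logits + gumbel) / tau
    z = z - z.max(axis=-1, keepdims=True)      # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def select_features(x, weights):
    """Mix the d inputs into k selected features; x is (n, d), weights is (k, d)."""
    return x @ weights.T

rng = np.random.default_rng(0)
d, k = 133, 130
# Hypothetical logits: every selector strongly prefers the same 12 inputs.
logits = np.zeros((k, d))
logits[:, :12] = 8.0

weights = gumbel_softmax(logits, tau=0.5, rng=rng)
chosen = weights.argmax(axis=1)                # hard choice of each selector
n_unique = len(np.unique(chosen))              # far fewer than k

x = rng.standard_normal((4, d))
selected = select_features(x, weights)         # shape (4, k)
```

Because each of the k selectors chooses independently, nothing prevents several of them from settling on the same input, which is consistent with the gap between requested and unique features reported above.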
[Figure 6.5 panels: muscle area prediction, internal bone prediction, cortical bone prediction, subcutaneous fat prediction, intermuscular fat prediction. Axes: input brain features; number of selected features.]
Figure 6.5: Bar plots of the expected number of selected features and the number of features Gumbel-softmax actually selects. The five bar plots correspond to using brain features to predict body features, including muscle, cortical bone, internal bone, subcutaneous fat, and intermuscular fat. As we increase the number of selected features, the actual number of selected features is less than expected because there are duplicate selected features.
6.3.6 Real dataset
The BLSA dataset (Philips 3T Achieva) includes T1-weighted images acquired using an MPRAGE sequence (TE = 3.1 ms, TR = 6.8 ms, slice thickness = 1.2 mm, number of slices = 170, flip angle = 8 deg, FOV = 256×240 mm, acquisition matrix = 256×240, reconstruction matrix = 256×256, reconstructed voxel size = 1×1 mm). The single thigh slice has a resolution of 512×512 and a pixel size of 1×1 mm. We follow the pipeline in Chapter 4 to divide the thigh slice into left and right images. The segmentation model is then deployed to perform body composition segmentation. The demographic information for the BLSA dataset can be found in Figure 6.6.
[Figure 6.6 panels: histograms of count versus (a) age, (b) sex (male, female), and (c) sess (number of sessions).]
Figure 6.6: Histograms of the demographic information of the BLSA dataset. (a) shows the age distribution of BLSA subjects. The BLSA dataset is used to investigate the effect of aging, and most subjects are older adults. (b) shows the sex distribution of BLSA subjects. (c) shows the distribution of the number of visits per subject. The maximum number of visits for one subject is 10.
6.3.7 Predict body composition area using brain region volumes
Body composition refers to muscle, cortical bone, internal bone, subcutaneous fat, and intermuscular fat. In this section, brain region volumes are used to predict those areas using the BLSA dataset. The whole BLSA dataset is randomly divided into training, validation, and test cohorts at the subject level. The training cohort has 1295 sessions from 553 subjects, the validation cohort includes 412 sessions from 184 subjects, and the test cohort includes 460 sessions from 186 subjects. The input features and predicted variables are normalized using Eq. 6.3. To match the experimental setting of the linear regression, we concatenate age, sex, and MCI to the feature map after the select layer. We feed 132 brain region features into the select layer and the subsequent architecture. We vary the number of selected features from 1 to 10 in steps of 1 and take the model with the best performance on the validation cohort as the optimal model. The prediction results are evaluated by EVS. The Bland-Altman plot in Fig. 6.7 visualizes the relationship between the prediction and the ground truth for the linear and non-linear models. The explained variance scores are shown in Table 6.1.
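Splitting by subject rather than by session keeps all visits of one person in the same cohort. A minimal sketch of such a split is shown below; the function name, the 60/20/20 fractions (which only roughly match the subject counts reported above), and the toy subject identifiers are assumptions for illustration.

```python
import numpy as np

def split_by_subject(subject_ids, fractions=(0.6, 0.2, 0.2), seed=0):
    """Return session indices per cohort so that no subject spans two cohorts."""
    subject_ids = np.asarray(subject_ids)
    rng = np.random.default_rng(seed)
    subjects = rng.permutation(np.unique(subject_ids))
    n_train = int(round(fractions[0] * len(subjects)))
    n_val = int(round(fractions[1] * len(subjects)))
    groups = {
        "train": subjects[:n_train],
        "val": subjects[n_train:n_train + n_val],
        "test": subjects[n_train + n_val:],   # whatever remains
    }
    return {name: np.flatnonzero(np.isin(subject_ids, ids))
            for name, ids in groups.items()}

# Hypothetical example: 200 sessions belonging to at most 50 subjects.
rng = np.random.default_rng(1)
subject_ids = rng.integers(0, 50, size=200)
splits = split_by_subject(subject_ids)
```

Each value in `splits` is an array of session indices, so features and targets can be sliced directly, for example `features[splits["train"]]`.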
Figure 6.7: The Bland-Altman plot of the linear regression and the proposed non-linear method. Each plot has the difference between prediction and truth on the y-axis and the ground truth on the x-axis. Gray represents the linear regression and red represents the proposed method. We can observe that the proposed method has narrower limits of agreement in muscle area prediction than linear regression, as pointed out by the yellow arrow. We also find two obvious clusters, controlled by sex, in cortical bone and subcutaneous fat prediction.
Region                   Linear regression        Proposed model
                         (linear relationship)    (non-linear relationship)
Muscle area              0.598*                   0.639
Cortical bone area       0.527*                   0.542
Internal bone area       0.213                    0.215
Subcutaneous fat area    0.282                    0.226
Intermuscular fat area   0.069                    0.050

Table 6.1: The explained variance score for the linear and non-linear models. * represents p-value < 0.05 and indicates that the transformed brain features are significantly related to the target variable. The bold explained variance score means better prediction ability.
From Table 6.1, we observe that in the linear regression the transformed brain features are significantly associated with muscle and cortical bone area. The proposed model has a better EVS than the linear model for muscle, cortical bone, and internal bone area (0.639, 0.542, and 0.215, respectively). In Fig. 6.7, the linear and non-linear plots for cortical bone and subcutaneous fat show two obvious clusters, which are separated by sex.

Region                   Female linear    Female proposed   Male linear    Male proposed
                         regression       model             regression     model
                         (linear)         (non-linear)      (linear)       (non-linear)
Muscle area              0.413            0.395             0.395*         0.373
Cortical bone area       -0.031           -0.092            0.044*         -0.122
Internal bone area       0.009            0.074             0.121*         0.188
Subcutaneous fat area    0.009            0.020             -0.031         0.083
Intermuscular fat area   -0.082           -0.051            0.003          0.007

Table 6.2: The EVS for the linear and non-linear models, reported separately for males and females. * represents p-value < 0.05 and indicates that the transformed brain features are significantly related to the target variable. The bold explained variance score means better prediction ability.
To remove the sex effect, we divided the whole BLSA database based on sex and followed previous data split rules. The linear regression formula is shown as:
\[
\text{output feature} = \text{intercept} + \beta_0\,\text{age} + \beta_1\,\text{MCI} + \beta_2\,\text{input feature} + \varepsilon \tag{6.7}
\]

Table 6.2 displays the EVS for males and females in both linear and non-linear relationships for the five body composition areas. With the exception of muscle, the EVS for all other body compositions is either close to zero or negative, regardless of whether a linear or non-linear model is used. For the muscle area, the linear model has a slightly better explained variance score (0.413 for females and 0.395 for males) than the non-linear model.
6.3.8 Predict hippocampus volume using body composition area
As mentioned in [134, 173, 114], body composition is an important indicator of cognitive function. The hippocampus is highly associated with cognitive function [14]. We hypothesize that body metrics are associated with hippocampus structure. Thus, the body features are used to predict hippocampus volume to test this hypothesis. Similar to section 6.3.5, the whole brain volume (aggregating all region volumes of SLANT), age, sex, and MCI are concatenated to the feature map after the select layer in the non-linear model. We vary the number of selected features from 1 to 5 in steps of 1 and choose the optimal model based on performance on the validation cohort. For the linear model, PCA is applied to perform feature reduction on the body composition features, and the regression takes the form of Eq. 6.8:
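\[
\text{output feature} = \text{intercept} + \beta_0\,\text{age} + \beta_1\,\text{sex} + \beta_2\,\text{MCI} + \beta_3\,\text{Wholebrain} + \beta_4\,\text{input feature} + \varepsilon \tag{6.8}
\]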
Two experiments are conducted to examine the impact of whole brain volume on the relationship between transformed body features and left and right hippocampus volume. The first experiment does not include whole brain volume, while the second experiment does. The resulting EVS is presented in Table 6.3, and the Bland-Altman plot is shown in Figure 6.8.
[Figure 6.8 row labels: without whole brain volume; with whole brain volume.]
Figure 6.8: The Bland-Altman plot of the linear regression and the proposed non-linear method. Each plot has the difference between prediction and truth on the y-axis and the ground truth on the x-axis. Gray represents the linear regression and red represents the proposed method. As pointed out by the arrow, the proposed method has narrower limits of agreement in left hippocampus prediction than linear regression when whole brain volume is included in the estimation process.
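The limits of agreement referred to in the Bland-Altman captions are conventionally taken as the mean difference plus or minus 1.96 standard deviations of the differences. The text does not state which convention was used, so the factor 1.96, the sample standard deviation, and the function name in the sketch below are assumptions.

```python
import numpy as np

def limits_of_agreement(y_true, y_pred):
    """Return (lower limit, bias, upper limit) of prediction minus truth."""
    diff = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    bias = diff.mean()
    # Assumption: 1.96 sample standard deviations around the mean difference.
    spread = 1.96 * diff.std(ddof=1)
    return bias - spread, bias, bias + spread

# Hypothetical toy values.
lower, bias, upper = limits_of_agreement([1.0, 2.0, 3.0, 4.0],
                                         [1.5, 2.0, 3.5, 4.0])
```

A narrower interval between `lower` and `upper` corresponds to the "smaller limit of agreement" described in the captions.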
From Table 6.3, it is apparent that without whole brain volume, transformed body features are significantly associated with both left and right hippocampus volume in the linear model. After introducing whole brain volume, both the linear and non-linear models produce a better EVS. The non-linear model outperforms the linear model in terms of EVS, except for right hippocampus volume when whole brain volume is not included.
                            Linear regression   Proposed model   Linear regression   Proposed model
                            with whole          with whole       without whole       without whole
                            brain volume        brain volume     brain volume        brain volume
Left hippocampus volume     0.415               0.443            0.215*              0.221
Right hippocampus volume    0.372               0.446            0.227*              0.194

Table 6.3: The explained variance score for the linear and non-linear models for left and right hippocampus volume prediction based on body composition areas. * represents p-value < 0.05 and indicates that the transformed body features are significantly related to the target variable. The bold explained variance score means better prediction ability.