Explainable AI in Medical Imaging: Interpreting Multi-Modality Inference with Neuroimaging and EHR

Subject loadings for this component represent a statistically significant (p < 0.05, corrected) difference between the control and mTBI populations. C) An independent component projected back into the connectivity data space; the component is visualized on a representative T1w volume, where similarly colored areas are connected (see Section 2.4 for a full description of the visualization procedure). Latent space visualizations consist of a two-dimensional tSNE projection of the 32-dimensional latent space embeddings of the test cohort; color indicates each subject's gender.

Introduction

Overview
Machine Learning Inference for Medical Image Analysis

Classic machine learning approaches
Deep learning approaches

Multi-Modal Image Processing and Patient Context

Multi-modal image processing
Patient context: analyzing electronic health records
Challenges

Explainable AI in Medical Image Analysis

Explanation vs. Interpretation
Post-hoc explainability
Intrinsic explainability
Challenges

Test Bed Applications

Mild traumatic brain injury
Developmental disabilities
Mild cognitive impairment

Contributed Work

Contribution 1: Interpretable Machine Learning with MRI and EHR
Contribution 2: Interpretable Deep Learning with MRI and EHR
Contribution 3: Interpretable Multi-Modal Modeling for MRI and EHR
Outline of Dissertation

To overcome these challenges, we turn to deep learning in the next section. We then extend this work to interpretable deep learning models of MRI and EHR (Contribution 2).

Figure I-1 Supervised learning models for classification

MRI Correlates of Chronic Symptoms in Mild Traumatic Brain Injury

Introduction
Methods

Data collection and preprocessing
Imaging metric extraction
Imaging metric analysis

Results

SVM classifier performance
Symptom score correlations

Discussion
Conclusion

Next, each metric set's ability to discriminate between mTBI and control subjects was assessed individually. To this end, an SVM classifier [136] was trained on the PCA space of each metric set combined with subject age.
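A minimal sketch of this per-metric-set classifier follows, written in dependency-free Python so every step is visible. It assumes k = 2 principal components per metric set, z-scored features, and a linear soft-margin SVM fit by subgradient descent on the hinge loss. None of these settings, the helper names (`principal_axes`, `train_linear_svm`, `svm_on_metric_set`), or the toy cohort come from the study; in practice a library SVM such as scikit-learn's `SVC` would replace the hand-written solver.

```python
import random


def center(rows):
    """Subtract the column means from every row."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - mu[j] for j in range(d)] for r in rows]


def principal_axes(centered, k, iters=200):
    """Top-k principal axes via power iteration with deflation."""
    n, d = len(centered), len(centered[0])
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    axes = []
    for c in range(k):
        v = [1.0 / (i + 1 + c) for i in range(d)]  # deterministic start vector
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            v = [x / norm for x in w]
        eig = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        axes.append(v)
        # Deflate so the next pass finds the next-largest component.
        cov = [[cov[i][j] - eig * v[i] * v[j] for j in range(d)] for i in range(d)]
    return axes


def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.5):
    """Linear SVM: full-batch subgradient descent on the L2-regularized hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for t in range(1, epochs + 1):
        gw, gb = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1:  # margin violated
                gw = [g - yi * xj / n for g, xj in zip(gw, xi)]
                gb -= yi / n
        step = lr / t ** 0.5  # decaying step size
        w = [wj - step * g for wj, g in zip(w, gw)]
        b -= step * gb
    return w, b


def svm_on_metric_set(metrics, ages, labels, k=2):
    """PCA-project one metric set, append age, z-score, fit SVM; return training accuracy."""
    centered = center(metrics)
    axes = principal_axes(centered, k)
    feats = [[sum(a * x for a, x in zip(ax, r)) for ax in axes] + [age]
             for r, age in zip(centered, ages)]
    feats = center(feats)
    sd = [(sum(f[j] ** 2 for f in feats) / (len(feats) - 1)) ** 0.5
          for j in range(len(feats[0]))]
    feats = [[f[j] / sd[j] for j in range(len(f))] for f in feats]
    w, b = train_linear_svm(feats, labels)
    preds = [1 if sum(wj * xj for wj, xj in zip(w, f)) + b >= 0 else -1 for f in feats]
    return sum(p == truth for p, truth in zip(preds, labels)) / len(labels)


# Hypothetical toy cohort: 20 "mTBI" (+1) and 20 "control" (-1) subjects with
# six imaging metrics each; mTBI subjects are shifted on the first three metrics.
rng = random.Random(0)
labels = [1] * 20 + [-1] * 20
metrics = [[rng.gauss(3.0 if (lab == 1 and j < 3) else 0.0, 1.0) for j in range(6)]
           for lab in labels]
ages = [rng.uniform(20, 50) for _ in labels]
accuracy = svm_on_metric_set(metrics, ages, labels)
print(f"training accuracy: {accuracy:.2f}")
```

For brevity the sketch reports training accuracy. A faithful evaluation would fit both the PCA and the SVM inside each cross-validation fold so that held-out subjects never influence the projection.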

Figure II-1 An overview of imaging metric generation is presented. Full-brain tractography is performed on the preprocessed DWI volume, and four streamline bundles are extracted using the BrainCOLOR labels

Joint Analysis of Structural Connectivity and Cortical Surface Features: Correlated with

Imaging and symptom data
Metric generation
Metric analysis
Visualizing independent components
Group differences
Symptom correlations

Castelli, “The effect of batch size on the generalizability of the convolutional neural networks on a histopathology dataset,” ICT Express, vol.
Kohane et al., “The co-morbidity burden of children and young adults with autism spectrum disorders,” PLoS One, vol.

Figure III-1 Surface and connectivity metric generation. The T1w volume is segmented into 132 BrainCOLOR regions, of which 98 cortical surface regions are retained.

Requirements and installation

Inputs to the pipeline include EMR data (ICD-9, ICD-10, or CPT codes) and group data (disease group, gender, race, etc.). pyPheWAS tools expect two primary files: the phenotype file (EMR data) and the group file (demographic data). The phenotype file contains EMR events for all subjects in the group file, with one line per event.

The phenotype and group files are linked by a column labeled 'id', which contains a unique identifier for each subject in the cohort.
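To make the shared `id` key concrete, the sketch below links a toy group file to a toy phenotype file using only the Python standard library. Apart from `id`, the column names (`y`, `sex`, `ICD_CODE`, `ICD_TYPE`, `AgeAtICD`), the helper `link_by_id`, and all rows are hypothetical placeholders, not a specification of the pyPheWAS file format.

```python
import csv
import io
from collections import defaultdict

# Hypothetical toy files: only the shared 'id' column is taken from the text.
GROUP_CSV = """id,y,sex
1,1,F
2,0,M
3,0,F
"""

PHENOTYPE_CSV = """id,ICD_CODE,ICD_TYPE,AgeAtICD
1,758.0,9,2.5
1,Q90.9,10,8.1
2,244.9,9,30.2
4,401.9,9,55.0
"""


def link_by_id(group_text, phenotype_text):
    """Attach each subject's EMR events (one row per event) to its group record."""
    group = {row["id"]: row for row in csv.DictReader(io.StringIO(group_text))}
    events = defaultdict(list)
    unmatched = 0
    for row in csv.DictReader(io.StringIO(phenotype_text)):
        if row["id"] in group:
            events[row["id"]].append(row)
        else:
            unmatched += 1  # event belongs to a subject absent from the group file
    return group, events, unmatched


group, events, unmatched = link_by_id(GROUP_CSV, PHENOTYPE_CSV)
for sid, record in group.items():
    codes = [e["ICD_CODE"] for e in events.get(sid, [])]
    print(sid, record["y"], codes)
print("events without a group record:", unmatched)
```

Subject 3 has no events and subject 4 appears only in the phenotype file, which illustrates why the phenotype file is expected to cover exactly the subjects listed in the group file.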

Data preparation

In the basic setup described above, the control group consists of all non-case, non-ambiguous subjects. Additional options can refine this control group. The first method excludes subjects from the control group based on the given case codes and codes associated with them; this prevents the control group from being contaminated by conditions similar to the target condition. In this case, the user supplies createPhenotypeFile with lists of ICD-9 and ICD-10 codes for both the case and control groups.

The control group is then composed of subjects who are not in the case group and who have at least the minimum frequency of specified control group codes in their records.
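The control-group rules can be sketched as a single assignment function. This illustrates the logic described above under stated assumptions; it is not the `createPhenotypeFile` implementation. The helper `define_groups`, the frequency thresholds, the treatment of subjects with too few case events as "dropped", and the example codes are all hypothetical.

```python
from collections import Counter


def define_groups(events, case_codes, control_codes, case_min=2, control_min=1,
                  excluded_codes=frozenset()):
    """Assign each subject to 'case', 'control', or 'dropped' (hypothetical rules).

    events maps a subject id to its list of codes, one entry per EMR event.
    """
    groups = {}
    for sid, codes in events.items():
        counts = Counter(codes)
        n_case = sum(counts[c] for c in case_codes)
        n_ctrl = sum(counts[c] for c in control_codes)
        if n_case >= case_min:
            groups[sid] = "case"
        elif (n_case == 0                            # not a case and not ambiguous
              and excluded_codes.isdisjoint(counts)  # no case-related codes
              and n_ctrl >= control_min):            # enough control-code events
            groups[sid] = "control"
        else:
            groups[sid] = "dropped"
    return groups


events = {
    "a": ["758.0", "758.0", "244.9"],  # two case events
    "b": ["758.0", "401.9"],           # one case event: ambiguous
    "c": ["317", "401.9"],             # control code present
    "d": ["401.9"],                    # no control code
    "e": ["317", "758.1"],             # control code, but also an excluded related code
}
groups = define_groups(events, case_codes={"758.0"}, control_codes={"317"},
                       excluded_codes=frozenset({"758.1"}))
print(groups)
```

Subject `e` shows the exclusion rule at work: it would qualify as a control on frequency alone, but carries a code related to the case condition.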

Scanning the ICD phenome

From these data, it creates three complementary views of the PheDAS results using the Python library matplotlib [189]: a volcano plot, a Manhattan plot, and a log odds plot. The volcano plot summarizes the entire experiment in a single view. In addition, Manhattan and log odds plots are generated by default for both the FDR and Bonferroni corrections.
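Both corrections reduce to a p-value threshold, and each volcano-plot point is a PheCode's (log odds ratio, −log10 p) pair. The sketch below assumes the FDR correction is the Benjamini-Hochberg procedure (an assumption; the text does not name it) and uses made-up regression results with placeholder code labels; the function names are hypothetical.

```python
import math


def bonferroni_threshold(pvals, alpha=0.05):
    """Family-wise threshold: alpha divided by the number of tests."""
    return alpha / len(pvals)


def fdr_threshold(pvals, alpha=0.05):
    """Benjamini-Hochberg: largest sorted p_(i) with p_(i) <= i * alpha / m (0.0 if none)."""
    m = len(pvals)
    threshold = 0.0
    for i, p in enumerate(sorted(pvals), start=1):
        if p <= i * alpha / m:
            threshold = p
    return threshold


def volcano_points(results, alpha=0.05):
    """Map (phecode, log_odds, p) to volcano coordinates plus significance flags."""
    pvals = [p for _, _, p in results]
    bonf, fdr = bonferroni_threshold(pvals, alpha), fdr_threshold(pvals, alpha)
    return [(code, lor, -math.log10(p), p <= bonf, p <= fdr) for code, lor, p in results]


# Made-up results for five codes: (label, log odds ratio, p-value).
results = [("A", 1.2, 0.0001), ("B", -0.8, 0.004), ("C", 0.5, 0.019),
           ("D", 0.4, 0.03), ("E", 0.1, 0.5)]
for code, x, y, bonf_sig, fdr_sig in volcano_points(results):
    print(f"{code}: log OR={x:+.1f}, -log10 p={y:.2f}, Bonferroni={bonf_sig}, FDR={fdr_sig}")
```

With five tests, Bonferroni keeps only the two smallest p-values, while Benjamini-Hochberg keeps four, which is why plotting both corrections side by side is informative.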

Optional arguments allow users to modify any step of the pipeline (add covariates, specify significance level, etc.).

Figure IV-4 Detailed look at phenotype mapping, aggregation, and regression in pyPhewasLookup

Scanning the CPT phenome

In this section, we demonstrate the utility of the pyPheWAS package through two example PheDAS experiments.

Experiment 1: Synthetic dataset

To produce a confounding effect, ICD events were generated such that all women in the dataset had equal odds of having PheCode 174.1 in their record; event ages were generated in the same way as for the primary PheCodes. However, because women were disproportionately represented in the case group relative to the control group, this PheCode's cohort-wide effect size is positively biased, with a log odds ratio of 0.6. Similarly, ICD events were generated such that PheCode 292.2 would have a log odds ratio of −0.2; however, its event ages were drawn from a uniform distribution over the higher age range [65, 70].
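A toy 2×2 calculation shows how a confound like the one on PheCode 174.1 arises. The counts are hypothetical and do not reproduce the 0.6 reported for the synthetic dataset: within women the odds of the PheCode are identical in cases and controls, but women make up 80% of cases versus 40% of controls, and men contribute no events.

```python
import math


def log_odds_ratio(case_with, case_without, ctrl_with, ctrl_without):
    """Natural-log odds ratio of having a PheCode, cases versus controls."""
    return math.log((case_with * ctrl_without) / (case_without * ctrl_with))


# Hypothetical cohort: 1000 cases (800 women) and 1000 controls (400 women).
# 10% of women in BOTH groups carry the PheCode; men never do.
women_lor = log_odds_ratio(80, 800 - 80, 40, 400 - 40)
cohort_lor = log_odds_ratio(80, 1000 - 80, 40, 1000 - 40)
print(f"within women: {women_lor:.3f}")   # no true association
print(f"whole cohort: {cohort_lor:.3f}")  # spurious positive association
```

Stratifying by sex, or adding sex as a regression covariate, removes the spurious cohort-wide association.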

Reg A successfully estimated the log odds ratio for all nine primary PheCodes and determined them to be statistically significant after Bonferroni multiple comparison correction.

Experiment 2: Down syndrome case study

Clicking the Run button in the model-building panel computes an estimate of the user's model in real time.

Third, PheDAS was used in the current study because it evaluates the “whole phenome” and can reveal multiple co-occurring health conditions in individuals with Down syndrome.

We observe the best separability of left-right tumor laterality in the latent space with a batch size of 1 (Figure VII-4C).

We identified four unique subtypes of autism symptoms and co-occurring conditions in the EHR.

Figure IV-5 PheDAS applied to a synthetic dataset. a) Volcano plot resulting from a PheDAS without covariates

Materials and Methods

A brief description of PheDAS
Input and preprocessing
Building a PheDAS model
Evaluating a PheDAS model
Installation and Use
Software evaluation

Our model-based findings confirmed the role of congestive heart failure in the predicted probability of surgical needs in individuals with DS and CHD.

Dehaene, “The visual word form area: expertise for reading in the fusiform gyrus,” Trends Cogn.
Wei et al., “Evaluating phecodes, clinical classification software, and ICD-9-CM codes for phenome-wide association studies in the electronic health record,” PLoS One, vol.

Bakas et al., “Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge,” 2018.

Phenotyping Down Syndrome: Discovery and Predictive Modeling with Electronic

Background

Study 1: Characterizing as-yet unidentified co-occurring health conditions in DS
Study 2: Congenital heart disease and surgical needs in DS
Characterizing as-yet unidentified co-occurring health conditions in DS
Study 2: Longitudinal predictors of surgical intervention among DS cases with CHD
Known versus as-yet unidentified co-occurring health conditions in DS
Health conditions in the likelihood of surgery among DS cases with CHD
Limitations and future directions

For example, hypothyroidism is among the most common endocrine comorbidities in individuals with DS [211]. Less well-studied (or more recently recognized) co-morbid medical conditions could provide insight into potentially unmet health needs among individuals with DS. We therefore further examined the co-morbid medical conditions found in individuals with DS to determine which are as yet unidentified or less discussed in the relevant literature.

First, differences in co-occurring health conditions among people with DS have been linked to access to care, health insurance and socioeconomic differences [246].

Figure VI-1 Overall two-part study design and flow charts for Studies 1 and 2.

Batch Size: Go Big or Go Home? Counterintuitive Improvement in Medical Autoencoders

Data overview and preparation
Autoencoder training protocols
Latent space evaluation
Training performance
Qualitative analysis of latent space separability and data reconstruction
Latent space performance on secondary tasks

We plotted the residuals as the absolute percent difference between samples as a function of batch size. Additionally, in Figure VII-3 we present tSNE visualizations of the latent space as a function of batch size for EHR data. We observe improved gender separation with decreasing batch size, especially when the batch size is reduced from 100 and 50 to 25.

First, we did not account for interactions between batch size and other hyperparameters.

Figure VII-1 Medical autoencoders seek to derive latent spaces that capture clinically or biologically meaningful information about the cohort

Unsupervised Hard Case Mining for Medical Autoencoders

Unsupervised Hard Case Mining
Autoencoder Experiments
MNIST
EHR
MRI

Pereira et al., “Longitudinal degeneration of the basal forebrain predicts later dementia in Parkinson's disease,” Neurobiol.
Tang et al., “Interpretable classification of Alzheimer's disease pathologies with a convolutional neural network pipeline,” Nat.
Kerley et al., “MRI correlates of chronic symptoms in mild traumatic brain injury,” in Medical Imaging 2020: Image Processing, Mar.

Parvathaneni et al., “Cortical Surface Parcellation Using Spherical Convolutional Neural Networks,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Oct.

Figure VIII-1 Network architectures for fully connected and convolutional autoencoders

EHR-Defined Subtypes of Autism in Children and their Associations with Structural MRI

Material and Methods

EHR processing
MRI processing
Clustering and brain volume models
EHR clustering
Brain volume models

This PCA procedure was used to reduce the noise inherent in the EHR [49] and to balance the contributions of PheCode and ProCode data to the subsequent clustering analysis. Three of the regions were in the basal ganglia (Figure IX-4): the left accumbens area, left pallidum, and left putamen. These basal ganglia regions showed stronger associations with age than the other three regions.
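The text does not specify how the PCA step balances the two data types. One common approach, sketched here as an assumption, is block scaling: after each data type is reduced separately, its component scores are centered and divided by the square root of the block's total variance, so each block contributes equal total variance to the Euclidean distances used in clustering. The helper names and toy matrices are hypothetical.

```python
def center(block):
    """Subtract each column's mean."""
    n, d = len(block), len(block[0])
    means = [sum(row[j] for row in block) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in block]


def total_variance(block):
    """Sum of the per-column sample variances."""
    centered = center(block)
    return sum(x * x for row in centered for x in row) / (len(block) - 1)


def balance_blocks(*blocks):
    """Scale each centered block to unit total variance, then concatenate columns."""
    scaled = []
    for block in blocks:
        s = total_variance(block) ** 0.5
        scaled.append([[x / s for x in row] for row in center(block)])
    return [sum(rows, []) for rows in zip(*scaled)]


# Hypothetical component scores for four subjects.
phecode_pcs = [[10.0, 0.0], [0.0, 10.0], [-10.0, 0.0], [0.0, -10.0]]
procode_pcs = [[0.1], [0.2], [0.3], [0.4]]
combined = balance_blocks(phecode_pcs, procode_pcs)
print(total_variance([r[:2] for r in combined]), total_variance([r[2:] for r in combined]))
```

Without this step, the high-variance block would dominate every pairwise distance, and the clusters would effectively ignore the other data type.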

This apparent lack of brain associations may be surprising, given the cognitive focus of the identified autism subtypes and the many significant associations between brain volume and autism reported in the literature.

Conclusions

Chaganti et al., “Electronic medical record context signatures improve diagnostic classification using medical image computing,” IEEE J.
Jack et al., “Prediction of AD with MRI-based hippocampal volume in mild cognitive impairment.”
Georgiades et al., “Investigating phenotypic heterogeneity in children with autism spectrum disorder: a factor mixture modeling approach,” J.

O’Dwyer et al., “Brain volumetric correlates of autism spectrum disorder symptoms in attention deficit/hyperactivity disorder,” PLoS One, vol.

Conclusions & Future Work

Introduction

One exciting facet of this field is multi-modal modeling: combining multiple sources of medical data into a single model of a disease. Limited data availability in medical imaging produces models that are biased and difficult to generalize; the accumulation of multiple data sources for multi-modal modeling further limits data availability and may exacerbate these biases. Finally, this work culminated in the development of an interpretable framework for multi-modal modeling of brain MRI and EHR (Chapter IX).

Collectively, this work expands the growing field of model interpretation and contributes new methods for multimodal medical inference with limited data.

Interpretable Machine Learning with MRI and EHR

Summary
Technical Innovations
Clinical Impacts
Future Directions

We extended the interpretability of PheWAS models and integrated explainable machine learning principles into this core big data technology. We found the first evidence of combined structural and diffusion biomarkers of mild traumatic brain injury in a small-data setting. Although we have made measurable progress in interpretable small-data machine learning for multi-contrast MRI, open questions remain.

Our innovations in machine learning for EHR analysis have increased both the interpretability and accessibility of PheWAS.

Interpretable Deep Learning with MRI and EHR

Summary
Clinical Impact
Future Directions

We characterized the significant impact that batch size has on the interpretation of deep neural network embeddings of medical data. By improving the interpretation of unsupervised deep learning models, we have increased their potential for novel abnormality detection and phenotype discovery in EHR and MRI datasets. Despite the encouraging results seen in this work, there are still many opportunities to explore in creating interpretable deep learning models for MR and EHR.

Our proposed unsupervised framework for hard case mining showed positive results in accelerating model convergence and improving latent embedding interpretation, but these effects were not seen uniformly across both EHR and MR.

Interpretable Multi-Modal Modeling for MRI and EHR

Summary
Clinical Impact
Future Directions

Our latest paper combined these innovations to propose an interpretable framework for the joint analysis of MRI and EHR in a data-limited cohort of children with autism spectrum disorder (Chapter IX). This framework involved synthesizing different sources of EHR data to identify subtypes of autism spectrum disorders. We identified novel EHR subtypes in a cohort of patients with autism spectrum disorders and found significant associations between these subtypes and six brain regions.

There are still many opportunities for interpretable multimodal modeling of MRI and EHR.

ICD codes for defining Down syndrome and intellectual and developmental disability groups

Carmichael et al., “Joint and individual analysis of histological images and genomic covariates of breast cancer.”
Stoub et al., “MRI predictors of Alzheimer's disease risk: a longitudinal study,” Neurology, vol.
Kerley et al., “pyPheWAS: a phenome-disease association tool for electronic health record analysis,” Neuroinformatics, vol.

Dingen et al., “RegressionExplorer: Interactive Exploration of Logistic Regression Models with Subgroup Analysis,” IEEE Trans.