
Explainable AI in Medical Imaging: Interpreting Multi-Modality Inference with Neuroimaging and EHR


Academic year: 2023


Subject loadings for this component show a statistically significant (p < 0.05, corrected) difference between the control and mTBI populations. C) An independent component projected back into the connectivity data space; the component is visualized on a representative T1w volume, where similarly colored regions are connected (see Section 2.4 for a full description of the visualization procedure). Latent space visualizations consisted of a 2-dimensional tSNE projection of the 32-dimensional latent space embeddings of the test cohort; color in this visualization indicates each individual's gender.

Introduction

  • Overview
  • Machine Learning Inference for Medical Image Analysis
    • Classic machine learning approaches
    • Deep learning approaches
  • Multi-Modal Image Processing and Patient Context
    • Multi-modal image processing
    • Patient context: analyzing electronic health records
    • Challenges
  • Explainable AI in Medical Image Analysis
    • Explanation vs. Interpretation
    • Post-hoc explainability
    • Intrinsic explainability
    • Challenges
  • Test Bed Applications
    • Mild traumatic brain injury
    • Developmental disabilities
    • Mild cognitive impairment
  • Contributed Work
    • Contribution 1: Interpretable Machine Learning with MRI and EHR
    • Contribution 2: Interpretable Deep Learning with MRI and EHR
    • Contribution 3: Interpretable Multi-Modal Modeling for MRI and EHR
    • Outline of Dissertation

To overcome these challenges, we will delve into the topic of deep learning in the next section. We next extend this work on interpretability to deep learning models of MRI and EHR (Contribution 2).

Figure I-1 Supervised learning models for classification

MRI Correlates of Chronic Symptoms in Mild Traumatic Brain Injury

  • Introduction
  • Methods
    • Data collection and preprocessing
    • Imaging metric extraction
    • Imaging metric analysis
  • Results
    • SVM classifier performance
    • Symptom score correlations
  • Discussion
  • Conclusion

Next, each metric set's ability to discriminate between mTBI and control subjects was assessed individually. To this end, an SVM classifier [136] was trained on the PCA space of each metric set combined with subject age.

Figure II-1 An overview of imaging metric generation is presented. Full-brain tractography is performed on the preprocessed DWI volume, and four streamline bundles are extracted using the BrainCOLOR labels

Joint Analysis of Structural Connectivity and Cortical Surface Features: Correlated with

  • Imaging and symptom data
  • Metric generation
  • Metric analysis
  • Visualizing independent components
  • Group differences
  • Symptom correlations

Castelli, "The effect of batch size on the generalizability of the convolutional neural networks on a histopathology dataset," ICT Express, vol. Kohane et al., "The co-morbidity burden of children and young adults with autism spectrum disorders," PLoS One, vol.

Figure III-1 Surface and connectivity metric generation. The T1w volume is segmented into 132 BrainCOLOR regions, out of which 98 cortical surface regions are kept

Requirements and installation

Inputs to the pipeline include EMR data (ICD-9, ICD-10, or CPT codes) and group data (disease group, gender, race, etc.). Two primary files are expected by pyPheWAS tools: the phenotype file (EMR data) and the group file (demographic data). The phenotype file contains EMR events for all subjects in the group file, with a single line for each event.

The phenotype and group files are linked by a column labeled 'id', which contains a unique identifier for each subject in the cohort.
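The link between the two files can be illustrated with a minimal sketch that uses only the Python standard library. Only the shared 'id' column is taken from the description above; every other column name and all values below are hypothetical, and this is not the package's own loading code.

```python
import csv
import io

# Hypothetical file contents. Only the shared 'id' column comes from the text;
# the remaining column names and all values are illustrative assumptions.
PHENOTYPE_CSV = """id,ICD_CODE,ICD_TYPE,AgeAtICD
1,758.0,9,4.2
1,Q90.9,10,9.8
2,427.31,9,61.0
"""

GROUP_CSV = """id,genotype,sex
1,1,F
2,0,M
3,0,F
"""


def load_rows(text):
    """Parse CSV text into a list of dictionaries, one per line."""
    return list(csv.DictReader(io.StringIO(text)))


def link_events_to_subjects(phenotype_rows, group_rows):
    """Attach each subject's group data to every one of their EMR events."""
    subjects = {row["id"]: row for row in group_rows}
    linked = []
    for event in phenotype_rows:
        subject = subjects.get(event["id"])
        if subject is None:
            continue  # event for a subject that is not in the group file
        linked.append({**subject, **event})
    return linked


linked_events = link_events_to_subjects(
    load_rows(PHENOTYPE_CSV), load_rows(GROUP_CSV)
)

# Count events per subject; subject 3 has group data but no events.
events_per_subject = {}
for row in linked_events:
    events_per_subject[row["id"]] = events_per_subject.get(row["id"], 0) + 1
```

Because the phenotype file holds one line per event, a subject can appear many times there but only once in the group file, which is why the join is keyed on the group file.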

Figure IV-3 pyPheWAS package tools. The package is composed of three main tool sets: data preparation, ICD analysis, and CPT analysis

Data preparation

In the basic setup described above, the control group consists of all non-case and non-ambiguous subjects. The first method excludes subjects from the control group based on given case codes and codes associated with those case codes; this prevents the control group from being contaminated by conditions similar to the target condition. In this case, the user would supply createPhenotypeFile with lists of ICD-9 and ICD-10 codes for both the case and control groups.

The control group is then composed of subjects who are not in the case group and who have at least the minimum frequency of specified control group codes in their records.
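The control-group rule in the preceding sentence can be sketched as follows. The function name, its parameters, and the example codes are all hypothetical; the frequency-based case definition is an assumption made for illustration and is not taken from the package.

```python
from collections import Counter


def assign_groups(events, case_codes, control_codes,
                  min_case_freq=1, min_control_freq=1):
    """Label each subject as 'case', 'control', or 'excluded'.

    events is an iterable of (subject_id, code) pairs. A subject is a control
    only if they are not a case and have at least min_control_freq events
    carrying one of the control codes.
    """
    case_counts = Counter()
    control_counts = Counter()
    subjects = set()
    for subject_id, code in events:
        subjects.add(subject_id)
        if code in case_codes:
            case_counts[subject_id] += 1
        if code in control_codes:
            control_counts[subject_id] += 1

    groups = {}
    for subject_id in sorted(subjects):
        if case_counts[subject_id] >= min_case_freq:
            groups[subject_id] = "case"
        elif control_counts[subject_id] >= min_control_freq:
            groups[subject_id] = "control"
        else:
            groups[subject_id] = "excluded"
    return groups


# Hypothetical events and code lists, for illustration only.
example_events = [
    ("a", "758.0"), ("a", "758.0"),
    ("b", "315.0"), ("b", "315.0"),
    ("c", "315.0"),
    ("d", "401.1"),
    ("e", "758.0"), ("e", "315.0"), ("e", "315.0"),
]
example_groups = assign_groups(
    example_events,
    case_codes={"758.0"},
    control_codes={"315.0"},
    min_case_freq=1,
    min_control_freq=2,
)
```

Subject "e" meets the control-code frequency but is still assigned to the case group, because case membership is checked first.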

Scanning the ICD phenome

From this data, it creates three complementary views of the PheDAS analysis using the Python library matplotlib [189]. The volcano plot allows the user to see a summary of the entire experiment, complementing the Manhattan and Log Odds plots. In addition to the volcano plot, Manhattan and Log Odds plots are generated by default for FDR and Bonferroni corrections.

Optional arguments allow users to modify any step of the pipeline (add covariates, specify significance level, etc.).

Figure IV-4 Detailed look at phenotype mapping, aggregation, and regression in pyPhewasLookup

Scanning the CPT phenome

In this section, we demonstrate the utility of the pyPheWAS package through two example PheDAS experiments.

Experiment 1: Synthetic dataset

To produce the confounding effect, ICD events were generated such that all women in the data set had equal odds of having PheCode 174.1 in their record; event ages were generated in the same way as for the primary PheCodes. However, because women were disproportionately represented across the case and control groups, this PheCode's cohort-wide effect size is positively skewed at a 0.6 log odds ratio. ICD events were generated such that PheCode 292.2 would have a log odds ratio of −0.2; however, event ages were randomly generated using a uniform distribution over the higher age range [65,70].
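The confounding mechanism described above can be shown with a small worked example. The counts are invented for illustration and do not reproduce the synthetic dataset or its 0.6 log odds ratio; stratifying by sex is used here as a simple stand-in for covariate adjustment and is not the package's regression.

```python
import math


def log_odds_ratio(case_with, case_without, control_with, control_without):
    """Natural-log odds ratio of a 2x2 table (code present vs. absent)."""
    return math.log(
        (case_with * control_without) / (case_without * control_with)
    )


# Hypothetical counts. Within each sex, the code is equally likely in cases
# and controls, but women make up 80% of cases and only 20% of controls.
strata = {
    "women": dict(case_with=40, case_without=40,
                  control_with=10, control_without=10),
    "men": dict(case_with=2, case_without=18,
                control_with=8, control_without=72),
}

# Effect size within each sex: no association in either stratum.
per_stratum = {name: log_odds_ratio(**counts)
               for name, counts in strata.items()}

# Effect size when sex is ignored and the strata are pooled.
pooled_counts = {
    key: sum(counts[key] for counts in strata.values())
    for key in ("case_with", "case_without",
                "control_with", "control_without")
}
pooled = log_odds_ratio(**pooled_counts)
```

In this sketch both per-sex log odds ratios are zero, while the pooled estimate is clearly positive, which is the same kind of skew the paragraph attributes to the unequal representation of women across groups.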

Reg A successfully estimated the log odds ratio for all nine primary PheCodes and determined them to be statistically significant after Bonferroni multiple comparison correction.
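As a reminder of what the multiple comparison step involves, the sketch below computes a Bonferroni threshold and, for contrast, a Benjamini-Hochberg FDR threshold. The p-values are hypothetical, the function names are illustrative, and the assumption that the FDR correction in question is Benjamini-Hochberg is ours rather than the text's.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def benjamini_hochberg_threshold(p_values, alpha):
    """Largest sorted p-value p_(k) with p_(k) <= (k / m) * alpha.

    Returns 0.0 when no p-value satisfies the condition.
    """
    m = len(p_values)
    threshold = 0.0
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= rank / m * alpha:
            threshold = p
    return threshold


# Hypothetical p-values, one per tested PheCode.
p_values = [0.001, 0.008, 0.02, 0.045, 0.30]
alpha = 0.05

bonferroni = bonferroni_threshold(alpha, len(p_values))
fdr = benjamini_hochberg_threshold(p_values, alpha)

significant_bonferroni = [p for p in p_values if p <= bonferroni]
significant_fdr = [p for p in p_values if p <= fdr]
```

With these illustrative values the Bonferroni correction keeps two results and the FDR procedure keeps three, reflecting that Bonferroni is the more conservative of the two.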

Experiment 2: Down syndrome case study

Selecting the Run button in the model building panel activates a real-time estimate of the user's model. Third, PheDAS was used in the current study given its tendency to evaluate the “whole phenome” and reveal several co-occurring health problems in individuals with Down syndrome. We observe the best separability of right-left tumor laterality in the latent space with a batch size of 1 (Figure VII-4C).

We identified four unique subtypes of autism symptoms and co-occurring conditions in the EHR.

Figure IV-5 PheDAS applied to a synthetic dataset. a) Volcano plot resulting from a PheDAS without covariates

Materials and Methods

  • A brief description of PheDAS
  • Input and preprocessing
  • Building a PheDAS model
  • Evaluating a PheDAS model
  • Installation and Use
  • Software evaluation

Our model-based findings indeed confirmed the presence of congestive heart failure in the predicted probability of surgical needs in individuals with DS and CHD. Dehaene, "The visual word form area: expertise for reading in the fusiform gyrus," Trends Cogn. Wei et al., "Evaluating phecodes, clinical classification software, and ICD-9-CM codes for phenome-wide association studies in the electronic health record," PLoS One, vol.

Bakas et al., "Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge," 2018.

Figure V-1 pyPheWAS Explorer Workflow. All data preprocessing is done automatically in the background; feature matrices are saved for faster startup in subsequent sessions

Phenotyping Down Syndrome: Discovery and Predictive Modeling with Electronic

Background

  • Study 1: Characterizing as-yet unidentified co-occurring health conditions in DS
  • Study 2: Congenital heart disease and surgical needs in DS
  • Characterizing as-yet unidentified co-occurring health conditions in DS
  • Study 2: Longitudinal predictors of surgical intervention among DS cases with CHD
  • Known versus as-yet unidentified co-occurring health conditions in DS
  • Health conditions in the likelihood of surgery among DS cases with CHD
  • Limitations and future directions

For example, hypothyroidism is among the most common endocrine comorbidities in individuals with DS [211]. Less "well-studied" (or more recent) co-morbid medical conditions could provide insight into potentially unmet health needs among individuals with DS. Findings on co-morbid medical conditions in individuals with DS were further examined to reveal which are as yet unidentified or less discussed in the relevant literature.

First, differences in co-occurring health conditions among people with DS have been linked to access to care, health insurance and socioeconomic differences [246].

Figure VI-1 Overall two-part study design and flow charts for Studies 1 and 2.

Batch Size: Go Big or Go Home? Counterintuitive Improvement in Medical Autoencoders

  • Data overview and preparation
  • Autoencoder training protocols
  • Latent space evaluation
  • Training performance
  • Qualitative analysis of latent space separability and data reconstruction
  • Latent space performance on secondary tasks

We plotted the residuals as the absolute percent difference between samples as a function of batch size. Additionally, in Figure VII-3 we present tSNE visualizations of the latent space as a function of batch size for EHR data. We observe improved gender separation with decreasing batch size, especially when the batch size is reduced from 100 and 50 to 25.

First, we do not take into account the effect of other parameters in connection with the batch size.

Figure VII-1 Medical autoencoders seek to derive latent spaces that capture clinically or biologically meaningful information about the cohort

Unsupervised Hard Case Mining for Medical Autoencoders

  • Unsupervised Hard Case Mining
  • Autoencoder Experiments
  • MNIST
  • EHR
  • MRI

Pereira et al., "Longitudinal degeneration of the basal forebrain predicts later dementia in Parkinson's disease," Neurobiol. Tang et al., "Interpretable classification of Alzheimer's disease pathologies with a convolutional neural network pipeline," Nat. Kerley et al., "MRI correlates of chronic symptoms in mild traumatic brain injury," in Medical Imaging 2020: Image Processing, Mar.

Parvathaneni et al., "Cortical Surface Parcellation Using Spherical Convolutional Neural Networks," in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Oct.

Figure VIII-1 Network architectures for fully connected and convolutional autoencoders

EHR-Defined Subtypes of Autism in Children and their Associations with Structural MRI

Material and Methods

  • EHR processing
  • MRI processing
  • Clustering and brain volume models
  • EHR clustering
  • Brain volume models

This PCA procedure was used to reduce the noise inherent in the EHR [49] and to balance the contributions of PheCode and ProCode data for the subsequent clustering analysis. Three of the regions were in the basal ganglia (Figure IX-4): left accumbens area, left pallidum, and left putamen. Regions in the basal ganglia showed more dramatic associations with age than the other three regions.

This potential lack of brain associations may be surprising given the cognitive focus of identified subtypes of autism and the high incidence of significant associations between brain volume and autism in the literature.

Table IX-1 Autism Cohort Demographics

Conclusions

Chaganti et al., "Electronic medical record context signatures improve diagnostic classification using medical image computing," IEEE J. Jack et al., "Prediction of AD with MRI-based hippocampal volume in mild cognitive impairment." Georgiades et al., "Investigating phenotypic heterogeneity in children with autism spectrum disorder: a factorial mixture modeling approach," J.

O'Dwyer et al., "Brain volumetric correlates of autism spectrum disorder symptoms in attention deficit/hyperactivity disorder," PLoS One, vol.

Conclusions & Future Work

Introduction

One exciting facet of this field is multi-modal modeling: combining multiple sources of medical data into a single model of a disease. Limited data availability in medical imaging produces models that are biased and difficult to generalize; the accumulation of multiple data sources for multi-modal modeling further limits data availability and may exacerbate these biases. Finally, this work culminated in the development of an interpretable framework for multi-modal modeling of brain MRI and EHR (Chapter IX).

Collectively, this work expands the growing field of model interpretation and contributes new methods for multimodal medical inference with limited data.

Interpretable Machine Learning with MRI and EHR

  • Summary
  • Technical Innovations
  • Clinical Impacts
  • Future Directions

We extended the interpretability of PheWAS models and integrated explainable machine learning principles into this core big data technology. We have found the first evidence of combined structural and diffusion biomarkers on small data in mild traumatic brain injury. Although we have made measurable progress in interpretable small-data machine learning for multicontrast MRI, there are still open questions that need to be explored.

Our innovations in machine learning for EHR analysis have increased both the interpretability and accessibility of PheWAS.

Interpretable Deep Learning with MRI and EHR

  • Summary
  • Technical Innovations
  • Clinical Impact
  • Future Directions

We characterized the significant impact that batch size has on the interpretation of deep neural network embeddings of medical data. By improving the interpretation of unsupervised deep learning models, we have increased their potential for novel abnormality detection and phenotype discovery in EHR and MRI datasets. Despite the encouraging results seen in this work, there are still many opportunities to explore in creating interpretable deep learning models for MR and EHR.

Our proposed unsupervised framework for hard case mining showed positive results in accelerating model convergence and improving latent embedding interpretation, but these effects were not seen uniformly across both EHR and MR.

Interpretable Multi-Modal Modeling for MRI and EHR

  • Summary
  • Technical Innovations
  • Clinical Impact
  • Future Directions

Our latest paper combined these innovations to propose an interpretable framework for the joint analysis of MRI and EHR in a data-limited cohort of children with autism spectrum disorder (Chapter IX). This framework involved synthesizing different sources of EHR data to identify subtypes of autism spectrum disorders. We identified novel EHR subtypes in a cohort of patients with autism spectrum disorders and found significant associations between these subtypes and six brain regions.

There are still many opportunities for interpretable multimodal modeling of MRI and EHR.

ICD codes for defining Down syndrome and intellectual and developmental disability groups

Carmichael et al., "Joint and individual analysis of histological images and genomic covariates of breast cancer." Stoub et al., "MRI predictors of Alzheimer's disease risk: a longitudinal study," Neurology, vol. Kerley et al., "pyPheWAS: a phenome-disease association tool for electronic health record analysis," Neuroinformatics, vol.

Dingen et al., “RegressionExplorer: Interactive Exploration of Logistic Regression Models with Subgroup Analysis,” IEEE Trans.

Figures

Figure I-3 Example of a post-hoc explanation in a deep learning model
Figure II-1 An overview of imaging metric generation is presented. Full-brain tractography is performed on the preprocessed DWI volume, and four streamline bundles are extracted using the BrainCOLOR labels
Figure II-3 A schematic overview of the imaging metric analysis. First, the imaging metrics are normalized by converting the raw imaging metrics to z-scores using the mean μ_controls and standard deviation σ_controls of the contr
Figure II-4 shows SVM classifier performance as the PCA components are swept for both the individual metric sets and the combined set
