Chapter VI Phenotyping Down Syndrome: Discovery and Predictive Modeling with Electronic
2. Background
3.2. Study 2: Congenital heart disease and surgical needs in DS
Study 2 aimed to identify longitudinal EMR predictors of surgical intervention for DS subjects diagnosed with CHD. To this end, we first identified which DS subjects from Study 1 also had CHD and, within this CHD subset, which subjects did or did not receive heart surgery. The EMRs of these subjects were then censored at seven days before the first heart surgery record so that only events occurring before surgery were retained. Three classifiers (Support Vector Machine, Random Forest, and Multi-layer Perceptron) were subsequently trained to predict whether a subject would receive surgery based only on the pre-surgical records. Finally, a black-box explainability technique was applied to the best-performing trained classifier to determine the importance of each phecode toward surgery predictions. This process is described in more detail in the following sections.
3.2.1. Cohort selection
Out of the 2,282 DS subjects from Study 1, subjects with the additional diagnosis of CHD were identified as those who had at least one instance of a phecode for CHD: 747, 747.1, 747.13, and 747.2. This resulted in 1,098 subjects in the combined DS+CHD group. From this group, 204 surgery subjects were identified as those with at least one record of the phecode for heart transplant/surgery (429.1). Surgery subjects were matched one-to-one with non-surgery DS+CHD controls based on three matching criteria.
The first two criteria were biological sex and minimum age at visit (± 2 years). The third criterion matched each surgery subject's age at surgery to the control subject's median visit age (± 3 years); this ensured that matched controls had EMR events around the time of their match's surgery, so that the two EMRs were approximately the same length up to the point of surgery. Finally, the EMRs of all subjects were censored to retain only records dated more than seven days before surgery (control EMRs were censored using their matched surgery subject's age at surgery). The seven-day buffer before surgery was added to prevent contamination of the EMR by records from around the time of surgery, which may be subject to reporting delays. This selection process yielded a final cohort of 408 DS+CHD subjects, evenly split between surgery and non-surgery.
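The cohort selection, one-to-one matching, and censoring steps can be sketched in plain Python. This is a minimal illustration under assumptions the text does not specify: each record is a hypothetical `(age_in_years, phecode)` pair, controls are matched greedily in the order cases are listed (the actual matching algorithm is not described), the ± windows are treated as inclusive, and the same cutoff (matched surgery age minus seven days) is applied to both members of a pair. The names `Subject`, `censor`, and `match_and_censor` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import median

CHD_PHECODES = {"747", "747.1", "747.13", "747.2"}  # CHD phecodes listed in the text
SURGERY_PHECODE = "429.1"                           # heart transplant/surgery
BUFFER_YEARS = 7 / 365.25                           # seven-day buffer, assuming ages are stored in years


@dataclass
class Subject:
    """Hypothetical EMR container: one (age_in_years, phecode) tuple per record."""
    sid: str
    sex: str
    records: list[tuple[float, str]]

    def ages(self) -> list[float]:
        return [age for age, _ in self.records]

    def has_any(self, codes: set[str]) -> bool:
        return any(code in codes for _, code in self.records)

    def surgery_age(self) -> float | None:
        # Age at the first heart surgery record, or None for non-surgery subjects.
        ages = [age for age, code in self.records if code == SURGERY_PHECODE]
        return min(ages) if ages else None


def censor(subject: Subject, cutoff: float) -> Subject:
    """Keep only records dated strictly before the cutoff age."""
    kept = [(age, code) for age, code in subject.records if age < cutoff]
    return Subject(subject.sid, subject.sex, kept)


def match_and_censor(subjects: list[Subject]) -> list[tuple[Subject, Subject]]:
    chd = [s for s in subjects if s.has_any(CHD_PHECODES)]
    cases = [s for s in chd if s.surgery_age() is not None]
    controls = [s for s in chd if s.surgery_age() is None]
    used: set[str] = set()
    pairs = []
    for case in cases:
        t_surg = case.surgery_age()
        for ctrl in controls:
            if ctrl.sid in used or ctrl.sex != case.sex:
                continue  # criterion 1: biological sex; each control is used at most once
            if abs(min(ctrl.ages()) - min(case.ages())) > 2:
                continue  # criterion 2: minimum visit age within +/- 2 years
            if abs(median(ctrl.ages()) - t_surg) > 3:
                continue  # criterion 3: control median visit age within +/- 3 years of surgery age
            used.add(ctrl.sid)
            cutoff = t_surg - BUFFER_YEARS
            pairs.append((censor(case, cutoff), censor(ctrl, cutoff)))
            break  # greedy: the first eligible control is taken
    return pairs
```

In this sketch a surgery subject with no eligible control is simply dropped; a greedy pass is the simplest choice, and an optimal matcher could pair more subjects when eligible controls are scarce.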
3.2.2. Longitudinal predictive modeling
As in Study 1, the first step in analyzing the EMRs was to convert ICD-9 and ICD-10 codes to phecodes and aggregate the phecode instances across each subject's record. This resulted in a 1 × 1,866 binary vector for each subject, where, again, phecodes present in the record were represented by a 1 and phecodes absent from the record by a 0. Finally, the phecode for heart transplant/surgery (429.1) was removed, yielding a 1 × 1,865 binary phecode vector describing each DS+CHD subject's EMR fingerprint leading up to surgery.
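The aggregation into binary vectors can be sketched with NumPy. This assumes the ICD-9/ICD-10 to phecode mapping has already been applied upstream, so each subject is represented by a list of phecodes; `phecode_matrix` and the ordered `vocabulary` list (1,866 phecodes in the study, four in the toy example) are hypothetical names.

```python
import numpy as np


def phecode_matrix(subject_phecodes, vocabulary, drop=("429.1",)):
    """Build an n_subjects x n_phecodes binary matrix (1 = phecode present at least once)."""
    dropped = set(drop)
    columns = [p for p in vocabulary if p not in dropped]
    col_index = {p: j for j, p in enumerate(columns)}
    X = np.zeros((len(subject_phecodes), len(columns)), dtype=np.int8)
    for i, codes in enumerate(subject_phecodes):
        for code in codes:
            j = col_index.get(code)
            if j is not None:   # dropped phecodes (e.g. 429.1) and unknown codes are ignored
                X[i, j] = 1     # repeated instances collapse to a single 1
    return X, columns


vocab = ["250.2", "318", "429.1", "747"]
X, cols = phecode_matrix([["747", "429.1", "747"], ["318"]], vocab)
print(cols)         # ['250.2', '318', '747']
print(X.tolist())   # [[0, 0, 1], [0, 1, 0]]
```

Dropping the surgery phecode when the columns are built keeps the outcome itself out of the feature vector, which is why the study's vectors shrink from 1,866 to 1,865 entries.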
Three classifier models (Support Vector Machine, Random Forest, and Multi-layer Perceptron) were next trained to predict whether a given subject would receive heart surgery based only on that subject's 1 × 1,865 binary phecode vector. Four-fold cross-validation was used to optimize model and training parameters for all model types. The optimal Support Vector Machine model used a sigmoid kernel and class weightings of 0.45 for the non-surgery class and 0.55 for the surgery class. The optimal Random Forest model included 60 estimators, employed minimal cost-complexity pruning with α = 0.007, and used entropy as the split criterion. Finally, the optimal Multi-layer Perceptron model employed rectified linear unit activation, a stochastic gradient descent optimizer, and two hidden layers (a 100-neuron layer followed by a 50-neuron layer). All models were trained using the scikit-learn Python package; any model parameters not explicitly mentioned in this article were set to the scikit-learn default values [83].
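The reported configurations map onto scikit-learn estimators roughly as follows. This is a sketch, not the authors' code: it assumes labels are encoded as 0 = non-surgery and 1 = surgery (so the class weights can be given as a dict), and the grid in `four_fold_search` is a hypothetical search space, since the text reports only the selected values. `reported_models` and `four_fold_search` are illustrative names.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC


def reported_models():
    """Classifiers configured with the optimized parameters reported in the text.

    Assumes labels 0 = non-surgery, 1 = surgery; unlisted parameters stay at
    scikit-learn defaults.
    """
    return {
        "svm": SVC(kernel="sigmoid", class_weight={0: 0.45, 1: 0.55}),
        "random_forest": RandomForestClassifier(
            n_estimators=60, ccp_alpha=0.007, criterion="entropy"
        ),
        "mlp": MLPClassifier(
            hidden_layer_sizes=(100, 50), activation="relu", solver="sgd"
        ),
    }


def four_fold_search(X, y):
    """Four-fold cross-validated grid search for the Random Forest.

    The candidate values are hypothetical; the study's search space is not reported.
    """
    grid = {
        "n_estimators": [20, 60, 100],
        "ccp_alpha": [0.0, 0.007],
        "criterion": ["gini", "entropy"],
    }
    search = GridSearchCV(RandomForestClassifier(random_state=0), grid, cv=4)
    search.fit(X, y)
    return search.best_params_
```

Because `GridSearchCV` with an integer `cv` and a classifier uses stratified folds, each of the four folds keeps the roughly even surgery/non-surgery balance of the matched cohort.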
These optimized parameters were then used to train a secondary model of each type on the full DS+CHD cohort, with 80% of the cohort (326 subjects) used for training and the remaining 20% (82 subjects) used for testing. The secondary models were trained on the full cohort in order to leverage as much information as possible in the subsequent explanatory phecode analysis.
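The 80/20 split can be reproduced with `train_test_split`. The data below are random placeholders with the cohort's dimensions, not EMR data, and stratifying on the surgery label is an assumption (the text does not say whether the split was stratified); `split_cohort` is a hypothetical helper.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def split_cohort(X, y, seed=0):
    """80/20 train/test split; stratifying on the surgery label is an assumption."""
    return train_test_split(X, y, test_size=0.2, stratify=y, random_state=seed)


# Placeholder data with the cohort's dimensions (not real EMR data).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(408, 1865), dtype=np.int8)
y = np.repeat([1, 0], 204)  # 204 surgery subjects, 204 matched non-surgery controls
X_train, X_test, y_train, y_test = split_cohort(X, y)
print(X_train.shape, X_test.shape)  # (326, 1865) (82, 1865)
```

Since 0.2 × 408 = 81.6 is not a whole number, scikit-learn rounds the test size up to 82 and assigns the remaining 326 subjects to training, which matches the counts reported above.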
3.2.3. Model-based feature importance and explanatory variables
We next used LIME, an explainability technique for machine learning models, to investigate which phecodes were most important in predictions of surgery. LIME stands for local interpretable model-agnostic explanations; in short, it is a method of generating human-interpretable explanations for the predictions of any machine learning model [68]. For a given input X, LIME perturbs the input data and monitors how the perturbations modify the model's prediction. LIME then generates an explanation for the model's prediction on X, consisting of a weight for each input feature. In this study, LIME perturbs the binary phecode vector of an individual subject and monitors how this modifies the surgery prediction. It then generates an explanation for that subject, consisting of a weight for each phecode.
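The perturb-and-fit idea behind LIME can be illustrated with a simplified from-scratch version for binary phecode vectors. This is not the reference `lime` package, and the perturbation scheme (flipping each phecode with probability `flip_prob`), the exponential proximity kernel on the fraction of flipped phecodes, and the ridge surrogate are illustrative choices of ours. `predict_proba` follows the scikit-learn convention of returning one column per class; note that scikit-learn's `SVC` only exposes `predict_proba` when constructed with `probability=True`. `lime_style_weights` is a hypothetical name.

```python
import numpy as np
from sklearn.linear_model import Ridge


def lime_style_weights(x, predict_proba, n_samples=2000, flip_prob=0.3,
                       kernel_width=0.75, seed=0):
    """Simplified LIME-style local surrogate for one binary phecode vector.

    Returns one weight per phecode: the coefficient of a locally weighted
    linear model fitted to the classifier's surgery probability.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    # Perturb: flip each phecode independently with probability flip_prob.
    mask = rng.random((n_samples, x.size)) < flip_prob
    mask[0] = False                      # keep the unperturbed instance as sample 0
    Z = np.where(mask, 1 - x, x)
    # Monitor how the perturbations change the predicted surgery probability.
    p_surgery = predict_proba(Z)[:, 1]
    # Weight samples by proximity to x (fraction of flipped phecodes).
    distance = mask.mean(axis=1)
    proximity = np.exp(-(distance ** 2) / kernel_width ** 2)
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(Z, p_surgery, sample_weight=proximity)
    return surrogate.coef_


def toy_proba(Z):
    """Toy model: only the first phecode raises the surgery probability (by 0.6)."""
    p1 = 0.2 + 0.6 * Z[:, 0]
    return np.column_stack([1 - p1, p1])


w = lime_style_weights(np.array([1, 0, 1, 0, 1]), toy_proba)
```

For the toy model, the weight on the first phecode comes out close to 0.6 and the other weights close to 0, which is the behavior the per-phecode interpretation of LIME weights relies on.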
The weight, w_phe, for a phecode, phe, may be interpreted as follows: on average, the presence of phe in the subject's record increases the probability of a surgery prediction by w_phe. Such explanations were generated for all 408 subjects in the DS+CHD cohort. Explanatory weights were then averaged across subjects, yielding a single average explanatory weight for each phecode. This procedure was repeated for each classification model type, resulting in three separate sets of explanatory phecode weights. Finally, these weights were considered in relation to their prevalence in the surgery cohort; findings were reported only for health conditions that appear in at least 40% of the subjects who received surgery.
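The averaging and prevalence filter can be sketched as below, applied once per classifier's matrix of per-subject explanation weights. `summarize_explanations` is a hypothetical helper; it averages over all subjects as the text describes, computes prevalence only among surgery subjects (label 1, an assumed encoding), and sorts by mean weight purely for presentation.

```python
import numpy as np


def summarize_explanations(W, X, y, phecodes, min_prevalence=0.40):
    """Average per-subject explanation weights and keep common surgery-cohort phecodes.

    W: (n_subjects, n_phecodes) explanation weights, one row per subject.
    X: (n_subjects, n_phecodes) binary phecode matrix; y: 1 = surgery, 0 = non-surgery.
    Returns (phecode, mean_weight, surgery_prevalence) tuples, largest mean weight first.
    """
    W, X, y = np.asarray(W, dtype=float), np.asarray(X), np.asarray(y)
    mean_weight = W.mean(axis=0)            # average over all subjects
    prevalence = X[y == 1].mean(axis=0)     # share of surgery subjects with each phecode
    keep = np.flatnonzero(prevalence >= min_prevalence)
    rows = [(phecodes[j], float(mean_weight[j]), float(prevalence[j])) for j in keep]
    return sorted(rows, key=lambda row: row[1], reverse=True)


W = [[0.1, 0.4, 0.9], [0.1, 0.0, 0.9], [0.1, 0.2, 0.9], [0.1, 0.2, 0.9]]
X = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 1]]
y = [1, 1, 1, 0]
rows = summarize_explanations(W, X, y, ["A", "B", "C"])
```

In this toy example, phecode "C" has the largest average weight but appears in only one of three surgery subjects, so the 40% prevalence threshold removes it and only "B" and "A" are reported.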
4. Results
Extant EMR data enabled a large sample of N = 2,282 individuals with DS, as well as comparison groups constructed with stringent criteria (as specified below), to be included in this two-part study. Study 1 comprehensively analyzed a range of co-occurring health conditions and captured previously unidentified ones among individuals with DS. Study 2 evaluated health conditions that longitudinally predict a known clinical outcome, surgical intervention for CHD, in a subset of individuals with DS.