risk prediction models and visualizations

I would like to thank my graduation committee for their support in the realization of this project. My work would not have been possible without the generous financial support of the Department of Veterans Affairs and the Advanced Fellowship Program in Medical Informatics through the Office of Academic Affiliations. Thank you to the faculty, staff and fellow students of the Department of Veterans Affairs.

Thanks also to the faculty, staff, and students of the Department of Biomedical Informatics at Vanderbilt. Finally, I would like to thank my parents and my sister for supporting me and encouraging me through this experience and work.

Table 6: Characteristics of the cohort of cirrhotic patients with and without HRS as determined by chart review.

Table 11: Breakdown of the candidate predictor variables used in the Hepatorenal Syndrome risk prediction model.

Figure 9: Discrimination (panel A), via the ROC curve, and calibration (panel B), via smooth observed-to-expected probability plots, for the five different hepatorenal syndrome phenotyping models.

Introduction

In contrast, an "out-of-the-box" NLP system can be used that attempts to identify all medical concepts within the documents. NRI can be measured at specific thresholds of interest of the underlying model, which may be more informative than a global measure such as AUC. Feature selection aims to identify a relevant subset of the original features, while feature generation creates new features (optionally replacing some original features) to improve model performance.
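To make the threshold-specific use of NRI concrete, the following is a minimal sketch of a two-category net reclassification improvement at a single risk cutoff. The function name `nri_at_threshold`, its arguments, and the convention that a predicted risk at or above the cutoff counts as "high risk" are assumptions made for illustration; they are not taken from this work.

```python
def nri_at_threshold(y_true, p_old, p_new, threshold):
    """Two-category NRI comparing a new model with an old one at one cutoff.

    y_true: 0/1 outcomes; p_old, p_new: predicted risks from each model.
    A patient is 'high risk' when the predicted risk is >= threshold
    (an illustrative convention, not one stated in the text).
    """
    n_event = sum(1 for y in y_true if y == 1)
    n_nonevent = len(y_true) - n_event
    if n_event == 0 or n_nonevent == 0:
        raise ValueError("need at least one event and one non-event")

    up_event = down_event = up_nonevent = down_nonevent = 0
    for y, old, new in zip(y_true, p_old, p_new):
        old_high = old >= threshold
        new_high = new >= threshold
        moved_up = new_high and not old_high
        moved_down = old_high and not new_high
        if y == 1:
            up_event += moved_up
            down_event += moved_down
        else:
            up_nonevent += moved_up
            down_nonevent += moved_down

    # Events should move up; non-events should move down.
    nri_events = (up_event - down_event) / n_event
    nri_nonevents = (down_nonevent - up_nonevent) / n_nonevent
    return nri_events + nri_nonevents
```

The result ranges from -2 to 2; positive values indicate that the new model reclassifies patients in the correct direction more often than the old one at the chosen cutoff.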

Certain methods may be more appropriate for certain types of data.165,166 Table 2 describes some of the patterns for information visualization and the data sizes and dimensionality for which they are suitable. Geometric visualization is one of the most common and has the closest resemblance to scientific visualization. This allows the user to see details about a specific part of the timeline while maintaining the global perspective.

Zhao et al.181 use a radial layout but combine it with alternative linear views that provide different perspectives of the data. Within the context of the use case for HRS, we aim to explore some of these challenges.

Figure 1: Change in prevalence of chronic liver disease in the United States from 1988 to 2008

Phenotyping Hepatorenal Syndrome

To the structured variables, we added natural language processing variables from the clinical notes, as detailed in the next section. For LR and naive Bayes, we first performed variable selection using penalized LR with the L1 penalty (Least Absolute Shrinkage and Selection Operator, LASSO) to select a subset of the predictor variables.42 For the rest of the models, we used the full set of predictor variables.

Note: Slope and Intercept refer to the parameters of the best-fit line through the observed-to-predicted probability plot; AUC: Area under the curve.
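The slope and intercept in the note above can be illustrated with a short sketch. It groups patients into equal-sized bins of predicted probability, computes the observed event rate in each bin, and fits an ordinary least-squares line through the resulting points. The binning step is an assumption made to keep the example small (the plots in this work are described as smoothed), and the function name `calibration_line` and the `n_bins` parameter are hypothetical.

```python
def calibration_line(y_true, p_pred, n_bins=10):
    """Slope and intercept of the least-squares line through binned
    (mean predicted probability, observed event rate) points.

    Binning is a simplification chosen for this sketch; a smoothed
    observed-to-predicted curve would be an alternative.
    """
    pairs = sorted(zip(p_pred, y_true))
    n = len(pairs)
    if n == 0:
        raise ValueError("no observations")
    n_bins = min(n_bins, n)

    xs, ys = [], []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        xs.append(sum(p for p, _ in chunk) / len(chunk))  # mean predicted
        ys.append(sum(y for _, y in chunk) / len(chunk))  # observed rate

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predicted probabilities do not vary across bins")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

A well-calibrated model gives a slope near 1 and an intercept near 0 under this construction.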

This is one of the first efforts to phenotype AKI etiology, a condition that affects up to 2% of hospitalized patients.245 Penalized LR achieved the best performance with an AUC of 0.93 (95% CI. The probabilistic phenotype algorithm allows changing thresholds for varying levels of sensitivity and specificity depending on user needs. Despite the better discriminatory power of the logistic regression model, calibration was better with gradient boosting and support vector machines, suggesting that the performance of some cut-points may still favor.

At the same time, it is worth noting that there is one-to-one mapping for two of the important ICD-9 codes (ATN and HRS) based on the General. Unlike the fixed sensitivity and specificity of the HRS ICD-9 code, this probabilistic model can be used at multiple set points depending on the use case (e.g., favoring specificity over sensitivity, or the reverse).
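The idea of operating a probabilistic phenotype at different set points can be sketched as follows. The function name `sensitivity_specificity` and the rule that a predicted probability at or above the threshold counts as a positive call are assumptions for illustration only.

```python
def sensitivity_specificity(y_true, p_pred, threshold):
    """Sensitivity and specificity when probabilities >= threshold are
    called positive (an illustrative convention)."""
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, p_pred):
        predicted_positive = p >= threshold
        if y == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


def operating_points(y_true, p_pred, thresholds):
    """Tabulate (threshold, sensitivity, specificity) for candidate cutoffs."""
    return [(t, *sensitivity_specificity(y_true, p_pred, t)) for t in thresholds]
```

Lowering the threshold trades specificity for sensitivity, which may suit screening, while raising it favors specificity, which may suit cohort identification for research.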

Figure 8: Workflow describing Natural Language Processing pipeline.

Risk Prediction Models for Hepatorenal Syndrome

See Appendix 1 for details on validating the HRS ICD-9 code and ascertaining ascites status. We included data from the 24 hours before admission as part of the admission time frame to capture emergency room data. We performed a penalized logistic regression, using the L1 penalty (Least Absolute Shrinkage and Selection Operator, LASSO), to select a subset of predictor variables.42 Refer to Appendix 1 for details on the variable selection procedure.

We reported the AUC of the GEE model with a 95% confidence interval (CI) calculated from bootstrap samples and variable odds ratios.129 We assessed model calibration using the Brier score (range 0 to 1, where 0 means perfect calibration), the slope and intercept of the regression line between O/E probabilities, and an O/E probability plot.261 We performed two sensitivity analyses: first, excluding hospitalizations where patients received vasopressin or norepinephrine on admission; second, excluding patients who may have cardiorenal syndrome. Of the 'non-opioid analgesic' group, 167 of 9,986 admissions had acute tubular necrosis (ATN), by ICD-9 code, versus 476 of 25,426 admissions (p=0.22), suggesting that the difference was not due to non-steroidal anti-inflammatory drug-induced AKI. Since INR is part of the MELD score, it is possible that the MELD score already captured patients with higher INR values.
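The discrimination and calibration summaries described above can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure used in this work: it resamples individual admissions (ignoring the within-patient clustering that a GEE model accounts for), uses the percentile method for the interval, and all function names and parameters (`brier_score`, `auc`, `bootstrap_auc_ci`, `n_boot`, `alpha`, `seed`) are hypothetical.

```python
import random


def brier_score(y_true, p_pred):
    """Mean squared difference between predicted risk and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def auc(y_true, p_pred):
    """AUC as the probability that a random event is ranked above a random
    non-event, with ties counted as one half."""
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, p_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (simple resampling of rows)."""
    auc(y_true, p_pred)  # fails early if only one class is present
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auc(ys, [p_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper
```

With repeated admissions per patient, resampling whole patients rather than rows would be the more faithful choice; the row-level version is kept here only for brevity.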

This could affect the study through a coding bias that neglects patients who are less likely to receive an HRS ICD-9 code, probably borderline patients with a wider differential for their renal failure. Our ICD-9 code validation was based on the older HRS criteria (particularly the strict creatinine cutoff) because chart review was performed for patients treated before the 2015 criteria.

Figure 10: Cohort selection process from an initial sample of all inpatient admissions after applying exclusion criteria

Information Visualization for Model Analysis

To facilitate comparison, both patient timelines are anchored to the last visualized date—the first day of the index admission. The group visualization view in the middle area visualizes multiple groups of patients using a glyph visualization. The details view on the right allows a drill down of all the variables that make up the cluster.
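The anchoring of timelines to the index admission can be sketched as a conversion from calendar dates to day offsets, so that timelines from different patients share a common axis ending at day 0. The function name `days_relative_to_index` and the sample dates in the usage note are hypothetical.

```python
from datetime import date


def days_relative_to_index(event_dates, index_admission):
    """Convert calendar dates to day offsets from the index admission.

    Day 0 is the first day of the index admission; earlier events
    receive negative offsets.
    """
    return [(d - index_admission).days for d in event_dates]
```

For example, with an index admission on `date(2010, 3, 15)`, events on `date(2010, 3, 1)` and `date(2010, 3, 15)` map to offsets of -14 and 0, so two patients admitted in different years can be drawn on the same horizontal scale.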

The size of the pie is proportional to the importance of that variable in the cluster. Each cluster's affinity for 14 variables is represented by a color-coded pie slice, with a larger pie slice showing greater affinity. The view shows the clustering along with model AUC and model calibration, using the slope and intercept of the regression line through the O/E probability plot. In addition, the visualization shows each cluster's affinity for fourteen key clinical variables.

Using the left control panel, users can select the predictors of interest to display within the group visualization from five domains: demographics,. The user can also plot either the percentage of patients who have HRS in the group, or the prediction model's AUC for the respective group.

Figure 12: Example clinical course visualization for a patient with alcoholic cirrhosis

Conclusions

Changes in the prevalence of the most common causes of chronic liver disease in the United States from 1988 to 2008.
Undiagnosed cirrhosis occurs frequently in the elderly and requires periodic follow-up and medical treatments.
Quality of care provided to patients with cirrhosis and ascites in the Department of Veterans Affairs.
An open-label, pilot, randomized controlled trial of Noradrenaline versus Terlipressin in the treatment of type 1.
Online prediction of health care utilization in the next six months based on electronic health record information: A cohort and validation study.
Magnetic resonance elastography in the detection of hepatorenal syndrome in patients with cirrhosis and ascites.
The role of duplex Doppler ultrasound in the diagnosis of renal dysfunction and hepatorenal syndrome in patients with liver cirrhosis.
Diagnostic significance of dimethylarginine in the development of hepatorenal syndrome in patients with alcoholic cirrhosis.
Urinary neutrophil gelatinase-associated lipocalin as a biomarker in the differential diagnosis of renal dysfunction in cirrhosis.
Time trends in the health care burden and mortality of acute or chronic liver failure in the United States.
Predictors of Kidney Disease Progression in the Modification of Diet in Renal Disease Study.

Appendix Table A.2: Code definitions for comorbid conditions and procedures used in the model, based on the International Classification of Diseases version 9, Current Procedural Terminology, and the ICD procedure code.

Values from the four separate chains were averaged together for the final imputed values used in the dataset. Mayo Clinic Disease and Conditions did not include relevant content on HRS to allow inclusion in SAFE. A list of filtered candidate CUIs based on the public knowledge sources is available in the online appendix.

We adjusted the SAFE silver standard thresholds to ensure reasonable sample sizes in the extreme subsets (LICD=0, LNLP=0, UICD=1, and UNLP=3). We implemented 50 iterations of the elastic net models for each of the three silver standards in the SAFE approach, selecting CUIs included in 50% of the total models.
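The final selection step described above, keeping CUIs that recur across repeated elastic net fits, can be sketched as a frequency tally. This covers only the tallying; the model fitting itself is not shown. Reading "included in 50% of the total models" as "at least 50%" is an assumption, as are the function name `stable_features` and the `min_fraction` parameter.

```python
from collections import Counter


def stable_features(selected_per_model, min_fraction=0.5):
    """Return features selected in at least `min_fraction` of the models.

    selected_per_model: one collection of selected feature identifiers
    per fitted model (for example, the CUIs with non-zero coefficients).
    The 'at least' reading of the cutoff is an assumption.
    """
    n_models = len(selected_per_model)
    if n_models == 0:
        raise ValueError("no models supplied")
    counts = Counter()
    for selected in selected_per_model:
        counts.update(set(selected))  # count each feature once per model
    return sorted(f for f, c in counts.items() if c / n_models >= min_fraction)
```

Under this reading, pooling the selections from all iterations across the three silver standards and applying the function once would yield the retained CUIs.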

As cardiorenal syndrome develops in the setting of acute (or acute on chronic) decompensated heart failure (ADHF) or acute myocardial infarction (AMI),304–306 we performed a sensitivity analysis in which patients with decompensated heart failure or acute myocardial infarction who had an HRS ICD-9 code were assigned to the “No HRS” cohort.