Supplemental Figure Titles and Captions

(1)

Supplemental Figure Titles and Captions

Supplemental Figure 1: Term Frequency – Inverse Document Frequency (TF-IDF) Equation

Supplemental Figure 2: Model Development and Validation

UCSF- University of California, San Francisco, BIDMC – Beth Israel Deaconess Medical Center, AUROC – Area Under the Receiving-Operator Curve

Supplemental Figure 3: Calibration Curves for Penalized Logistic Regression Models

The bottom bar graph reports the number of patients found in each decile of predicted mortality. Mortality on the y axis of the line graph represents actual computed mortality rate for patients in each decile. The TF-IDF model does not report a 10^th point because the model did not predict any patients to have a probability of mortality between 0.9 and 1.

Supplemental Figure 4: Calibration Curves for Feedforward Neural Network Models

The bottom bar graph reports the number of patients found in each decile of predicted mortality. Mortality on the y axis of the line graph represents actual computed mortality rate for patients in each decile.

Supplemental Figure 5: Calibration Curves for Random Forest Models

The bottom bar graph reports the number of patients found in each decile of predicted mortality. Mortality on the y axis of the line graph represents actual computed mortality rate for patients in each decile.

(2)

Term Frequency: tf(t,d) = Number of times term t is present in document d

Document Frequency (df) = Number of times term t is present in the document corpus N

Inverse Document Frequency of term t idf(t) = log(N/(df + 1))

TF-IDF Formula: for term t in document d within a document corpus N

tf-idf(t,d) = tf(t,d)*log(N/(df+1))

(3)

UCSF Note Text BIDMC Note Text

Natural Language Pre-Processing

Pre-Processed

UCSF Note Text Pre-Processed

BIDMC Note Text

Text Data Vectorization

Featurized UCSF Data Featurized BIDMC Data

Model Training Using 10-Fold Cross Validation

UCSF Trained Model AUROC and CI of

Trained Model on UCSF Data

Calculate Model Performance of UCSF Trained Model with Featurized BIDMC Data Data Kept Separated

AUROC of Trained Model on BIDMC Data

(4)

(5)

(6)