
Supplemental Methods 2: Model specification and optimization for the logistic regression and Gradient Boosting models

Logistic Regression Modeling

Both the preoperative and intraoperative logistic regression models were regularized with the least absolute shrinkage and selection operator (LASSO), using the glmnet package in R (version 3.0-2)²⁵. This approach simultaneously performs variable selection and constrains (shrinks) the regression coefficients, which tends to reduce overfitting and improve out-of-sample performance²⁶. The LASSO penalty parameter was selected to minimize the binomial deviance estimated using 10-fold cross-validation. Because the degree of missingness varied among the preoperative predictors, four preoperative logistic models were fit, each using a different subset of candidate predictors. The first set of predictors had the least missingness.
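A minimal sketch of the LASSO fitting described above, written in Python with NumPy rather than R. It swaps glmnet's coordinate descent for a simpler proximal-gradient (ISTA) solver, picks the penalty from a λ grid by minimizing 10-fold cross-validated binomial deviance, and fits one model per predictor set after dropping patients with any missing value in that set. The function names, the `data` dictionary layout, and the λ grid are hypothetical; in R the corresponding step would presumably be `cv.glmnet(x, y, family = "binomial", type.measure = "deviance", nfolds = 10)` followed by `lambda.min`.

```python
import numpy as np


def _sigmoid(eta):
    # Clip the linear predictor to avoid overflow in np.exp.
    return 1.0 / (1.0 + np.exp(-np.clip(eta, -35.0, 35.0)))


def soft_threshold(z, t):
    # Proximal operator of t * ||z||_1: shrinks toward zero and sets small entries to exactly 0.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def fit_lasso_logistic(X, y, lam, n_iter=500):
    """Minimize mean log-loss + lam * ||beta||_1 by proximal gradient descent (ISTA).

    The intercept b0 is not penalized. Assumes the columns of X are on comparable scales.
    """
    n, p = X.shape
    b0, beta = 0.0, np.zeros(p)
    # Step size 1/L, where L = ||[1, X]||_2^2 / (4n) bounds the curvature of the mean log-loss.
    Xa = np.column_stack([np.ones(n), X])
    step = 4.0 * n / np.linalg.norm(Xa, 2) ** 2
    for _ in range(n_iter):
        resid = _sigmoid(b0 + X @ beta) - y  # gradient of the log-loss w.r.t. the linear predictor
        b0 -= step * resid.mean()
        beta = soft_threshold(beta - step * (X.T @ resid) / n, step * lam)
    return b0, beta


def binomial_deviance(y, prob):
    # Mean binomial deviance per observation (-2 times the mean log-likelihood for 0/1 outcomes).
    prob = np.clip(prob, 1e-12, 1.0 - 1e-12)
    return -2.0 * np.mean(y * np.log(prob) + (1.0 - y) * np.log(1.0 - prob))


def cv_lasso_logistic(X, y, lambdas, n_folds=10, seed=0):
    """Pick the penalty with the lowest K-fold cross-validated deviance, then refit on all rows."""
    rng = np.random.default_rng(seed)
    fold = rng.permutation(len(y)) % n_folds  # random fold labels with near-equal fold sizes
    cv_dev = np.zeros(len(lambdas))
    for k in range(n_folds):
        train, test = fold != k, fold == k
        for j, lam in enumerate(lambdas):
            b0, beta = fit_lasso_logistic(X[train], y[train], lam)
            prob = _sigmoid(b0 + X[test] @ beta)
            cv_dev[j] += binomial_deviance(y[test], prob) * test.sum()
    cv_dev /= len(y)
    best_lam = lambdas[int(np.argmin(cv_dev))]
    return best_lam, fit_lasso_logistic(X, y, best_lam), cv_dev


def fit_predictor_sets(data, y, predictor_sets, lambdas):
    """Fit one LASSO model per predictor set, excluding rows with a missing value in that set.

    data: hypothetical dict mapping predictor name -> 1-D float array (np.nan = missing).
    """
    results = {}
    for name, cols in predictor_sets.items():
        X = np.column_stack([data[c] for c in cols])
        keep = ~np.isnan(X).any(axis=1)  # complete cases for this predictor set only
        Xc = X[keep]
        Xc = (Xc - Xc.mean(axis=0)) / Xc.std(axis=0)  # glmnet also standardizes internally by default
        lam, (b0, beta), _ = cv_lasso_logistic(Xc, y[keep], lambdas)
        results[name] = {"n": int(keep.sum()), "lambda": lam, "intercept": b0, "beta": beta}
    return results
```

Because the sets are nested and missingness grows with each set, the number of patients available shrinks from one model to the next, which is why each set is fit separately rather than pooled. ISTA is used here only because it fits in a few lines; glmnet's pathwise coordinate descent is considerably faster on real data.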

The other three preoperative predictor sets each progressively included more predictors and had increasing levels of missingness. When fitting each preoperative model, patients with missing values for the predictors in that set were excluded. The sets of predictors are shown in SDC 6 Table 2.

For both the preoperative and intraoperative logistic models, predictions are calculated with a formula of the following form:

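$$
P(Y = 1) = \operatorname{logistic}\left(\beta_0 + \log(\mathrm{OR}_1)\,\frac{x_1}{\mathrm{SD}_1} + \log(\mathrm{OR}_2)\,\frac{x_2}{\mathrm{SD}_2} + \cdots + \log(\mathrm{OR}_n)\,\frac{x_n}{\mathrm{SD}_n}\right)
$$

In this formula, $P(Y = 1)$ is the probability of the outcome (hypotension); $\beta_0$ is the model intercept; $x_1, \ldots, x_n$ are the predictors; $\mathrm{SD}_1, \ldots, \mathrm{SD}_n$ are the standard deviations of the predictors; and $\mathrm{OR}_1, \ldots, \mathrm{OR}_n$ are the corresponding odds ratios of the predictors, each expressed per 1-SD increase. The logistic function is defined as

$$
\operatorname{logistic}(z) = \frac{1}{1 + e^{-z}}.
$$

This function maps the weighted sum of predictors, which can take any real value, to the range 0 to 1, corresponding to the probability of the outcome of interest.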
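The prediction formula above can be evaluated directly from a table of intercept, odds ratios, and standard deviations. The following is a minimal sketch assuming that each odds ratio is per 1-SD increase and that the intercept is reported alongside the odds ratios; the function names and the example values are hypothetical and are not taken from the fitted models.

```python
import math
from typing import Sequence


def logistic(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-z)), mapping any real z into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # z < 0, so exp(z) cannot overflow
    return ez / (1.0 + ez)


def predict_probability(intercept: float, odds_ratios: Sequence[float],
                        sds: Sequence[float], x: Sequence[float]) -> float:
    """P(Y = 1) = logistic(b0 + sum_i log(OR_i) * x_i / SD_i), with OR_i per 1-SD increase."""
    if not (len(odds_ratios) == len(sds) == len(x)):
        raise ValueError("odds_ratios, sds and x must have the same length")
    z = intercept + sum(math.log(or_i) * x_i / sd_i
                        for or_i, sd_i, x_i in zip(odds_ratios, sds, x))
    return logistic(z)


# Hypothetical values for illustration only.
p = predict_probability(intercept=-1.0, odds_ratios=[1.5, 0.8], sds=[10.0, 2.0], x=[20.0, 4.0])
```

Here a value of 20 for a predictor with SD 10 is 2 SDs, so it adds 2 × log(1.5) to the linear predictor; equivalently, it multiplies the odds of hypotension by 1.5² = 2.25.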

Gradient Boosting Trees

Gradient boosting trees for the preoperative and intraoperative models were implemented using the XGBoost algorithm in the xgboost package in R (version 1.0.0.2)²⁷. For both the preoperative and intraoperative models, the XGBoost hyperparameters (e.g., the number of trees, the learning rate [eta], and the maximum tree depth) were tuned iteratively. Each parameter was systematically varied over a specified range. At each combination of hyperparameter values, model performance was assessed with 5-fold cross-validation of the area under the receiver operating characteristic (ROC) curve (ROC-AUC) and the area under the precision-recall (PR) curve (PR-AUC). Parameter values were selected to maximize these cross-validated performance metrics. Multiple optimization cycles through all hyperparameters were performed to avoid settling on a local maximum. SDC 7 Table 3 lists all optimized parameters, including the range of values considered and their final values for both the preoperative and intraoperative models.
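The iterative tuning described above can be read as a cyclic coordinate search: one hyperparameter is varied over its range while the others are held fixed, and full passes are repeated until nothing changes. The sketch below assumes that reading. The source does not say how ROC-AUC and PR-AUC were combined into a single criterion, so the search takes one `score` callable; averaging the two metrics is an assumption. In practice `score` could wrap Python's `xgboost.cv(params, dtrain, num_boost_round=..., nfold=5, metrics=["auc", "aucpr"])` and read the final `test-auc-mean` and `test-aucpr-mean` values, but that wiring is also an assumption and is not shown. The grid values and the toy score are hypothetical stand-ins; the actual ranges are in SDC 7 Table 3.

```python
from typing import Any, Callable, Dict, List, Tuple

Params = Dict[str, Any]


def cyclic_coordinate_search(grid: Dict[str, List[Any]], initial: Params,
                             score: Callable[[Params], float],
                             max_cycles: int = 5) -> Tuple[Params, float]:
    """Tune one hyperparameter at a time over its grid, holding the others fixed.

    Full passes over all hyperparameters are repeated until a pass changes nothing
    (or max_cycles is reached), because a value that was best early on may no longer
    be best after other hyperparameters have moved.
    """
    params = dict(initial)
    best = score(params)
    for _ in range(max_cycles):
        changed = False
        for name, values in grid.items():
            for value in values:
                if value == params[name]:
                    continue  # current setting has already been scored
                trial = {**params, name: value}
                s = score(trial)
                if s > best:  # keep only strict improvements
                    params, best, changed = trial, s, True
        if not changed:
            break
    return params, best


# Hypothetical grid and a toy score standing in for a cross-validated metric.
grid = {"eta": [0.01, 0.05, 0.1, 0.3],
        "max_depth": [2, 4, 6, 8],
        "num_boost_round": [100, 300, 500]}
start = {"eta": 0.3, "max_depth": 6, "num_boost_round": 100}


def toy_score(p: Params) -> float:
    return -(100 * (p["eta"] - 0.1) ** 2 + (p["max_depth"] - 4) ** 2
             + ((p["num_boost_round"] - 300) / 100) ** 2)


best_params, best_score = cyclic_coordinate_search(grid, start, toy_score)
```

Searching one coordinate at a time costs far fewer model fits than a full grid over every combination, at the price of possibly stopping at a local maximum; repeating the passes, as the text describes, reduces but does not eliminate that risk.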