The risk of an event can then be calculated as a monotonic function of the risk score. Despite the wide use of risk prediction models in clinical research, their application presents challenges when required risk factors are missing or were never collected. In the following, three examples of the application of the Framingham Risk Score are provided to illustrate the three approaches. One study (2016) examined the utility of the Framingham Risk Score for predicting secondary cardiovascular outcomes in patients diagnosed with coronary heart disease who received percutaneous coronary intervention; missing risk factors were assumed to be absent when calculating the risk score.
Clearly, this approach will underestimate the risk of high-risk patients and lead to a biased evaluation of the utility of the Framingham Risk Score. Another scenario of missing risk factors involves uncollected (i.e., systematically missing) risk factors, where the relevant risk factors are not collected for any subject in the study. This often happens when a risk score is used in studies other than the one in which it was originally developed, so that not all of the risk factors in the score are included in the study protocol. In the presence of uncollected risk factors, neither the complete case analysis described above nor imputation can be used.
Extensive simulation studies will be conducted to evaluate how the characteristics of uncollected risk factors relate to risk prediction model performance. The risk factors identified were age, systolic blood pressure, antihypertensive therapy, diabetes mellitus, cigarette smoking, cardiovascular disease, atrial fibrillation, and left ventricular hypertrophy (LVH). The FSRP risk score was developed as a linear combination of these risk factors and is calculated with a point system for ease of use.
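To make the structure of such a score concrete, the sketch below builds a point-based linear risk score and maps it to a predicted risk through a monotonic (logistic) transform. The point values, baseline risk, and per-point odds ratio are illustrative assumptions only; they are not the published FSRP weights.

```python
import numpy as np

# Illustrative (not the published FSRP) point values for a linear risk score.
POINTS = {
    "systolic_bp_high": 2,
    "antihypertensive_rx": 1,
    "diabetes": 2,
    "smoking": 2,
    "cvd_history": 2,
    "atrial_fibrillation": 3,
    "lvh": 4,
}

def risk_score(patient):
    """Sum the points for every risk factor present (1) or absent (0)."""
    return sum(POINTS[k] * patient.get(k, 0) for k in POINTS)

def predicted_risk(score, baseline=0.02, per_point_or=1.25):
    """Map the score to a probability via a logistic (monotonic) transform."""
    log_odds = np.log(baseline / (1 - baseline)) + score * np.log(per_point_or)
    return 1 / (1 + np.exp(-log_odds))

patient = {"smoking": 1, "diabetes": 1, "systolic_bp_high": 1}
print(predicted_risk(risk_score(patient)))
```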
Such a naive approach of treating unknown risk factors as absent may introduce bias when externally validating a risk prediction model in another population or when using a risk score for risk adjustment.
METHODS
The first approach calculates the risk score by omitting the uncollected risk factor from the calculation, which is equivalent to assuming that the risk factor is absent. The second approach refits the model after excluding the uncollected risk factor and calculates the risk score from the refitted model. Refitting complicates the comparison between the true and reconstructed models, but this is less of a concern for the true and naïve models, since any difference within a matched pair is attributable solely to the omitted risk factor.
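The contrast between the two approaches can be sketched as follows. This is a minimal simulated example, assuming a logistic risk model with three binary risk factors of which one ("lvh"-like, column 2) is never collected in the validation study; all coefficients and prevalences are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000

# Simulated cohort: X holds all risk factors, including one (column 2) that
# the validation study never collected.
X = rng.binomial(1, [0.3, 0.2, 0.15], size=(n, 3)).astype(float)
beta = np.array([0.8, 0.5, 1.2])                      # hypothetical log-odds weights
p = 1 / (1 + np.exp(-(-2.0 + X @ beta)))
y = rng.binomial(1, p)

# "True" model uses every risk factor.
true_model = LogisticRegression().fit(X, y)

# Naive approach: keep the true model but set the uncollected factor to 0,
# i.e. treat it as absent for every subject.
X_naive = X.copy()
X_naive[:, 2] = 0.0
risk_naive = true_model.predict_proba(X_naive)[:, 1]

# Refit approach: re-estimate the model using only the collected factors.
refit_model = LogisticRegression().fit(X[:, :2], y)
risk_refit = refit_model.predict_proba(X[:, :2])[:, 1]
```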
A simple way to evaluate the addition of a risk factor is to build two models, one with and one without it, and compare the difference in the area under the receiver operating characteristic (ROC) curve (AUC). This is generally done when a risk score has already been developed and a new risk factor has been identified as predictive of the outcome of interest. The correlation of the omitted risk factor with the other predictors in the model has little influence on the correlation achieved by the naive model.
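The two-model AUC comparison can be illustrated with a short simulated sketch; the predictor effects and the candidate factor's prevalence below are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 5000
x_old = rng.normal(size=(n, 2))                             # established risk factors
x_new = rng.binomial(1, 0.25, size=(n, 1)).astype(float)    # candidate addition
lp = -1.0 + x_old @ np.array([0.6, 0.4]) + 0.9 * x_new[:, 0]
y = rng.binomial(1, 1 / (1 + np.exp(-lp)))

base = LogisticRegression().fit(x_old, y)
full = LogisticRegression().fit(np.hstack([x_old, x_new]), y)

auc_base = roc_auc_score(y, base.predict_proba(x_old)[:, 1])
auc_full = roc_auc_score(y, full.predict_proba(np.hstack([x_old, x_new]))[:, 1])
print(f"AUC without new factor: {auc_base:.3f}")
print(f"AUC with new factor:    {auc_full:.3f}  (gain: {auc_full - auc_base:.3f})")
```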
For the reconstructed model, when the omitted risk factor is independent of the other predictors, performance is slightly better as measured by the Spearman correlation, but this effect is not seen with the Pearson correlation. As with the correlation, model performance was strongly affected by the frequency and weight of the omitted risk factor. When the omitted risk factor has a small weight, the frequency of the other risk factors in the data set matters little, but when its weight is large, some information can be recovered if the remaining risk factors have high prevalence.
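The distinction between the two correlation measures is that Pearson assesses linear agreement between the scores themselves, while Spearman only asks whether subjects are ranked in the same order. A minimal sketch of comparing a true and a naive linear predictor, with hypothetical weights and prevalences:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(2)
n = 5000
x_kept = rng.binomial(1, 0.3, n).astype(float)
x_omitted = rng.binomial(1, 0.2, n).astype(float)   # the uncollected factor

lp_true = 0.8 * x_kept + 1.2 * x_omitted             # full linear predictor
lp_naive = 0.8 * x_kept                               # omitted factor treated as absent

print("Pearson :", pearsonr(lp_true, lp_naive)[0])    # agreement of the values
print("Spearman:", spearmanr(lp_true, lp_naive)[0])   # agreement of the rankings
```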
The correlation of the omitted risk factor with the other predictors in the model did not affect the C-index. When the simulated coefficient of the omitted risk factor was small, the effect on the IDI was minimal, but when the coefficient was large, we observed a marked reduction in the IDI. This effect was modified by the frequency of the omitted risk factor, again indicating that omitting a high-frequency risk factor is detrimental to model performance.
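As a reminder of what the IDI measures, it is the change in the separation between the mean predicted risk of events and of non-events when moving from the reduced to the full model. A small sketch, with hypothetical predicted risks and outcomes:

```python
import numpy as np

def idi(risk_full, risk_reduced, y):
    """IDI of the full over the reduced model: the gain in mean-risk
    separation between events and non-events."""
    y = np.asarray(y, dtype=bool)
    sep_full = risk_full[y].mean() - risk_full[~y].mean()
    sep_reduced = risk_reduced[y].mean() - risk_reduced[~y].mean()
    return sep_full - sep_reduced

# Hypothetical example data.
rng = np.random.default_rng(3)
y = rng.binomial(1, 0.2, 1000)
risk_full = np.clip(0.2 + 0.30 * y + rng.normal(0, 0.1, 1000), 0, 1)
risk_reduced = np.clip(0.2 + 0.15 * y + rng.normal(0, 0.1, 1000), 0, 1)
print(idi(risk_full, risk_reduced, y))
```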
Performance was not affected by the correlation of the omitted risk factor with the other variables. The patch model often showed only minor improvements, if any, over the naïve model. All models performed relatively well, and performance was best when the omitted risk factor had a high weight or low frequency (Figure 4.4A). As with calibration-in-the-large, the naïve model performs best when the omitted risk factor has a large effect or low frequency.
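For reference, the two calibration summaries used here can be computed as in the sketch below: a crude calibration-in-the-large check (observed minus mean predicted risk) and the calibration slope obtained by regressing the outcome on the logit of the predicted risk. The function name and the simulated data are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def calibration_measures(pred_risk, y):
    """Crude calibration summaries of predicted risks against observed outcomes."""
    pred_risk = np.clip(pred_risk, 1e-6, 1 - 1e-6)
    lp = np.log(pred_risk / (1 - pred_risk))        # logit of the predicted risk

    # Calibration-in-the-large: difference between observed and mean predicted risk.
    in_the_large = y.mean() - pred_risk.mean()

    # Calibration slope: a slope of 1 means predictions are neither over- nor
    # under-dispersed. C=1e6 makes the fit effectively unpenalized.
    slope = LogisticRegression(C=1e6).fit(lp.reshape(-1, 1), y).coef_[0, 0]
    return in_the_large, slope

# Hypothetical example: predicted risks that run systematically too low.
rng = np.random.default_rng(5)
y = rng.binomial(1, 0.3, 2000)
pred = np.clip(0.2 + 0.2 * y + rng.normal(0, 0.05, 2000), 0.01, 0.99)
print(calibration_measures(pred, y))
```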
For most scenarios, the model is also worse if the omitted risk factor has a low weight. Simple arithmetic explains why the size of the mean difference is related to the frequency and weight of the omitted risk factor in the naïve model.
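The arithmetic is straightforward: when a binary risk factor is assumed absent, every carrier's linear predictor is understated by the factor's full weight, so the average understatement across the cohort is the weight multiplied by the factor's prevalence, while the worst-case understatement for any single carrier equals the full weight. The numbers below are purely hypothetical.

```python
# Hypothetical weights and prevalences for two omitted binary risk factors.
weight_rare_factor = 0.9      # large coefficient, rare factor (e.g. LVH-like)
prev_rare_factor = 0.03
weight_common_factor = 0.5    # smaller coefficient, common factor (e.g. smoking-like)
prev_common_factor = 0.40

# Mean shift in the linear predictor = weight * prevalence.
print("mean shift, rare factor omitted:  ", weight_rare_factor * prev_rare_factor)      # 0.027
print("mean shift, common factor omitted:", weight_common_factor * prev_common_factor)  # 0.20
# Worst-case shift for an individual carrier equals the full weight.
```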
APPLICATION
Compared with the true model, discrimination was most affected when the continuous predictor was omitted (Figure 5.2). Cigarette smoking was the categorical risk factor whose removal produced the largest drop in the C-index. This is because the same data used to build the model were also used to test it.
Different results emerge when the data are split into training and testing sets. For the calibration slope, the coefficients of the refitted model are very close to one because the same data were used for training and testing (Figure 5.3B). In the naïve model, one of the continuous risk factors, systolic blood pressure, has a larger impact.
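A train/test split of this kind can be sketched as follows; for a binary outcome the C-index equals the AUC, and the apparent (training-data) estimate is optimistic relative to the held-out one. The simulated data and split fraction are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
n = 4000
X = rng.normal(size=(n, 4))
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ np.array([0.5, 0.4, 0.3, 0.2]) - 1))))

# Fit on the training split only; the held-out split gives an honest estimate.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)
print("apparent C-index:", roc_auc_score(y_tr, model.predict_proba(X_tr)[:, 1]))
print("held-out C-index:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```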
Among the categorical risk factors, the naive model is most affected when left ventricular hypertrophy is omitted. When the cohort is divided into risk groups, the coefficients for each omitted risk factor differ across the risk groups (Table 5.7). The mean difference in a patient's estimated individual risk when comparing the naive or refitted model with the true model is shown in Figure 5.4A.
The difference is affected not only by the weight of the risk factor but also by its frequency in the original data set. Smoking, CVD, and antihypertensive medication were quite common in the Framingham data set, so ignoring these risk factors is more detrimental to population-level risk estimates than ignoring LVH, even though LVH has the largest coefficient. The reconstructed model shows almost no loss of information when estimating risk at the population level.
In contrast to the mean difference, the maximum absolute difference shows how far an individual estimate can deviate (Figure 5.4B). Because this concerns the estimate for an individual patient, the frequency of the risk factor in the sample is not relevant. In the Framingham data set, assuming LVH to be absent when it is unknown can underestimate a patient's risk by 15% (Figure 5.4C).
DISCUSSION