Chapter 1: Introduction
1.5 Screening tools for active tuberculosis
1.5.4 Clinical prediction models for tuberculosis screening
Clinical prediction models (CPMs) may also be used as a screening tool to identify PLHIV who are at increased risk of tuberculosis and need further confirmatory testing. CPMs (also
known as clinical prediction rules, risk models or risk scores) predict the risk of disease (e.g., tuberculosis) for a person using the combination of multiple predictors.103,104 These predictors include clinical information (e.g., age, sex, symptoms, and signs) and laboratory tests (e.g., CD4 cell count and CRP). CPMs require appropriate development and validation to be clinically useful.
The development of a CPM requires several steps.105 Once the dataset is prepared, the CPM is typically developed using multivariable regression methods such as logistic regression. This process requires several decisions: identifying candidate predictors, determining the
functional form of predictors, assessing sample size relative to the number of model parameters, handling missing data using a suitable method, and identifying an appropriate strategy for predictor selection.103 A full model approach, in which all candidate predictors are included, is one strategy for predictor selection.106 This approach reduces the risk of overfitting, but requires extensive knowledge about the most relevant predictors. An alternative approach chooses predictors by backward or forward selection.106 Forward selection begins with an empty model to which predictors are added, while backward selection begins with all predictors and removes them in turn.106 Backward selection is preferred because all
predictors are assessed at the same time. Univariable selection is also commonly used in the literature but is not recommended because important predictors may be rejected.107
The validation of a CPM involves internal validation (using data from the same population) and external validation (using data from a different population). At validation, the
performance of the CPM is evaluated by determining the model’s discrimination and calibration. Discrimination refers to how well the model can differentiate between patients who have active tuberculosis and those who do not.105 Discrimination is assessed using the concordance statistic (C-statistic). C-statistics of ≥0.7 and ≥0.8 are considered to indicate acceptable and excellent performance, respectively.108 Calibration refers to the agreement between expected and observed outcomes.105 Calibration is assessed using calibration-in-the-large (a value of 0 indicating perfect calibration), calibration slope (a value of 1 indicating perfect calibration), and calibration plots. The clinical utility of a CPM to improve decision making should also be evaluated. Clinical utility can be evaluated using decision curve analysis, which shows the net benefit (i.e., benefit versus harm) over a range of clinically relevant threshold
probabilities.109 The model or test with the greatest net benefit for a particular threshold is considered to have the most clinical value.109
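The performance measures described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any study cited here: the function names, the toy outcomes and the toy predicted risks are all hypothetical. The C-statistic is computed as the proportion of concordant case/non-case pairs, calibration-in-the-large as the intercept of a logistic model with the model's linear predictor as an offset, calibration slope as the coefficient of the linear predictor in a logistic recalibration model, and net benefit using the standard decision curve formula (true positives minus false positives weighted by the odds of the threshold probability, per person screened).

```python
import math

def c_statistic(y, p):
    """Share of (case, non-case) pairs in which the case has the higher risk; ties count 0.5."""
    cases = [pi for yi, pi in zip(y, p) if yi == 1]
    controls = [pi for yi, pi in zip(y, p) if yi == 0]
    score = sum(1.0 if c > n else 0.5 if c == n else 0.0 for c in cases for n in controls)
    return score / (len(cases) * len(controls))

def _logit(q):
    return math.log(q / (1 - q))  # requires 0 < q < 1

def _expit(z):
    return 1 / (1 + math.exp(-z))

def calibration_in_the_large(y, p, iters=25):
    """Intercept a in logit(P(y=1)) = a + logit(p), fitted by Newton-Raphson."""
    lp = [_logit(pi) for pi in p]
    a = 0.0
    for _ in range(iters):
        mu = [_expit(a + x) for x in lp]
        a += sum(yi - m for yi, m in zip(y, mu)) / sum(m * (1 - m) for m in mu)
    return a

def calibration_slope(y, p, iters=25):
    """Slope b in logit(P(y=1)) = a + b * logit(p), fitted by Newton-Raphson.
    Assumes the data are not perfectly separated (otherwise the fit diverges)."""
    lp = [_logit(pi) for pi in p]
    a, b = 0.0, 0.0
    for _ in range(iters):
        mu = [_expit(a + b * x) for x in lp]
        w = [m * (1 - m) for m in mu]
        g0 = sum(yi - m for yi, m in zip(y, mu))
        g1 = sum((yi - m) * x for yi, m, x in zip(y, mu, lp))
        h00, h01 = sum(w), sum(wi * x for wi, x in zip(w, lp))
        h11 = sum(wi * x * x for wi, x in zip(w, lp))
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return b

def net_benefit(y, p, threshold):
    """Net benefit of referring everyone with predicted risk >= threshold."""
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)

# Toy data (hypothetical): 1 = tuberculosis, 0 = no tuberculosis.
y = [0, 0, 1, 0, 1, 0, 1, 1]
p = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
print(c_statistic(y, p))       # 0.8125
print(net_benefit(y, p, 0.5))  # 0.25

# Toy data in which observed event rates equal the predicted risks in each group.
y_cal = [1] * 2 + [0] * 8 + [1] * 8 + [0] * 2
p_cal = [0.2] * 10 + [0.8] * 10
print(calibration_in_the_large(y_cal, p_cal), calibration_slope(y_cal, p_cal))
```

On the second toy dataset the two calibration measures should come out very close to 0 and 1, the values that indicate perfect calibration. In a decision curve analysis, `net_benefit` would be evaluated over a range of thresholds and compared with the "refer all" and "refer none" strategies; in practice an established statistical package would normally be used instead of hand-written fitting code.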
The TRIPOD statement was developed to provide recommendations on the development and validation of CPMs.107 However, despite the availability of the TRIPOD statement,107 as well as other guidance and methodological frameworks,104,105 CPMs are still poorly developed and validated.110 A few CPMs have been developed for tuberculosis screening, either in PLHIV irrespective of signs and symptoms of tuberculosis or in PLHIV who have a positive W4SS.111-116 However, the methodology used to develop and/or validate these CPMs is inadequate.
Auld et al developed a CPM for active tuberculosis in outpatient PLHIV not on ART
irrespective of symptoms and signs of tuberculosis.111 The CPM included W4SS symptoms, sex, smoking status, temperature, body-mass index (BMI), and haemoglobin as predictors.
The analysis involved splitting a cohort of PLHIV not on ART enrolled in Botswana by geographic region into development and internal validation datasets. No sample size
calculations were provided. Although the development dataset had 189 tuberculosis cases, 15 predictors were assessed including 6 continuous variables that were assessed for nonlinearity, meaning that overfitting of the model was a possibility. In the development dataset, sputum was only collected in those with a positive W4SS (for Xpert testing), and sputum culture was only performed if 4 sputum samples were collected. The final CPM was converted to a simplified score, which included categorization of continuous variables. This approach is known to lead to a loss of discriminative ability. Only the simplified score was externally validated in 3 outpatient cohorts from South Africa. The 3 cohorts differed from the
derivation cohort in that one included a high percentage on ART, another included those with low CD4 cell counts, and the third included PLHIV not on ART derived from a background population with high tuberculosis prevalence. The final model showed excellent and
acceptable discrimination in the derivation (C-statistic: 0.82) and internal validation (C-statistic: 0.77) datasets, respectively. However, the CPM had suboptimal and variable discrimination in the 3 external validation datasets, with C-statistics of 0.63, 0.71, and 0.79. Furthermore, at a cut-off that provided similar sensitivity to W4SS, the score did not improve specificity. In 2 of the 3 external validation datasets, 45% and 29% of participants, respectively, were also excluded because of missing data. There was no assessment of the clinical utility of the CPM.
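The loss of discriminative ability that can follow categorization of a continuous predictor, noted above for the simplified score, can be illustrated with toy numbers. The sketch below is purely illustrative: the marker values, the cut-off of 5 and the function name are hypothetical and are not taken from Auld et al or any other cited study. It compares the C-statistic of a continuous marker with that of the same marker after dichotomization.

```python
def c_statistic(y, score):
    """Share of (case, non-case) pairs in which the case scores higher; ties count 0.5."""
    cases = [s for yi, s in zip(y, score) if yi == 1]
    controls = [s for yi, s in zip(y, score) if yi == 0]
    wins = sum((c > n) + 0.5 * (c == n) for c in cases for n in controls)
    return wins / (len(cases) * len(controls))

# Hypothetical data: 1 = tuberculosis, 0 = no tuberculosis; higher marker = higher risk.
y = [1, 1, 1, 1, 0, 0, 0, 0]
marker = [4, 5, 6, 8, 1, 2, 3, 7]

# Dichotomize at an arbitrary cut-off (assumption: >= 5 is "high").
dichotomized = [1 if m >= 5 else 0 for m in marker]

c_continuous = c_statistic(y, marker)          # 0.8125
c_dichotomized = c_statistic(y, dichotomized)  # 0.75
print(c_continuous, c_dichotomized)
```

Dichotomization collapses all values on the same side of the cut-off into ties, so pairs that the continuous marker ranked correctly contribute only half a point. The size of the loss depends on the data and on the chosen cut-off.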
Baik et al developed a CPM for tuberculosis in outpatients who were both HIV-positive and HIV-negative and who were symptomatic (i.e., W4SS positive) using a dataset from 28 clinics in South Africa.112 The CPM included age, sex, HIV status, diabetes, W4SS
symptoms, and cough ≥2 weeks as predictors. The CPM was converted to a simplified score
following categorization of continuous predictors and externally validated using a dataset from 4 clinics in Uganda. In both the development and external validation datasets, tuberculosis was defined solely by a positive sputum Xpert result. Discrimination was acceptable in the validation cohort (C-statistic: 0.75). However, discrimination was not assessed specifically in those who were HIV-positive. Decision curve analysis was performed, but the score was not compared to other screening tools such as the W4SS. Because the authors used a random sample of those without tuberculosis, spectrum bias is a concern.
Hanifa et al developed a CPM among outpatient PLHIV with a positive W4SS who were drawn from a larger cohort of PLHIV enrolled irrespective of symptoms and signs of tuberculosis in South Africa.113 The CPM included ART status, CD4 cell count, BMI, and W4SS as predictors. The analysis involved splitting the cohort by time into development and internal validation datasets. The development dataset had 52 tuberculosis cases. However, 11 predictors were assessed including nonlinear terms for continuous variables and interaction terms, meaning that overfitting of the model was likely. The final CPM was converted to a simplified score following categorization of continuous predictors. The score showed
acceptable discrimination during internal validation (C-statistic: 0.72). Boyles et al externally validated the full CPM among outpatient PLHIV who were enrolled from a PCF setting in South Africa and found adequate calibration but suboptimal discrimination (C-statistic: 0.65), suggesting that performance would be even lower for the simplified score that was derived from the full CPM.115
Balcha et al developed a CPM among outpatient PLHIV not on ART with a positive W4SS who were drawn from a larger cohort enrolled irrespective of symptoms and signs of tuberculosis in Ethiopia.114 The CPM incorporated cough, lymphadenopathy, haemoglobin, Karnofsky score, and mid-upper arm circumference (MUAC). During model development, initial predictors were selected using their univariable associations with tuberculosis – a procedure that may falsely exclude important predictors. Continuous variables were also dichotomized. The development dataset had 137 tuberculosis cases, but since 25 predictors were assessed, overfitting of the model is likely. Furthermore, the authors excluded those unable to produce a sputum sample and those with a clinical diagnosis of tuberculosis. The final CPM was converted to a simplified score, which showed acceptable discrimination in the derivation cohort (C-statistic: 0.75). However, the CPM and simplified score were not internally or externally validated. The CPM and simplified score are also complex, requiring
examination for lymphadenopathy, measurement of haemoglobin concentration, and assessment of the Karnofsky score.
Boyles et al developed 2 CPMs for active tuberculosis in outpatient PLHIV with a positive W4SS in South Africa based on first visit and return visit, respectively.115 The CPM based on first visit included ART status, number of W4SS symptoms, duration of W4SS symptoms, and temperature as predictors. The CPM based on return visit included change in symptoms after antibiotics, CRP at return visit, number of W4SS symptoms, duration of W4SS
symptoms, and ART status. Both models were developed and internally validated according to the TRIPOD principles.107 During internal validation, the CPM based on first visit showed suboptimal discrimination (C-statistic: 0.68), while the CPM based on return visit showed acceptable discrimination (C-statistic: 0.76). However, neither model has yet been externally validated or assessed for clinical utility.
Nanta et al developed a CPM among PLHIV who were enrolled, regardless of symptoms and signs of tuberculosis, from an ART clinic, a tuberculosis clinic, and outpatient and inpatient departments in Thailand.116 The CPM included BMI ≤19 kg/m², cough >2 weeks, shaking chills ≥1 week, ART status, CD4 cell count ≤200 cells/µl, and history of tuberculosis.
However, the study had several limitations. During model development, the authors selected predictors using their univariable associations with tuberculosis. Overfitting was likely, because 43 predictors were assessed, but there were only 66 cases of tuberculosis. The CPM has not been validated (either internally or externally), and clinical utility has not yet been assessed.
In summary, although there are several CPMs for tuberculosis screening in PLHIV enrolled irrespective of tuberculosis symptoms and signs or in PLHIV with a positive W4SS, they have several limitations. Current CPMs have been developed using many predictors relative to the number of events,111,113,114,116 univariable selection of predictors,114,116 or categorization of continuous variables.111-114 In addition, current CPMs have not been internally validated,114,116 have shown suboptimal performance at external validation,111,113 have not been externally validated or have undergone only limited external validation,112-116 or have not been assessed for clinical utility.111,113-116
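The concern about many predictors relative to the number of events can be made concrete by dividing the number of tuberculosis cases in each development dataset by the number of predictors assessed, using the figures quoted above. The sketch below is a rough illustration only: the function name is hypothetical, and the ratio is an assumption-laden simplification because nonlinear and interaction terms add parameters beyond the predictor count, so the true number of events per parameter would be lower in those studies. No particular threshold is implied.

```python
# (study, tuberculosis cases in the development dataset, predictors assessed),
# as reported in the text above.
studies = [
    ("Auld et al", 189, 15),
    ("Hanifa et al", 52, 11),
    ("Balcha et al", 137, 25),
    ("Nanta et al", 66, 43),
]

def events_per_predictor(events, predictors):
    """Crude ratio of outcome events to candidate predictors assessed."""
    return events / predictors

ratios = {name: events_per_predictor(e, k) for name, e, k in studies}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f} events per candidate predictor")
# Auld et al: 12.6 events per candidate predictor
# Hanifa et al: 4.7 events per candidate predictor
# Balcha et al: 5.5 events per candidate predictor
# Nanta et al: 1.5 events per candidate predictor
```

Lower ratios indicate less information per estimated quantity and therefore a greater risk of overfitting, which is consistent with the ordering of concern expressed for these studies above.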