
Chapter 6: Discussion of thesis

6.2 Limitations


Using decision curve analysis to assess clinical utility, both CPMs showed equivalent or higher net benefit across all threshold probabilities compared with other tools or strategies (including WHO-recommended screening tools). Across a plausible range of thresholds (~13 to 32 confirmatory tests performed to identify one tuberculosis case), CRP (≥5 mg/L) had net benefit comparable to that of both CPMs. The newly developed CRP-based CPMs added value at more extreme thresholds, that is, when resources permit more confirmatory tests per diagnosed case or allow only fewer confirmatory tests per diagnosed case. The W4SS had lower net benefit than the other screening tools and both CPMs.
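Decision curve analysis rests on a simple formula. At threshold probability pt, the net benefit of a strategy is the true-positive rate minus the false-positive rate weighted by the odds pt/(1 - pt). "Test none" always has a net benefit of zero. Under the usual conversion, being willing to perform N confirmatory tests to find one case corresponds to pt = 1/N, so ~13 to 32 tests per case is roughly pt ≈ 3% to 8%. The sketch below assumes this standard formulation. The function names and all counts are hypothetical and are not taken from the thesis data.

```python
"""Sketch of decision-curve net benefit for binary screening strategies.

All counts below are hypothetical and for illustration only.
"""


def net_benefit(true_pos: int, false_pos: int, n: int, threshold: float) -> float:
    """Net benefit = TP/n - FP/n * pt / (1 - pt) at threshold probability pt."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    odds = threshold / (1.0 - threshold)  # exchange rate between false and true positives
    return true_pos / n - (false_pos / n) * odds


def threshold_for_tests_per_case(tests_per_case: float) -> float:
    """Willingness to perform N confirmatory tests per case found implies pt = 1/N."""
    return 1.0 / tests_per_case


if __name__ == "__main__":
    # Hypothetical cohort: 1,000 people, 100 of whom have tuberculosis.
    n, cases = 1000, 100
    pt = threshold_for_tests_per_case(20)  # willing to do 20 tests per case found
    screen = net_benefit(true_pos=90, false_pos=400, n=n, threshold=pt)
    test_all = net_benefit(true_pos=cases, false_pos=n - cases, n=n, threshold=pt)
    print(f"pt={pt:.3f}  screen={screen:.4f}  test all={test_all:.4f}")
```

With these made-up counts, the hypothetical screening rule has a net benefit of about 0.069 at pt = 0.05. That is higher than testing everyone (about 0.053) and higher than testing no one (0). Repeating the calculation over a grid of thresholds traces out the decision curve.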

The W4SS would capture 91% of tuberculosis cases and result in confirmatory testing for 78% of participants. CRP (≥5 mg/L) would capture a similar proportion of tuberculosis cases but reduce the number of confirmatory tests required by 36%.


of HIV-associated tuberculosis.¹ Study findings might also not be generalisable to children with HIV, and test performance might differ in the context of regular screening.

Third, for the studies that contributed to chapters 2, 3, and 4, I only included participants in analyses if they had complete data on both the index test in question and the reference standard. For example, participants who were unable to produce a sputum sample were largely excluded from several analyses because sputum culture requires a sputum sample. Findings might therefore not generalise to people who are unable to produce sputum. In chapter 5, we included participants unable to produce a sputum sample, since multiple imputation was used to deal with missing data prior to prediction modelling.
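Multiple imputation produces one estimate per imputed dataset, and these estimates are usually combined with Rubin's rules. The pooled estimate is the mean across the m imputations. Its total variance is the mean within-imputation variance plus the between-imputation variance inflated by (1 + 1/m). The pooling step used in chapter 5 is not described here. The sketch below is therefore a generic illustration of Rubin's rules with made-up numbers, not the procedure used in the thesis.

```python
"""Generic sketch of Rubin's rules for pooling estimates across imputations.

The estimates and variances below are hypothetical, for illustration only.
"""
from statistics import mean, variance


def pool_rubin(estimates: list[float], variances: list[float]) -> tuple[float, float]:
    """Return (pooled estimate, total variance) from m imputed-data analyses."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance (W)
    between = variance(estimates)  # between-imputation variance (B), n - 1 divisor
    total = within + (1 + 1 / m) * between
    return q_bar, total


if __name__ == "__main__":
    # Hypothetical log-odds ratio for one predictor, estimated in 5 imputed datasets
    est = [0.52, 0.48, 0.55, 0.50, 0.45]
    var = [0.010, 0.011, 0.009, 0.010, 0.012]
    q, t = pool_rubin(est, var)
    print(f"pooled estimate={q:.3f}, standard error={t ** 0.5:.3f}")
```

The (1 + 1/m) term makes the pooled standard error reflect the extra uncertainty caused by the missing data. A single imputed dataset would understate that uncertainty.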

Fourth, the reference standard might be considered imperfect. Few studies included extrapulmonary tuberculosis samples for culture or Xpert, so our results are more applicable to pulmonary tuberculosis. Since the IPDMAs in chapters 3 and 4 were based on inpatient cohorts, an imperfect reference standard might affect the results of these chapters to a greater extent. Inpatients often present with extrapulmonary or disseminated tuberculosis and produce paucibacillary sputum samples.²³

The reference standard in all of the IPDMAs might also be considered imperfect for a second reason. In most of the included studies, it consisted only of sputum culture, which should ideally be performed on multiple early-morning samples to maximise sensitivity; this was not done in any of our included studies. The imperfect reference standard may result in underestimation of specificity and overestimation of sensitivity of existing algorithms. Tuberculosis prevalence estimates are also likely to be underestimates because of the limitations of our reference standard.

A composite reference standard (one that includes clinical assessment) would likely have resulted in lower sensitivity and higher specificity. However, a composite reference standard has disadvantages, since its individual components are assumed to have the same accuracy and to be independent of one another.²¹⁷,²¹⁸ Other methods, such as latent class analysis, may be useful in the absence of a gold standard test.
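The direction of these biases can be checked with a small calculation. The sketch below assumes a reference standard with perfect specificity but imperfect sensitivity. As an illustrative assumption, it also takes the index test to be less sensitive in the cases the reference misses (for example, paucibacillary or extrapulmonary disease) than in the cases it detects. All parameter values are hypothetical.

```python
"""Apparent accuracy of an index test judged against an imperfect reference.

Assumptions (hypothetical, for illustration): the reference standard never
calls a non-case positive (perfect specificity), and the index test is less
sensitive in the cases the reference misses than in those it detects.
"""
from dataclasses import dataclass


@dataclass
class Accuracy:
    true_prevalence: float
    apparent_prevalence: float
    true_sensitivity: float
    apparent_sensitivity: float
    true_specificity: float
    apparent_specificity: float


def apparent_accuracy(prevalence: float, ref_sensitivity: float,
                      se_detected: float, se_missed: float,
                      specificity: float) -> Accuracy:
    detected = prevalence * ref_sensitivity      # cases the reference finds
    missed = prevalence * (1 - ref_sensitivity)  # cases labelled reference-negative
    non_cases = 1 - prevalence
    true_se = (detected * se_detected + missed * se_missed) / prevalence
    # The reference-negative group mixes true non-cases with missed cases;
    # missed cases that test positive are counted as false positives.
    ref_negative = non_cases + missed
    index_negative = non_cases * specificity + missed * (1 - se_missed)
    return Accuracy(
        true_prevalence=prevalence,
        apparent_prevalence=detected,
        true_sensitivity=true_se,
        apparent_sensitivity=se_detected,  # only detected cases are reference-positive
        true_specificity=specificity,
        apparent_specificity=index_negative / ref_negative,
    )


if __name__ == "__main__":
    acc = apparent_accuracy(prevalence=0.20, ref_sensitivity=0.70,
                            se_detected=0.90, se_missed=0.60, specificity=0.70)
    print(acc)
```

With these values, each apparent estimate moves in the direction described above:

- Apparent sensitivity is 0.90, above the true value of 0.81.
- Apparent specificity is about 0.68, below the true value of 0.70.
- Apparent prevalence is 0.14, below the true value of 0.20.

If the index test were equally sensitive in detected and missed cases, sensitivity would no longer be overestimated. Specificity (for any informative test) and prevalence would still be underestimated.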

It is unlikely, however, that this limitation would alter the findings in this thesis. For the IPDMAs described in chapters 2 and 5, most tuberculosis cases in an outpatient screening setting are likely to be pulmonary. Our results were also consistent across several reference standards: culture, combinations of culture and Xpert, and Xpert alone (the currently recommended confirmatory test). Diagnostic yield analyses for the Xpert and LF-LAM confirmatory tests in chapters 3 and 4 also did not require a reference standard.

For the IPDMAs in chapters 2, 3, and 4, the alternative reference standards used to assess screening tools were the WHO-recommended confirmatory tests Xpert and LF-LAM. By definition, these correctly classify Xpert-positive and LF-LAM-positive tuberculosis, respectively. WHO recommends that the diagnostic accuracy of screening tools be assessed against the recommended confirmatory tests that follow them, and not only against culture (the gold standard).⁷⁹ Furthermore, estimates of the proportion of inpatients eligible for Xpert and AlereLAM according to WHO criteria in chapters 2, 3, and 4 were based on data of higher methodological quality, since these analyses did not require a reference standard.

Fifth, the IPDMAs in chapters 2, 3, and 4 report diagnostic test accuracy using direct comparisons, which minimise confounding because both tests are applied to each individual. However, these analyses were based on fewer studies and therefore had reduced precision. Furthermore, the limited number of studies included in the IPDMAs for chapters 3, 4, and 5 precluded adequate investigation of heterogeneity and publication bias.
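In a direct comparison, both screening tools are applied to the same reference-positive individuals. The difference in sensitivity is therefore estimated within people rather than across studies, and the discordant pairs carry the information about which tool performs better. The sketch below computes paired sensitivities and an exact McNemar test on hypothetical records. It illustrates the paired design only and is not the meta-analytic model used in the IPDMAs.

```python
"""Paired (direct) comparison of two screening tools' sensitivity.

Records are hypothetical: each tuple is (tool_a_positive, tool_b_positive)
for one reference-positive participant who received both tools.
"""
from math import comb


def paired_sensitivity(records: list[tuple[bool, bool]]) -> dict[str, float]:
    n = len(records)
    a_pos = sum(a for a, _ in records)
    b_pos = sum(b for _, b in records)
    only_a = sum(a and not b for a, b in records)  # discordant: A positive, B negative
    only_b = sum(b and not a for a, b in records)  # discordant: A negative, B positive
    k = only_a + only_b
    # Exact two-sided McNemar test: Binomial(k, 0.5) on the discordant pairs
    tail = sum(comb(k, i) for i in range(min(only_a, only_b) + 1)) / 2 ** k
    return {
        "sens_a": a_pos / n,
        "sens_b": b_pos / n,
        "difference": (a_pos - b_pos) / n,
        "mcnemar_p": min(1.0, 2 * tail),
    }


if __name__ == "__main__":
    # Hypothetical: 100 culture-positive participants screened with both tools
    recs = ([(True, True)] * 70 + [(True, False)] * 15
            + [(False, True)] * 5 + [(False, False)] * 10)
    print(paired_sensitivity(recs))
```

Only the 20 discordant participants inform the test. This is why direct comparisons, although less prone to confounding, lose precision when few studies apply both tools.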

Sixth, I was unable to obtain IPD for 3 of the 25 studies that contributed to chapter 2, although these 3 studies comprised only 8% of potentially available data. For the IPDMAs described in chapters 3, 4, and 5, IPD were obtained for all studies identified in the systematic reviews and were included in analyses.

Seventh, data on some confirmatory tests were limited or not sought. Only 2 studies evaluated FujiLAM. Although we sought data for Xpert Ultra, only 1 study in chapter 1 assessed Xpert Ultra,¹⁷³ and no studies in chapters 2 and 3 assessed it. We also did not attempt to obtain data on other molecular tests, such as TB-LAMP and Truenat assays. However, TB-LAMP has suboptimal sensitivity in PLHIV and is not recommended by WHO.⁶¹ Furthermore, a recent WHO assessment of the diagnostic accuracy of Truenat assays found no study that had evaluated them in PLHIV irrespective of tuberculosis symptoms and signs.⁵⁸

Eighth, data on some screening tools were limited. We used W4SS or CD4 cell count ≤200 cells/μL as WHO eligibility criteria for AlereLAM in chapter 4, given limited data on WHO-defined danger signs and WHO stage. If WHO-defined danger signs and WHO stage had been included in the definition of WHO eligibility criteria for AlereLAM, the proportion eligible would be even higher. This follows because adding criteria can only increase the number of participants who meet at least one of them. Thus, this limitation would not alter the conclusions of this chapter.

For the IPDMA in chapter 5, several potential predictors of tuberculosis were not included during CPM development because they were missing for a large proportion of participants or unmeasured in several cohorts. For example, data on haemoglobin, a well-known predictor of tuberculosis,¹²¹ were missing for 43% of individuals overall and systematically missing (i.e., 100% missing) in 2 cohorts. I was also unable to validate several published CPMs because they included predictors that were not measured in some or all cohorts.¹¹¹,¹¹²,¹¹⁴,¹¹⁶,¹⁴⁵,²⁰⁴
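When preparing data for imputation, it can help to separate two kinds of missingness. Some predictors are partly missing within a cohort; others are systematically missing because they were never measured in that cohort. A cohort with systematic missingness contributes no observed values of that predictor to the imputation model. The sketch below tabulates missingness by cohort for hypothetical records. The field names (`cohort`, `haemoglobin`) and all values are made up for illustration.

```python
"""Summarise predictor missingness overall and by cohort (hypothetical data)."""
from collections import defaultdict


def missingness_by_cohort(rows: list[dict], predictor: str) -> dict[str, float]:
    """Proportion of rows in each cohort where `predictor` is absent or None."""
    totals: dict[str, int] = defaultdict(int)
    missing: dict[str, int] = defaultdict(int)
    for row in rows:
        totals[row["cohort"]] += 1
        if row.get(predictor) is None:
            missing[row["cohort"]] += 1
    return {c: missing[c] / totals[c] for c in totals}


def systematically_missing(rows: list[dict], predictor: str) -> list[str]:
    """Cohorts in which the predictor was never recorded (100% missing)."""
    return [c for c, p in missingness_by_cohort(rows, predictor).items() if p == 1.0]


if __name__ == "__main__":
    rows = [
        {"cohort": "A", "haemoglobin": 11.2},
        {"cohort": "A", "haemoglobin": None},
        {"cohort": "B", "haemoglobin": None},
        {"cohort": "B", "haemoglobin": None},
        {"cohort": "C", "haemoglobin": 12.9},
    ]
    overall = sum(r["haemoglobin"] is None for r in rows) / len(rows)
    print(missingness_by_cohort(rows, "haemoglobin"))
    print(f"overall missing: {overall:.0%}")
    print("systematically missing in:", systematically_missing(rows, "haemoglobin"))
```

In this toy example, haemoglobin is 60% missing overall but entirely missing in cohort B. An overall percentage alone would hide that distinction.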

Finally, for the IPDMAs in chapters 2 and 3, calculations based on a hypothetical cohort were presented to give insight into the consequences of screening and confirmatory testing. However, these calculations were often based on heterogeneous diagnostic test accuracy measures.

Furthermore, in the case of inpatients, these calculations were based on diagnostic accuracy results derived from few participants, some of whom were assessed with an imperfect reference standard. Therefore, these results should be interpreted with caution, given the uncertainty of the underlying estimates.
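The sketch below shows why point estimates alone can mislead in hypothetical-cohort calculations. It pushes a screening tool's sensitivity through a hypothetical cohort at a point estimate and at the limits of an assumed confidence interval, holding specificity fixed. The cohort size, prevalence and all accuracy values are invented for illustration; they are not the estimates reported in chapters 2 or 3.

```python
"""Consequences of screening in a hypothetical cohort, with uncertainty.

All inputs are hypothetical; they are not estimates from the thesis.
"""
from dataclasses import dataclass


@dataclass
class Outcome:
    screen_positive: float  # people referred for confirmatory testing
    cases_detected: float   # true cases among screen positives
    cases_missed: float     # true cases among screen negatives


def screen_cohort(n: int, prevalence: float, sensitivity: float,
                  specificity: float) -> Outcome:
    cases = n * prevalence
    non_cases = n - cases
    detected = cases * sensitivity
    false_pos = non_cases * (1 - specificity)
    return Outcome(detected + false_pos, detected, cases - detected)


if __name__ == "__main__":
    n, prevalence, specificity = 1000, 0.25, 0.40  # hypothetical inpatient cohort
    # Hypothetical sensitivity: lower limit, point estimate, upper limit
    scenarios = {"lower limit": 0.70, "point estimate": 0.84, "upper limit": 0.96}
    for label, se in scenarios.items():
        out = screen_cohort(n, prevalence, se, specificity)
        print(f"{label:>14}: refer {out.screen_positive:.0f}, "
              f"detect {out.cases_detected:.0f}, miss {out.cases_missed:.0f}")
```

Across this assumed interval, the number of missed cases ranges from 10 to 75 per 1,000 screened. That range is wide enough to change how a strategy would be judged, which is why such projections need to be read together with the precision of their inputs.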

In the document: The evaluation of different strategies to improve the diagnosis of tuberculosis in people living with HIV in resource-limited settings (pages 135-138)