https://links.lww.com/WAD/A346

Supplemental Material for

“A comparison of methods for predicting future cognitive status: mixture modeling, latent class analysis, and competitors”

Details on deriving categorical variables for latent class analysis

Since latent class analysis requires categorical inputs, numerical variables are recoded as follows:

Summary statistics are computed for each numerical variable, and the 25th and 75th percentiles are used as category boundaries. The 25th and 75th percentiles are 46 and 57, respectively, for T score; 72.75 and 81.25 for age; 28 and 30 for MMSE; and 14.75 and 18 for education. Values less than or equal to the 25th percentile are recoded as 1, values greater than the 25th percentile and less than or equal to the 75th percentile are recoded as 2, and values greater than the 75th percentile are recoded as 3. Because the 75th percentile of MMSE (30) equals its maximum, no MMSE values exceed it, so the recoded MMSE has only two categories.
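The recoding rule can be sketched in Python as follows. This is a minimal illustration, not the code used for the paper: the function name `recode_by_quartiles` is hypothetical, and the sketch assumes percentiles computed with NumPy's default linear interpolation (the same definition as R's default quantile type 7), which may differ from whatever software the authors used.

```python
import numpy as np

def recode_by_quartiles(values):
    """Recode a numeric variable into categories 1, 2, 3 using its own
    25th and 75th percentiles: <= q25 -> 1, (q25, q75] -> 2, > q75 -> 3."""
    x = np.asarray(values, dtype=float)
    q25, q75 = np.percentile(x, [25, 75])  # default linear interpolation
    # With right=True, np.digitize returns 0 for x <= q25, 1 for
    # q25 < x <= q75 and 2 for x > q75; adding 1 gives labels 1, 2, 3.
    return np.digitize(x, [q25, q75], right=True) + 1

# Hypothetical MMSE-like scores whose 75th percentile equals the maximum (30),
# so category 3 is empty, as described for MMSE above.
mmse = [27, 28, 28, 29, 30, 30, 30, 30, 30]
print(recode_by_quartiles(mmse))  # [1 1 1 2 2 2 2 2 2]
```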

Overview of the singular Bayesian information criterion

The singular Bayesian information criterion (sBIC) was proposed by Drton and Plummer in 2017 [1]. To understand the sBIC, it is helpful to first review the Bayesian information criterion (BIC) [2].

The BIC is a widely used tool for statistical model selection. It penalizes the maximized log likelihood of a fitted model by a term proportional to the number of parameters multiplied by the logarithm of the sample size. Roughly speaking, the penalty compensates for the inflation of the log likelihood that results from evaluating it at maximum likelihood parameter estimates. Such inflation becomes more pronounced as the number of parameters grows, so the penalty must increase with the number of parameters.
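As a minimal illustration, the hypothetical helper below computes the BIC in the "larger is better" form, that is, the maximized log likelihood minus half the number of parameters times the logarithm of the sample size. Many software packages instead report −2 times this quantity, for which smaller is better; both forms rank models identically. The log likelihoods and parameter counts in the usage lines are made-up numbers, not results from the paper.

```python
import math

def bic(loglik: float, n_params: int, n_obs: int) -> float:
    """BIC on the log-likelihood scale (larger is better): log L - (k / 2) * log(n)."""
    return loglik - 0.5 * n_params * math.log(n_obs)

def bic_deviance_scale(loglik: float, n_params: int, n_obs: int) -> float:
    """Equivalent form -2 * log L + k * log(n) (smaller is better)."""
    return -2.0 * bic(loglik, n_params, n_obs)

# Hypothetical comparison: the larger model gains 10 log-likelihood units
# but pays a penalty for 3 extra parameters (1.5 * log(200), about 7.9 units).
n = 200
small = bic(-1000.0, 5, n)
large = bic(-990.0, 8, n)
print("prefer larger model" if large > small else "prefer smaller model")  # prefer larger model
```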

For mixture models, the inflation of the log likelihood is not as easily described as in many other statistical settings. This relates to a mathematical phenomenon called “singularity” of the Fisher information matrix. The essence of this phenomenon is that, if one chooses a mixture model with too many components, there are redundancies among the patterns of change in the log likelihood as different model parameters are varied. For example, when a two-component mixture is fitted to data that arise from a single component, parameter values at which the two components coincide describe the same distribution for every value of the mixing proportion, so varying that proportion leaves the log likelihood unchanged. These redundancies attenuate the inflation of the log likelihood relative to what occurs in many other statistical settings, so the penalty in the BIC is not properly calibrated for mixture models; in particular, it is too strong [1].
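The singularity can be seen numerically in a small, hypothetical example: a two-component normal mixture with unit variances, π N(μ1, 1) + (1 − π) N(μ2, 1), evaluated on data simulated from a single N(0, 1) distribution. The sketch below uses an empirical information matrix (the average outer product of per-observation score vectors); the function names are illustrative. When the two components are distinct the matrix has full rank, whereas at μ1 = μ2 the score for π vanishes and the scores for μ1 and μ2 are proportional, so the matrix is singular.

```python
import numpy as np

def normal_pdf(x, mu):
    """Density of N(mu, 1) evaluated at x."""
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2.0 * np.pi)

def score_matrix(x, pi, mu1, mu2):
    """Per-observation scores (derivatives of log f) of the mixture
    f = pi * N(mu1, 1) + (1 - pi) * N(mu2, 1) with respect to (pi, mu1, mu2)."""
    f1, f2 = normal_pdf(x, mu1), normal_pdf(x, mu2)
    f = pi * f1 + (1.0 - pi) * f2
    d_pi = (f1 - f2) / f
    d_mu1 = pi * (x - mu1) * f1 / f
    d_mu2 = (1.0 - pi) * (x - mu2) * f2 / f
    return np.column_stack([d_pi, d_mu1, d_mu2])

def empirical_fisher(x, pi, mu1, mu2):
    """Average outer product of the score vectors (a 3 x 3 matrix)."""
    s = score_matrix(x, pi, mu1, mu2)
    return s.T @ s / len(x)

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=2000)  # simulated data from one N(0, 1) component

# Distinct components: all three eigenvalues are clearly positive.
print(np.linalg.eigvalsh(empirical_fisher(x, 0.5, -1.0, 1.0)))
# Coinciding components (mu1 == mu2): the pi-score is identically zero and the
# mu1- and mu2-scores are proportional (here equal, since pi = 0.5),
# so two eigenvalues are numerically zero.
print(np.linalg.eigvalsh(empirical_fisher(x, 0.5, 0.0, 0.0)))
```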

The sBIC accounts for the lesser degree of log likelihood inflation in mixture models (and in other settings where the Fisher information matrix is singular). Its penalty is lighter, but there is no simple closed-form expression for it. Instead, a system of nonlinear equations is solved to obtain sBIC values for the models under consideration. These equations depend on quantities called learning coefficients, which are generally unknown; however, bounds on the learning coefficients yield approximate versions of the sBIC [1].
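For a nested sequence of models, such as mixtures with 1, 2, ..., K components, the sBIC equations can be solved one model at a time. The sketch below follows the recursion of Drton and Plummer [1] for such a totally ordered set of models, assuming a uniform prior over models; it is not the implementation used for the paper. With ℓ̂i the maximized log likelihood of model i, it forms log L′ij = ℓ̂i − λij log n + (mij − 1) log log n for each submodel j ≤ i, and then finds the positive L′(Mi) satisfying L′(Mi) = Σj≤i L′ij L′(Mj) / Σj≤i L′(Mj); the sBIC is log L′(Mi). The function name is hypothetical, and the learning coefficients `lam` and multiplicities `mult` are assumed to be supplied by the user (for instance, from bounds of the kind mentioned above).

```python
import math

def sbic_nested(loglik, lam, mult, n):
    """Approximate sBIC for nested models M_1, ..., M_K under a uniform prior.

    loglik[i]  : maximized log likelihood of model i
    lam[i][j]  : learning coefficient of model i at submodel j (j <= i); user-supplied
    mult[i][j] : matching multiplicity (use 1 if unknown)
    n          : sample size
    Returns sBIC values on the log-likelihood scale (larger is better).
    """
    K = len(loglik)
    log_n = math.log(n)
    # log L'_ij = loglik_i - lam_ij * log(n) + (mult_ij - 1) * log(log(n))
    log_Lij = [[loglik[i] - lam[i][j] * log_n + (mult[i][j] - 1) * math.log(log_n)
                for j in range(i + 1)] for i in range(K)]
    # The equations are homogeneous of degree 1, so a common offset can be
    # removed before exponentiating. Log-scale gaps of several hundred units
    # would still underflow in this simple sketch.
    offset = max(max(row) for row in log_Lij)
    Lij = [[math.exp(v - offset) for v in row] for row in log_Lij]

    L = []  # L'(M_i) on the shifted scale; L holds the i submodels j < i
    for i in range(K):
        # Multiplying out the defining equation gives L^2 + b*L - c = 0 with:
        b = sum(L) - Lij[i][i]
        c = sum(Lij[i][j] * L[j] for j in range(i))
        disc = math.sqrt(b * b + 4.0 * c)
        # Positive root, written to avoid cancellation when b > 0.
        L.append((disc - b) / 2.0 if b <= 0 else 2.0 * c / (b + disc))
    return [math.log(v) + offset for v in L]
```

Setting every `lam[i][j]` to half the number of parameters of model i reproduces the BIC exactly. Smaller coefficients for the submodels (j < i) give sBIC values at least as large as the BIC, which is the sense in which the sBIC penalizes singular models less heavily.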

For the T score data considered in our paper, an approximate version of the sBIC turned out to be extremely close to the BIC (this will not always be the case [1]), and both criteria strongly preferred a three-component model over a two-component model (cf. Figure 1a).

References for Supplemental Material

1. Drton M, Plummer M. A Bayesian information criterion for singular models. J R Stat Soc Series B Stat Methodol. 2017;79(2):323–380.

2. Schwarz G. Estimating the dimension of a model. Ann Stat. 1978;6(2):461–464.

Supplemental Figure 1: Graphical summary of results from latent class analysis. Bar heights estimate, within each of the three classes, the relative frequencies of the two sexes and of the values of the categorical versions of education, Mini-Mental State Examination (MMSE) score, age, and T score. The derivation of the categorical versions from the underlying numeric variables is described earlier in this supplement.
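In a latent class model, each class has a prevalence and, for each categorical item, a table of response probabilities; bar heights of the kind shown in Supplemental Figure 1 are estimates of those class-specific probabilities. The EM sketch below is a generic fit of such a model, not the software used for the paper. The function name `fit_lca`, the random starting values, and the fixed iteration count are illustrative assumptions (in practice several random starts are usually compared), and items must be coded 0, 1, ..., so the 1/2/3 codes described above would be shifted down by one.

```python
import numpy as np

def fit_lca(X, n_classes, n_iter=200, seed=0):
    """EM for a latent class model with categorical items.

    X : (n, p) integer array; item k coded 0, ..., levels_k - 1.
    Returns class prevalences, a list of (n_classes, levels_k) response-probability
    tables (one per item), posterior class probabilities, and the log-likelihood path.
    """
    X = np.asarray(X)
    n, p = X.shape
    rng = np.random.default_rng(seed)
    levels = [int(X[:, k].max()) + 1 for k in range(p)]
    prev = np.full(n_classes, 1.0 / n_classes)
    # Random starting values for P(item k = level | class); each row sums to 1.
    rho = [rng.dirichlet(np.ones(L), size=n_classes) for L in levels]
    loglik_path = []
    for _ in range(n_iter):
        # E-step: log P(class c) + sum_k log P(x_ik | class c), shape (n, C).
        log_joint = np.tile(np.log(prev), (n, 1))
        for k in range(p):
            log_joint += np.log(rho[k][:, X[:, k]]).T
        m = log_joint.max(axis=1, keepdims=True)  # log-sum-exp for stability
        log_marg = m + np.log(np.exp(log_joint - m).sum(axis=1, keepdims=True))
        post = np.exp(log_joint - log_marg)
        loglik_path.append(float(log_marg.sum()))
        # M-step: prevalences are mean posteriors; response probabilities are
        # posterior-weighted level counts, normalized within each class.
        prev = post.mean(axis=0)
        for k in range(p):
            counts = np.stack([post[X[:, k] == l].sum(axis=0)
                               for l in range(levels[k])], axis=1)
            rho[k] = counts / counts.sum(axis=1, keepdims=True)
    return prev, rho, post, loglik_path
```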

Supplemental Figure 2: Kaplan-Meier plots for time to cognitive transition, by groups obtained from latent class analysis. While the Kaplan-Meier curve for group 2 clearly separates from the other two, the Kaplan-Meier curves for groups 1 and 3 are intertwined. Thus, latent class analysis does not yield groups for which there is a clear ordering of risk for cognitive transition.
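The Kaplan-Meier (product-limit) estimator behind such curves multiplies, over the distinct event times t, the factors 1 − d(t)/n(t), where d(t) is the number of transitions at t and n(t) is the number of patients still at risk just before t. A minimal sketch follows; the function name is hypothetical, and it assumes each patient contributes one follow-up time and an indicator equal to 1 for an observed cognitive transition and 0 for censoring.

```python
import numpy as np

def kaplan_meier(time, event):
    """Product-limit estimate of the survival function.

    time  : follow-up times
    event : 1 if the cognitive transition was observed, 0 if censored
    Returns the distinct event times and S(t) just after each of them.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    event_times = np.unique(time[event == 1])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(time >= t)               # under observation just before t
        d = np.sum((time == t) & (event == 1))    # transitions observed at t
        s *= 1.0 - d / at_risk
        surv.append(s)
    return event_times, np.array(surv)
```

A curve for each group is obtained by applying the function to that group's subset, e.g. `kaplan_meier(time[group == g], event[group == g])` for hypothetical arrays `time`, `event`, and `group`.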

Supplemental Figure 3: Kaplan-Meier plots for time to cognitive transition, by groups based on T score cutoffs one standard deviation above and below the mean for historical controls. Apart from a brief crossing, the Kaplan-Meier curves are fairly well separated for patients whose baseline T scores are below 40, between 40 and 59, and above 59. The curve for patients with T scores below 40 looks coarse because there were few such patients.
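The grouping can be sketched as below, assuming the usual T-score scale with a mean of 50 and a standard deviation of 10 in the reference population, so that the cutoffs are 40 and 60; for integer T scores, the resulting groups are below 40, 40 to 59, and above 59. The function name and the 0/1/2 group labels are hypothetical.

```python
import numpy as np

def tscore_group(t, mean=50.0, sd=10.0):
    """Assign T scores to three groups using cutoffs one SD below and above the
    reference mean: 0 = below mean - sd, 1 = [mean - sd, mean + sd), 2 = mean + sd or above."""
    t = np.asarray(t, dtype=float)
    # With the default right=False, np.digitize returns i such that
    # cutoffs[i - 1] <= t < cutoffs[i].
    return np.digitize(t, [mean - sd, mean + sd])
```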
