
6.5 A Transition Model for the RSV data

Diggle et al. (2002) state that transition models are extensions of generalized linear models (GLMs) for describing the conditional distribution of each response $y_{ij}$ as an explicit function of the past responses $y_{ij-1},\ldots,y_{i1}$ and the covariates $x_{ij}$. Hence the past outcomes are treated as predictor variables.

If we consider the generalized linear transition model with respect to the Kilifi data set, we can model the conditional distribution of $Y_{ij}$ given the past as an explicit function of the $q$ preceding responses. We can assume that the probability of RSV for child $i$ at visit $j$ depends directly on whether or not the child had RSV at visit $j-1$, as well as on the explanatory variables $x_{ij}$. This is the simplest case, a first-order transition model. Taking the logit link, a first-order transition model is given by

\[
\text{logit}[P(Y_{ij} = 1 \mid Y_{ij-1},\ldots,Y_{i1})] = x_{ij}'\beta + \alpha y_{ij-1}.
\]

Therefore the probability of RSV at time $t_{ij}$ depends not only on the measured covariates or explanatory variables but also on whether or not the child had RSV at the previous visit. The parameter $\exp(\alpha)$ is the ratio of the odds of infection between children who did and did not have RSV at the prior visit. The coefficient $\beta$ is the change in the log odds of infection per unit change in $x$ among children who were free of RSV at the previous visit. The transition model stated above is a first-order Markov chain (Feller, 1968, vol. 1, p. 132). At equally spaced time intervals, the $2\times 2$ transition matrix, whose elements are $P(Y_{ij} = y_{ij} \mid Y_{ij-1} = y_{ij-1})$ where each of $Y_{ij}$ and $Y_{ij-1}$ may take the values 0 and 1, is obtained by inverting the logistic regression equation for each pair $(y_{ij}, y_{ij-1})$:

\[
\begin{array}{c|cc}
 & Y_{ij}=0 & Y_{ij}=1 \\ \hline
Y_{ij-1}=0 & \dfrac{1}{1+\exp(x_{ij}'\beta)} & \dfrac{\exp(x_{ij}'\beta)}{1+\exp(x_{ij}'\beta)} \\[2ex]
Y_{ij-1}=1 & \dfrac{1}{1+\exp(x_{ij}'\beta+\alpha)} & \dfrac{\exp(x_{ij}'\beta+\alpha)}{1+\exp(x_{ij}'\beta+\alpha)}
\end{array}
\]
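For concreteness, a small numerical sketch (our own illustration, with made-up values for the linear predictor $x_{ij}'\beta$ and for $\alpha$; the function name is hypothetical) shows how the matrix is obtained by inverting the logit:

```python
import numpy as np

def transition_matrix(lin_pred, alpha):
    """2x2 one-step transition matrix for the first-order logistic
    transition model; lin_pred is x_ij' beta for a given child/visit.
    Row r gives [P(Y_ij = 0), P(Y_ij = 1)] conditional on Y_ij-1 = r."""
    rows = []
    for y_prev in (0, 1):
        eta = lin_pred + alpha * y_prev          # logit of P(Y_ij = 1 | Y_ij-1 = y_prev)
        p1 = np.exp(eta) / (1.0 + np.exp(eta))   # inverse logit
        rows.append([1.0 - p1, p1])
    return np.array(rows)

# Illustrative (made-up) values: exp(alpha) is the odds ratio of infection
# for children who had RSV at the previous visit versus those who did not.
print(transition_matrix(lin_pred=-2.0, alpha=1.5))
```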

In the general transition model, we let $H_{ij} = \{y_{i1},\ldots,y_{ij-1}\}$ represent the past responses for the $i$-th subject, and we let $\mu_{ij}^c = E(Y_{ij}\mid H_{ij})$ and $v_{ij}^c = \text{var}(Y_{ij}\mid H_{ij})$ denote the conditional mean and variance of $Y_{ij}$ given the past responses and the explanatory variables. We can specify the model analogously to the GLM for independent data, where we assume:

\[
g(\mu_{ij}^c) = x_{ij}'\beta + \sum_{r=1}^{s} f_r(H_{ij};\alpha) = x_{ij}'\beta + h_{ij}'\alpha \qquad (6.10)
\]
and
\[
v_{ij}^c = v(\mu_{ij}^c).
\]

The functions $f_r$ model the transition from the prior states to the present response. The past outcomes, after transformation by the known functions $f_r$, are treated as explanatory variables. Interactions among the prior responses may also be considered. We can then fit the transition model using GLM techniques, treating the repeated transitions for a child (subject) as independent events.
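As a sketch of this fitting strategy (the data frame below is invented toy data standing in for the Kilifi set; the column names `child`, `visit`, `age`, and `rsv` are our own assumptions), one can lag the response within each child and fit an ordinary logistic GLM, for example with Python's statsmodels:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# Toy long-format data: one row per child per visit, with a binary
# RSV indicator and one covariate (these values are made up).
n_children, n_visits = 50, 6
df = pd.DataFrame({
    "child": np.repeat(np.arange(n_children), n_visits),
    "visit": np.tile(np.arange(1, n_visits + 1), n_children),
    "age":   np.tile(np.arange(1, n_visits + 1), n_children) * 3.0,
    "rsv":   rng.binomial(1, 0.3, n_children * n_visits),
})

# Lag the response within each child; the first visit has no
# predecessor and drops out of the conditional likelihood.
df["rsv_lag"] = df.groupby("child")["rsv"].shift(1)
fit_data = df.dropna(subset=["rsv_lag"])

# Ordinary logistic GLM on the transitions, treated as independent:
# logit P(RSV_ij = 1) = beta0 + beta1 * age + alpha * rsv_lag.
model = smf.glm("rsv ~ age + rsv_lag", data=fit_data,
                family=sm.families.Binomial()).fit()
print(model.params)  # exp(params["rsv_lag"]) estimates exp(alpha)
```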

General

Diggle et al. (2002) focus on the case where the observation times $t_{ij}$ are equally spaced. The history for subject $i$ at visit $j$ is denoted by $H_{ij} = \{y_{ik},\ k = 1,\ldots,j-1\}$. The most useful transition models are Markov chains, for which the conditional distribution of $Y_{ij}$ given $H_{ij}$ depends only on the $q$ prior responses $Y_{ij-1},\ldots,Y_{ij-q}$. The integer $q$ is the order of the model. Writing the conditional p.d.f. of $Y_{ij}$ as an exponential-family type of distribution gives

\[
f(y_{ij}\mid H_{ij}) = \exp\{[y_{ij}\theta_{ij} - \psi(\theta_{ij})]/\phi + c(y_{ij},\phi)\} \qquad (6.11)
\]

for known functions $\psi(\theta_{ij})$ and $c(y_{ij},\phi)$. The conditional mean and variance are
\[
\mu_{ij}^c = E(Y_{ij}\mid H_{ij}) = \psi'(\theta_{ij}) \quad\text{and}\quad v_{ij}^c = \text{var}(Y_{ij}\mid H_{ij}) = \psi''(\theta_{ij})\phi.
\]
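For the binary responses considered here, a quick check of these identities (standard exponential-family algebra, added for illustration): with $\phi = 1$ and $\theta_{ij} = \text{logit}(\mu_{ij}^c)$,
\[
\psi(\theta_{ij}) = \log\{1+\exp(\theta_{ij})\}, \qquad
\psi'(\theta_{ij}) = \frac{\exp(\theta_{ij})}{1+\exp(\theta_{ij})} = \mu_{ij}^c, \qquad
\psi''(\theta_{ij}) = \mu_{ij}^c(1-\mu_{ij}^c),
\]
recovering the Bernoulli mean and variance.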

Diggle et al. (2002) consider models where the conditional mean and variance satisfy the equations
\[
g(\mu_{ij}^c) = x_{ij}'\beta + \sum_{r=1}^{s} f_r(H_{ij};\alpha)
\]
for suitable functions $f_r$, and
\[
v_{ij}^c = v(\mu_{ij}^c),
\]

where $g$ and $v$ are known link and variance functions determined from the density function. Hence the transition model expresses the conditional mean as a function of both the covariates $x_{ij}$ and the past responses $Y_{ij-1},\ldots,Y_{ij-q}$ in a much more general setting. We assume that the past affects the present through the sum of $s$ terms, each of which may depend on the $q$ prior values.

As an example, a logistic regression model for binary responses assuming a first-order Markov chain (Cox, 1970; Korn and Whittemore, 1979; Zeger et al., 1985) is specified as

\[
g(\mu_{ij}^c) = x_{ij}'\beta + \alpha y_{ij-1} \qquad (6.12)
\]
where $g(\mu_{ij}^c) = \text{logit}(\mu_{ij}^c)$, $v(\mu_{ij}^c) = \mu_{ij}^c(1-\mu_{ij}^c)$, $f_r(H_{ij};\alpha) = \alpha_r y_{ij-r}$, $s = q = 1$, and $\mu_{ij}^c = P(Y_{ij} = 1\mid H_{ij})$.
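To make model (6.12) concrete, here is a minimal simulation sketch (the function name, covariates, and parameter values are our own illustrative choices, not from the source):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_child(x, beta, alpha, y0=0):
    """Simulate one child's binary series from the first-order model (6.12):
    logit P(Y_j = 1 | y_{j-1}) = x_j' beta + alpha * y_{j-1}."""
    y_prev, ys = y0, []
    for x_j in x:                               # x: (n_visits, p) covariate rows
        eta = x_j @ beta + alpha * y_prev
        y_prev = rng.binomial(1, 1 / (1 + np.exp(-eta)))
        ys.append(y_prev)
    return np.array(ys)

# Illustrative values only: intercept plus visit number as the covariates.
x = np.column_stack([np.ones(10), np.arange(10)])
print(simulate_child(x, beta=np.array([-2.0, 0.1]), alpha=1.5))
```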

A first-order Markov model can be fitted by making use of the likelihood function. The contribution to the likelihood for the $i$-th subject can be written as
\[
L_i(y_{i1},\ldots,y_{in_i}) = f(y_{i1}) \prod_{j=2}^{n_i} f(y_{ij}\mid H_{ij}),
\]
where $H_{ij}$ is the history at occasion $j$, given by $H_{ij} = \{y_{ij-1}\}$.

In a Markov model of order $q$, the conditional distribution of $Y_{ij}$ is
\[
f(y_{ij}\mid H_{ij}) = f(y_{ij}\mid y_{ij-1},\ldots,y_{ij-q}),
\]
so that the likelihood is
\[
f(y_{i1},\ldots,y_{iq}) \prod_{j=q+1}^{n_i} f(y_{ij}\mid y_{ij-1},\ldots,y_{ij-q}).
\]

The transition GLM of Eq. (6.10) specifies only the conditional distribution $f(y_{ij}\mid H_{ij})$; the likelihood of the first $q$ observations, $f(y_{i1},\ldots,y_{iq})$, is not specified directly. In the logistic and log-linear models $f(y_{i1},\ldots,y_{iq})$ is not determined by the GLM assumption about the conditional model, and the full likelihood is unavailable. An alternative is to estimate $\beta$ and $\alpha$ by maximizing the conditional likelihood given by

\[
\prod_{i=1}^{N} f(y_{iq+1},\ldots,y_{in_i}\mid y_{i1},\ldots,y_{iq}) = \prod_{i=1}^{N}\prod_{j=q+1}^{n_i} f(y_{ij}\mid H_{ij}),
\]
where $N$ is the number of subjects or clusters in the study.
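As a sketch of how this conditional likelihood is assembled for the first-order logistic model ($q = 1$; the function name and array layout are our own assumptions):

```python
import numpy as np

def cond_loglik(beta, alpha, X, y):
    """Log conditional likelihood for one subject under the first-order
    logistic transition model; the first observation is conditioned on,
    not modelled (f(y_i1) is left unspecified)."""
    eta = X[1:] @ beta + alpha * y[:-1]        # linear predictor for j = 2,...,n_i
    # Bernoulli log density with logit link: y*eta - log(1 + exp(eta))
    return np.sum(y[1:] * eta - np.log1p(np.exp(eta)))

# Summing cond_loglik over subjects i = 1,...,N gives the full conditional
# log likelihood to be maximized over (beta, alpha).
```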

There are two distinct cases to consider when maximizing this conditional likelihood.

CASE 1

Case 1 occurs when
\[
f_r(H_{ij};\alpha,\beta) = \alpha_r f_r(H_{ij}),
\]
so that
\[
g(\mu_{ij}^c) = x_{ij}'\beta + \sum_{r=1}^{s}\alpha_r f_r(H_{ij}).
\]

Clearly $g(\mu_{ij}^c)$ is a linear function of both $\beta$ and $\alpha = (\alpha_1,\ldots,\alpha_s)'$, so estimation is the same as for GLMs for independent data: we regress $Y_{ij}$ on the $(p+s)$-dimensional vector of extended explanatory variables $(x_{ij}', f_1(H_{ij}),\ldots,f_s(H_{ij}))'$.

CASE 2

Case 2 occurs when the functions of the past responses include both $\beta$ and $\alpha$. Examples are linear and log-linear models. The iteratively weighted least squares (IWLS) method is used to estimate $\beta$ and $\alpha$; the exposition is given in Diggle et al. (2002, pp. 193-194). In summary, the derivative of the log conditional likelihood, or conditional score function, has the form

\[
S(\delta) = \sum_{i=1}^{N}\sum_{j=q+1}^{n_i} \frac{\partial \mu_{ij}^c}{\partial \delta}\,(v_{ij}^c)^{-1}(y_{ij}-\mu_{ij}^c) = 0 \qquad (6.13)
\]
where $\delta = (\beta,\alpha)$. This equation is analogous to the GLM score equation; the derivative $\partial\mu_{ij}^c/\partial\delta$ plays the role of $x_{ij}$, but it can depend on both $\alpha$ and $\beta$. The iteratively weighted least squares procedure is formulated as follows. Let $Y_i$ be the $(n_i-q)$-vector of responses for $j = q+1,\ldots,n_i$, and let $\mu_i^c$ be its conditional expectation given $H_{ij}$.

Let $X_i$ be an $(n_i-q)\times(p+s)$ matrix whose $k$-th row is $\partial\mu_{iq+k}^c/\partial\delta$, and let
\[
W_i = \text{diag}\{1/v_{ik+q}^c,\ k = 1,\ldots,n_i-q\}
\]
be an $(n_i-q)\times(n_i-q)$ diagonal weighting matrix.

Finally, let $Z_i = X_i\hat{\delta} + (Y_i - \hat{\mu}_i^c)$; an updated $\hat{\delta}$ can then be obtained by iteratively regressing $Z$ on $X$ using the weights in $W$. When the correct model is assumed for the conditional mean and variance, the solution $\hat{\delta}$ of Eq. (6.13) asymptotically follows a Gaussian distribution, as $N$ goes to infinity, with mean equal to the true value $\delta$ and $(p+s)\times(p+s)$ variance matrix

\[
V_\delta = \left(\sum_{i=1}^{N} X_i' W_i X_i\right)^{-1}.
\]
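A compact sketch of this IWLS update, specialized for concreteness to the logistic transition model of (6.12), where $\partial\mu_{ij}^c/\partial\delta = v_{ij}^c\,\tilde{x}_{ij}$ with $\tilde{x}_{ij}$ the extended covariate vector (the function and variable names are our own; in this linear case the update reduces to standard Fisher scoring):

```python
import numpy as np

def iwls_logistic_transition(Xt, y, n_iter=25, tol=1e-8):
    """IWLS for the logistic transition model, following the update
    Z = X delta + (y - mu), with X rows d(mu)/d(delta) and W = diag(1/v).
    Xt: extended design matrix (covariates plus lagged-response terms);
    y:  stacked responses for j = q+1,...,n_i over all subjects."""
    delta = np.zeros(Xt.shape[1])
    for _ in range(n_iter):
        eta = Xt @ delta
        mu = 1 / (1 + np.exp(-eta))
        v = mu * (1 - mu)                      # conditional variance v_ij^c
        X = Xt * v[:, None]                    # rows are d(mu_ij^c)/d(delta)
        W = 1 / v                              # weights 1 / v_ij^c
        Z = X @ delta + (y - mu)               # working response
        XtW = X.T * W
        new = np.linalg.solve(XtW @ X, XtW @ Z)  # weighted LS regression of Z on X
        if np.max(np.abs(new - delta)) < tol:
            return new
        delta = new
    return delta
```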

The variance $V_\delta$ depends on both $\alpha$ and $\beta$, and a consistent estimate $\hat{V}_\delta$ is obtained by replacing $\alpha$ and $\beta$ by their estimates $\hat{\alpha}$ and $\hat{\beta}$. However, when the conditional mean is correctly specified but the variance is not, consistent inferences about $\delta$ can still be obtained using the robust variance

\[
V_R = \left(\sum_{i=1}^{N} X_i' W_i X_i\right)^{-1}\left(\sum_{i=1}^{N} X_i' W_i V_i W_i X_i\right)\left(\sum_{i=1}^{N} X_i' W_i X_i\right)^{-1}.
\]

A consistent estimate of $V_R$ can be obtained by replacing $V_i = \text{var}(Y_i\mid H_i)$ by its estimate $(Y_i-\hat{\mu}_i^c)(Y_i-\hat{\mu}_i^c)'$. Interestingly, even when the Markov assumption is violated, the robust variance still gives consistent confidence intervals for $\hat{\delta}$. This concludes the estimation process for the transition model.
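A minimal sketch of this sandwich estimate, assuming subject-level blocks $(X_i, W_i, Y_i-\hat{\mu}_i^c)$ have already been computed (the function and argument names are our own):

```python
import numpy as np

def robust_variance(X_blocks, W_blocks, resid_blocks):
    """Sandwich estimate of V_R: V_i = var(Y_i | H_i) is replaced by the
    outer product of subject-level residuals (Y_i - mu_i)(Y_i - mu_i)'.
    Inputs are one (X_i, W_i, Y_i - mu_i) triple per subject."""
    bread = sum(X.T @ W @ X for X, W in zip(X_blocks, W_blocks))
    meat = sum(
        X.T @ W @ np.outer(r, r) @ W @ X
        for X, W, r in zip(X_blocks, W_blocks, resid_blocks)
    )
    bread_inv = np.linalg.inv(bread)
    return bread_inv @ meat @ bread_inv  # (p+s) x (p+s) robust covariance
```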

6.6 Software for fitting Conditional Models