4.4 Methods
4.4.1 Efficient Two-Phase Estimation
effects. We assume that the cumulative hazard function of the event time $T$ conditional on covariates $X$ and $Z$ takes the form $\int_0^t \exp\{\beta(s)X+\gamma^{\mathrm{T}}Z\}\,\mathrm{d}\Lambda(s)$, where $\beta(\cdot)$ is the time-varying regression coefficient of $X$, $\gamma$ is the time-fixed regression coefficient of $Z$, and $\Lambda(\cdot)$ is an unspecified positive increasing function. In the presence of right censoring, we observe $Y$ and $\Delta$ instead of $T$, where $Y=\min(T,C)$, $\Delta=I(T\le C)$, $C$ is the censoring time on $T$, and $I(\cdot)$ is the indicator function.
Let $P(\cdot\mid\cdot)$ denote a conditional density function. If $(Y,\Delta,X,Z)$ is observed for all $n$ subjects in the study, then the inference on $\beta(\cdot)$, $\gamma$, and $\Lambda(\cdot)$ is typically based on the likelihood $\prod_{i=1}^n P(Y_i,\Delta_i\mid X_i,Z_i)$.
Under the two-phase design, however, only $(Y,\Delta,Z)$ is measured on all $n$ subjects in Phase I, and $X$ is measured for a sub-sample of size $n_2$ in Phase II. Let $R$ be the selection indicator for the measurement of $X$ in Phase II. We assume that the distribution of $(R_1,\ldots,R_n)$ depends on $(Y_i,\Delta_i,X_i,Z_i)$ $(i=1,\ldots,n)$ only through the Phase I data $(Y_i,\Delta_i,Z_i)$ $(i=1,\ldots,n)$. This assumption implies that the data on $X$ are missing at random, so that the joint distribution of $(R_1,\ldots,R_n)$ conditional on $(Y_1,\Delta_1,Z_1,\ldots,Y_n,\Delta_n,Z_n)$ can be disregarded in the likelihood inference on $\beta(\cdot)$, $\gamma$, and $\Lambda(\cdot)$. Following Zeng and Lin (2014), we further assume that the censoring time $C$ is independent of $T$ given $(X,Z)$ among subjects with $R=1$ and independent of $T$ and $X$ given $Z$ among subjects with $R=0$. In this situation, the observed-data log-likelihood takes the form
\[
\sum_{i=1}^n R_i\left\{\log P(Y_i,\Delta_i\mid X_i,Z_i)+\log P(X_i\mid Z_i)\right\}
+\sum_{i=1}^n (1-R_i)\log\int P(Y_i,\Delta_i\mid x,Z_i)\,P(x\mid Z_i)\,\mathrm{d}x, \tag{4.1}
\]
where
\[
P(Y_i,\Delta_i\mid X_i,Z_i)\propto
\left[\Lambda'(Y_i)\exp\{\beta(Y_i)X_i+\gamma^{\mathrm{T}}Z_i\}\right]^{\Delta_i}
\exp\left[-\int_0^{Y_i}\exp\{\beta(t)X_i+\gamma^{\mathrm{T}}Z_i\}\,\mathrm{d}\Lambda(t)\right],
\]
where $f'(x)=\mathrm{d}f(x)/\mathrm{d}x$. Our main interest lies in the inference on $\beta(\cdot)$.
We use non-parametric maximum likelihood estimation and sieve approximation to maximize expression (4.1). First, we approximate $\beta(\cdot)$ on $[0,\tau]$ using B-splines, where $\tau$ is the study duration; that is, we assume
\[
\beta(t)=\sum_{l=1}^{d_n}\alpha_l A_l^w(t),
\]
where $A_l^w(\cdot)$ is the $l$th B-spline basis function of order $w$ on $[0,\tau]$, $d_n$ is the total number of functions in this B-spline basis, and $\alpha_l$ is the coefficient of $A_l^w(\cdot)$ $(l=1,\ldots,d_n)$ in the B-spline approximation of $\beta(\cdot)$.
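As a concrete illustration of this sieve approximation, the Python sketch below evaluates $\beta(t)=\sum_l\alpha_l A_l^w(t)$ with cubic B-splines built by the Cox–de Boor recursion. The study duration, knot sequence, and coefficients are made up for illustration and are not prescribed by the method.

```python
import numpy as np

def bspline_basis(t, knots, degree):
    """All B-spline basis functions of the given degree at points t,
    via the Cox-de Boor recursion; returns shape (n_basis, len(t))."""
    t = np.asarray(t, dtype=float)
    # Degree-0 pieces: indicators of the half-open knot intervals.
    B = np.array([(knots[i] <= t) & (t < knots[i + 1])
                  for i in range(len(knots) - 1)], dtype=float)
    for d in range(1, degree + 1):
        nxt = np.zeros((len(knots) - d - 1, len(t)))
        for i in range(len(knots) - d - 1):
            left = knots[i + d] - knots[i]
            right = knots[i + d + 1] - knots[i + 1]
            if left > 0:                       # 0/0 terms are dropped
                nxt[i] += (t - knots[i]) / left * B[i]
            if right > 0:
                nxt[i] += (knots[i + d + 1] - t) / right * B[i + 1]
        B = nxt
    return B

tau = 10.0                                   # illustrative study duration
degree = 3                                   # order w = 4, i.e., cubic splines
interior = np.linspace(0.0, tau, 6)[1:-1]    # illustrative interior knots
knots = np.concatenate([[0.0] * (degree + 1), interior, [tau] * (degree + 1)])
d_n = len(knots) - degree - 1                # number of basis functions A_l

alpha = np.ones(d_n)                         # illustrative coefficients alpha_l
t_grid = np.linspace(0.0, tau, 9)[:-1]
beta_vals = alpha @ bspline_basis(t_grid, knots, degree)
```

With all coefficients set to one, the partition-of-unity property of the clamped basis makes $\beta(t)=1$ throughout $[0,\tau)$, which is a convenient sanity check of the basis construction.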
Second, we estimate $\Lambda(\cdot)$ by a step function with jumps only at the observed $Y_i$ with $\Delta_i=1$ $(i=1,\ldots,n)$. Let $\lambda_i$ be the jump size of $\Lambda(\cdot)$ at $Y_i$; then $\lambda_i>0$ when $\Delta_i=1$ and $\lambda_i=0$ when $\Delta_i=0$. Finally, if $Z$ is discrete, then for each distinct observed value of $Z$, we estimate $P(X\mid Z)$ by a discrete probability function on the distinct observed values of $X$, denoted by $x_1,\ldots,x_m$ $(m\le n_2)$, where $m$ is the total number of distinct values of $X$ (i.e., $m$ increases with $n_2$). If $Z$ contains continuous components, then this nonparametric estimation procedure becomes infeasible because only a small number of observations on $X$ are associated with each value of $Z$. In this situation, we approximate $P(x\mid Z_i)$ and $\log P(x\mid Z_i)$ in expression (4.1) by
\[
P(x\mid Z_i)=\sum_{k=1}^m I(x=x_k)\sum_{j=1}^{s_n}B_j^q(Z_i)\,p_{kj}
\]
and
\[
\sum_{k=1}^m I(x=x_k)\sum_{j=1}^{s_n}B_j^q(Z_i)\log p_{kj},
\]
respectively, where $B_j^q(\cdot)$ is the $j$th B-spline basis function of order $q$, $s_n$ is the total number of functions in this B-spline basis, and $p_{kj}$ is the coefficient of $B_j^q(Z_i)$ at $x_k$ $(k=1,\ldots,m;\ j=1,\ldots,s_n)$ in the B-spline approximation of $P(x\mid Z_i)$. Details about the construction of the B-spline bases $\{A_l^w(\cdot)\}_{l=1}^{d_n}$ and $\{B_j^q(\cdot)\}_{j=1}^{s_n}$ and guidelines about the choices of $(w,d_n)$ and $(q,s_n)$ can be found in Chen et al. (2013) and Tao et al. (2017), respectively. In practice, $w$ and $q$ are typically chosen to be at most four, with order four corresponding to cubic splines, and $d_n$ and $s_n$ are determined by the Phase I sample size $n$. We note that the $p_{kj}$ need to satisfy the constraints
\[
\sum_{k=1}^m p_{kj}=1 \quad\text{and}\quad p_{kj}\ge 0 \quad (k=1,\ldots,m;\ j=1,\ldots,s_n) \tag{4.2}
\]
because $P(x\mid Z_i)$ is a conditional probability function. Consequently, the observed-data log-likelihood can be rewritten as
\[
l_n(\theta,\{\lambda_i\},\{p_{kj}\})=
\sum_{i=1}^n R_i\left\{\log P(Y_i,\Delta_i\mid X_i,Z_i)+\sum_{k=1}^m\sum_{j=1}^{s_n}I(X_i=x_k)B_j^q(Z_i)\log p_{kj}\right\}
+\sum_{i=1}^n (1-R_i)\log\left\{\sum_{k=1}^m P(Y_i,\Delta_i\mid x_k,Z_i)\sum_{j=1}^{s_n}B_j^q(Z_i)\,p_{kj}\right\}, \tag{4.3}
\]
where
\[
P(Y_i,\Delta_i\mid x,Z_i)\propto
\left\{\lambda_i\exp\left(\sum_{l=1}^{d_n}\alpha_l A_l^w(Y_i)x+\gamma^{\mathrm{T}}Z_i\right)\right\}^{\Delta_i}
\times\exp\left\{-\sum_{i'=1}^n I(Y_{i'}\le Y_i)\,\lambda_{i'}\exp\left(\sum_{l=1}^{d_n}\alpha_l A_l^w(Y_{i'})x+\gamma^{\mathrm{T}}Z_i\right)\right\}.
\]
We aim to maximize expression (4.3) under the constraints (4.2).
It is difficult to maximize expression (4.3) directly because of the intractable form of the second term.
Following Tao et al. (2017), we solve this maximization problem by artificially creating a latent variable $U$ for subjects with $R=0$ such that $U$ takes values in $\{1/s_n,2/s_n,\ldots,1\}$ and satisfies the equations
\[
P(U=j/s_n\mid Z)=B_j^q(Z),
\]
\[
P(X=x_k\mid Z,U=j/s_n)=P(X=x_k\mid U=j/s_n)=p_{kj},
\]
\[
P(Y,\Delta\mid X,Z,U)=P(Y,\Delta\mid X,Z).
\]
This step is essential because it enables us to interpret $\sum_{j=1}^{s_n}B_j^q(Z)\,p_{kj}$ as $P(X=x_k\mid Z)$ for subjects with $R=0$. Hence, the second term in expression (4.3) is equivalent to the log-likelihood of $(Y_i,\Delta_i,Z_i)$, assuming that the complete data consist of $(Y_i,\Delta_i,X_i,Z_i,U_i)$ but with $X_i$ and $U_i$ missing.
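As a quick numerical sanity check of this interpretation, the sketch below (with made-up Dirichlet draws standing in for $B_j^q(Z)$ and $p_{kj}$) verifies that mixing the conditional probabilities over the latent $U$ yields a valid probability function for $X$:

```python
import numpy as np

rng = np.random.default_rng(5)
m, s_n = 3, 4
b = rng.dirichlet(np.ones(s_n))              # stands in for B_j(Z): nonnegative, sums to 1
p = rng.dirichlet(np.ones(m), size=s_n).T    # p[:, j] is a pmf over x_1, ..., x_m

# P(X = x_k | Z) = sum_j P(U = j/s_n | Z) P(X = x_k | U = j/s_n) = sum_j b_j p_kj
px = p @ b
```

Because each column of $p$ sums to one over $k$ and the $b_j$ sum to one over $j$, the mixture `px` is itself a probability function, which is exactly what lets the second term of (4.3) be read as a marginal log-likelihood.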
The maximization of expression (4.3) is carried out through an EM algorithm, where $(X,U)$ for subjects with $R=0$ are treated as missing data. The complete-data log-likelihood is
\[
\begin{aligned}
&\sum_{i=1}^n R_i\left\{\log P(Y_i,\Delta_i\mid X_i,Z_i)+\sum_{k=1}^m\sum_{j=1}^{s_n}I(X_i=x_k)B_j^q(Z_i)\log p_{kj}\right\}\\
&\quad+\sum_{i=1}^n(1-R_i)\left\{\log P(Y_i,\Delta_i\mid X_i,Z_i)+\log P(X_i\mid U_i)+\log P(U_i\mid Z_i)\right\}\\
&=\sum_{i=1}^n R_i\left\{\log P(Y_i,\Delta_i\mid X_i,Z_i)+\sum_{k=1}^m\sum_{j=1}^{s_n}I(X_i=x_k)B_j^q(Z_i)\log p_{kj}\right\}\\
&\quad+\sum_{i=1}^n(1-R_i)\left\{\sum_{k=1}^m I(X_i=x_k)\log P(Y_i,\Delta_i\mid x_k,Z_i)
+\sum_{k=1}^m\sum_{j=1}^{s_n}I(X_i=x_k,U_i=j/s_n)\log p_{kj}
+\sum_{j=1}^{s_n}I(U_i=j/s_n)\log B_j^q(Z_i)\right\}.
\end{aligned}
\]
Let $\theta=(\alpha_1,\ldots,\alpha_{d_n},\gamma^{\mathrm{T}})^{\mathrm{T}}$. We start with the following initial values: $\widehat\theta^{(0)}=0$, $\widehat\lambda_i^{(0)}=\Delta_i/\sum_{i'=1}^n\Delta_{i'}$, and
\[
\widehat p_{kj}^{(0)}=\left\{\sum_{i=1}^n R_i I(X_i=x_k)B_j^q(Z_i)\right\}\Big/\left\{\sum_{i=1}^n R_i B_j^q(Z_i)\right\}.
\]
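For illustration, the initial values $\widehat p^{(0)}_{kj}$ can be computed as below. All arrays are simulated stand-ins (e.g., Dirichlet draws playing the role of the B-spline basis values $B_j^q(Z_i)$), not output of an actual two-phase design.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, s_n = 200, 3, 4
R = rng.integers(0, 2, size=n)               # Phase II selection indicators
X = rng.integers(0, m, size=n)               # X_i coded as an index into x_1..x_m
Bq = rng.dirichlet(np.ones(s_n), size=n)     # Bq[i, j] stands in for B_j(Z_i)

# p0[k, j] = sum_i R_i I(X_i = x_k) B_j(Z_i) / sum_i R_i B_j(Z_i):
# among Phase II subjects, weight the indicator I(X_i = x_k) by the basis value.
num = np.array([((R == 1) & (X == k)).astype(float) @ Bq for k in range(m)])
p0 = num / ((R == 1).astype(float) @ Bq)
```

Since the numerators sum over $k$ to the denominator, each column of `p0` is automatically a probability function over $x_1,\ldots,x_m$, so the starting values satisfy the constraints (4.2).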
In the E-step of the $(t+1)$th iteration, we calculate the conditional expectations of $I(X_i=x_k,U_i=j/s_n)$ and $I(X_i=x_k)$ given $(Y_i,\Delta_i,Z_i)$ and $x_1,\ldots,x_m$, evaluated at $\widehat\theta^{(t)},\widehat\lambda_1^{(t)},\ldots,\widehat\lambda_n^{(t)},\widehat p_{11}^{(t)},\ldots,\widehat p_{ms_n}^{(t)}$, denoted as $\widehat\psi_{kji}^{(t+1)}$ and $\widehat q_{ik}^{(t+1)}$, respectively. That is,
\[
\widehat\psi_{kji}^{(t+1)}=
\begin{cases}
I(X_i=x_k)B_j^q(Z_i), & R_i=1,\\[6pt]
\dfrac{P(Y_i,\Delta_i\mid x_k,Z_i)\,B_j^q(Z_i)\,\widehat p_{kj}^{(t)}}
{\sum_{k'=1}^m P(Y_i,\Delta_i\mid x_{k'},Z_i)\sum_{j'=1}^{s_n}B_{j'}^q(Z_i)\,\widehat p_{k'j'}^{(t)}}, & R_i=0,
\end{cases}
\]
\[
\widehat q_{ik}^{(t+1)}=
\begin{cases}
I(X_i=x_k), & R_i=1,\\[6pt]
\dfrac{P(Y_i,\Delta_i\mid x_k,Z_i)\sum_{j'=1}^{s_n}B_{j'}^q(Z_i)\,\widehat p_{kj'}^{(t)}}
{\sum_{k'=1}^m P(Y_i,\Delta_i\mid x_{k'},Z_i)\sum_{j'=1}^{s_n}B_{j'}^q(Z_i)\,\widehat p_{k'j'}^{(t)}}, & R_i=0.
\end{cases}
\]
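The E-step weights for a single subject with $R_i=0$ can be sketched as follows, with made-up values standing in for the conditional densities $P(Y_i,\Delta_i\mid x_k,Z_i)$, the basis values $B_j^q(Z_i)$, and the current $\widehat p^{(t)}_{kj}$:

```python
import numpy as np

rng = np.random.default_rng(1)
m, s_n = 3, 4
lik = rng.uniform(0.1, 1.0, size=m)          # stands in for P(Y_i, Delta_i | x_k, Z_i)
b = rng.dirichlet(np.ones(s_n))              # stands in for B_j(Z_i)
p = rng.dirichlet(np.ones(m), size=s_n).T    # current p_kj; columns are pmfs over k

# Joint posterior weight of (X_i = x_k, U_i = j/s_n) for a subject with R_i = 0:
# numerator lik_k * b_j * p_kj, normalized over all (k, j).
num = lik[:, None] * b[None, :] * p
psi = num / num.sum()                        # psi[k, j]
q = psi.sum(axis=1)                          # marginal posterior weight of X_i = x_k
```

The weights $\widehat\psi_{kji}$ form a joint posterior distribution over $(x_k, j/s_n)$, and summing out $j$ recovers $\widehat q_{ik}$, matching the relation between the two displayed formulas.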
In the M-step of the $(t+1)$th iteration, we update $\widehat\theta^{(t+1)}$ and $\widehat\lambda_i^{(t+1)}$ $(i=1,\ldots,n)$ by maximizing
\[
\sum_{i=1}^n\sum_{k=1}^m \widehat q_{ik}^{(t+1)}\log P(Y_i,\Delta_i\mid x_k,Z_i), \tag{4.4}
\]
which is a weighted likelihood for the Cox model with time-varying effects. Let
\[
\theta_{ik}(t)=\bigl(A_1^w(t)x_k,\ldots,A_{d_n}^w(t)x_k,Z_i^{\mathrm{T}}\bigr)^{\mathrm{T}}
\]
and
\[
G_{ik}(t)=\exp\left(\sum_{l=1}^{d_n}\alpha_l A_l^w(t)x_k+\gamma^{\mathrm{T}}Z_i\right).
\]
We update $\widehat\theta^{(t+1)}$ by applying a one-step Newton–Raphson update to the score equation $S(\theta)=0$, where
\[
S(\theta)=\sum_{i=1}^n\sum_{k=1}^m \widehat q_{ik}^{(t+1)}\Delta_i
\left\{\theta_{ik}(Y_i)-\frac{\sum_{i'=1}^n\sum_{k'=1}^m I(Y_{i'}\ge Y_i)\,\widehat q_{i'k'}^{(t+1)}G_{i'k'}(Y_i)\,\theta_{i'k'}(Y_i)}
{\sum_{i'=1}^n\sum_{k'=1}^m I(Y_{i'}\ge Y_i)\,\widehat q_{i'k'}^{(t+1)}G_{i'k'}(Y_i)}\right\}.
\]
That is, we update $\widehat\theta^{(t+1)}$ by
\[
\widehat\theta^{(t+1)}=\widehat\theta^{(t)}+\Omega\bigl(\widehat\theta^{(t)}\bigr)^{-1}S\bigl(\widehat\theta^{(t)}\bigr),
\]
where
\[
\Omega(\theta)=\sum_{i=1}^n\sum_{k=1}^m \widehat q_{ik}^{(t+1)}\Delta_i
\left[\frac{\sum_{i'=1}^n\sum_{k'=1}^m I(Y_{i'}\ge Y_i)\,\widehat q_{i'k'}^{(t+1)}G_{i'k'}(Y_i)\,\theta_{i'k'}(Y_i)^{\otimes 2}}{\sum_{i'=1}^n\sum_{k'=1}^m I(Y_{i'}\ge Y_i)\,\widehat q_{i'k'}^{(t+1)}G_{i'k'}(Y_i)}
-\frac{\left\{\sum_{i'=1}^n\sum_{k'=1}^m I(Y_{i'}\ge Y_i)\,\widehat q_{i'k'}^{(t+1)}G_{i'k'}(Y_i)\,\theta_{i'k'}(Y_i)\right\}^{\otimes 2}}{\left\{\sum_{i'=1}^n\sum_{k'=1}^m I(Y_{i'}\ge Y_i)\,\widehat q_{i'k'}^{(t+1)}G_{i'k'}(Y_i)\right\}^{2}}\right]
\]
and $a^{\otimes 2}=aa^{\mathrm{T}}$. We update $\widehat\lambda_i^{(t+1)}$ by using the Breslow estimator such that
\[
\widehat\lambda_i^{(t+1)}=\frac{\Delta_i}{\sum_{i'=1}^n\sum_{k=1}^m I(Y_{i'}\ge Y_i)\,\widehat q_{i'k}^{(t+1)}G_{i'k}(Y_i)}.
\]
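A sketch of this jump-size update on simulated inputs is given below. For simplicity the factors $G_{i'k}$ are frozen in time here, whereas the actual update evaluates them at each $Y_i$ through the B-splines; all arrays are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 50, 3
Y = rng.exponential(size=n)                     # observed times
Delta = rng.integers(0, 2, size=n)              # event indicators
q = rng.dirichlet(np.ones(m), size=n)           # posterior weights q[i, k]
G = np.exp(rng.normal(scale=0.3, size=(n, m)))  # hazard factors, frozen in time

# denom[i] = sum over subjects i' at risk at Y_i of sum_k q[i', k] G[i', k]
w = (q * G).sum(axis=1)
at_risk = Y[None, :] >= Y[:, None]              # at_risk[i, i'] = I(Y_i' >= Y_i)
lam = Delta / (at_risk @ w)                     # jump sizes: positive iff Delta_i = 1
```

Because each subject belongs to its own risk set, the denominator is always positive, so the jump is well defined and vanishes exactly for the censored observations.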
We observe that $\widehat\lambda_i^{(t+1)}>0$ when $\Delta_i=1$ and $\widehat\lambda_i^{(t+1)}=0$ when $\Delta_i=0$. We update $\widehat p_{kj}^{(t+1)}$ $(k=1,\ldots,m;\ j=1,\ldots,s_n)$ by maximizing
\[
\sum_{i=1}^n\sum_{k=1}^m\sum_{j=1}^{s_n}\left\{R_i\,\widehat q_{ik}^{(t+1)}B_j^q(Z_i)+(1-R_i)\,\widehat\psi_{kji}^{(t+1)}\right\}\log p_{kj},
\]
such that
\[
\widehat p_{kj}^{(t+1)}=\frac{\sum_{i=1}^n\left\{R_i\,\widehat q_{ik}^{(t+1)}B_j^q(Z_i)+(1-R_i)\,\widehat\psi_{kji}^{(t+1)}\right\}}{\sum_{k'=1}^m\sum_{i=1}^n\left\{R_i\,\widehat q_{ik'}^{(t+1)}B_j^q(Z_i)+(1-R_i)\,\widehat\psi_{k'ji}^{(t+1)}\right\}}.
\]
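This closed-form update can be sketched as follows on simulated stand-in arrays (posterior weights and basis values drawn from Dirichlet distributions purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, s_n = 100, 3, 4
R = rng.integers(0, 2, size=n)                  # Phase II selection indicators
q = rng.dirichlet(np.ones(m), size=n)           # q[i, k]
Bq = rng.dirichlet(np.ones(s_n), size=n)        # stands in for B_j(Z_i)
psi = rng.dirichlet(np.ones(m * s_n), size=n).reshape(n, m, s_n)  # psi[i, k, j]

# Numerator: R = 1 rows contribute q_ik B_j(Z_i); R = 0 rows contribute psi_kji.
num = (R[:, None] * q).T @ Bq + ((1 - R)[:, None, None] * psi).sum(axis=0)
p_new = num / num.sum(axis=0, keepdims=True)    # normalize over k for each j
```

Normalizing the numerator over $k$ is exactly what the displayed denominator does, so each column of `p_new` is a probability function and the constraints (4.2) hold by construction.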
We observe that $\widehat p_{kj}^{(t+1)}$ satisfies the two constraints in expression (4.2).
We iterate between the E-step and M-step until
\[
\left\|\widehat\theta^{(t+1)}-\widehat\theta^{(t)}\right\|_1
+\sum_{i=1}^n\left|\widehat\lambda_i^{(t+1)}-\widehat\lambda_i^{(t)}\right|
+\sum_{k=1}^m\sum_{j=1}^{s_n}\left|\widehat p_{kj}^{(t+1)}-\widehat p_{kj}^{(t)}\right|<10^{-4}
\]
to obtain the sieve maximum likelihood estimator (SMLE) $\widehat\theta$, $\widehat\lambda_i$ $(i=1,\ldots,n)$, and $\widehat p_{kj}$ $(k=1,\ldots,m;\ j=1,\ldots,s_n)$.
To obtain the variance estimate of $\widehat\theta$, we use the profile likelihood method proposed by Murphy and van der Vaart (2000). By verifying the smoothness conditions of Theorem 1 in Murphy and van der Vaart (2000), it can be shown that the negative inverse of the Hessian matrix of the profile log-likelihood function $pl(\theta)=\max_{\{\lambda_i\},\{p_{kj}\}}l_n(\theta,\{\lambda_i\},\{p_{kj}\})$ is a consistent estimator of the limiting covariance matrix of $n^{1/2}(\widehat\theta-\theta)$. In practice, we obtain the value of $pl(\theta)$ by holding $\theta$ fixed in the EM algorithm and taking the value of $l_n(\theta,\{\lambda_i\},\{p_{kj}\})$ at convergence. We estimate the covariance matrix of $\widehat\theta$ by the negative inverse of the matrix whose $(k,l)$th element is
\[
h_n^{-2}\left\{pl(\widehat\theta+e_k h_n+e_l h_n)-pl(\widehat\theta+e_k h_n)-pl(\widehat\theta+e_l h_n)+pl(\widehat\theta)\right\},
\]
where $e_k$ is the $k$th canonical vector and $h_n$ is a constant of the order $n^{-1/2}$.
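This second-difference recipe can be sketched numerically. The quadratic stand-in for $pl(\cdot)$ below is hypothetical, chosen because its Hessian is known exactly (for a quadratic, the second difference is exact up to floating-point error):

```python
import numpy as np

def neg_inv_hessian(pl, theta_hat, h):
    """Second-difference approximation to the Hessian of pl at theta_hat,
    h^{-2} {pl(t + e_k h + e_l h) - pl(t + e_k h) - pl(t + e_l h) + pl(t)},
    returned as its negative inverse (the covariance estimate)."""
    d = len(theta_hat)
    E = np.eye(d)
    H = np.empty((d, d))
    for k in range(d):
        for l in range(d):
            H[k, l] = (pl(theta_hat + h * (E[k] + E[l]))
                       - pl(theta_hat + h * E[k])
                       - pl(theta_hat + h * E[l])
                       + pl(theta_hat)) / h ** 2
    return np.linalg.inv(-H)

# Quadratic stand-in for the profile log-likelihood: pl(t) = -0.5 t'At,
# for which the true negative inverse Hessian is inv(A).
A = np.array([[2.0, 0.5], [0.5, 1.0]])
cov = neg_inv_hessian(lambda t: -0.5 * t @ A @ t, np.zeros(2), h=1e-4)
```

In the actual procedure, each `pl` evaluation is one run of the EM algorithm with $\theta$ held fixed, so the matrix costs $O(d^2)$ such runs with $d=d_n+\dim(\gamma)$.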