

3.10 Inference for random effects

This section gives a brief discussion of Empirical Bayes (EB) inference and outlines how to carry out best linear unbiased prediction (BLUP).

3.10.1 Empirical Bayes Inference

The purpose of the random effects $b_i$ in the model is to reflect how the evolution of the $i$th subject deviates from the expected evolution $X_i\beta$. Estimation of $b_i$ is therefore helpful for the detection of outlying profiles. This strategy is, however, only meaningful under the hierarchical model interpretation. Recall that the hierarchical specification of the model is given by

\[
Y_i \mid b_i \sim N(X_i\beta + Z_i b_i, \Sigma_i), \qquad b_i \sim N(0, G).
\]

Since the $b_i$ are random, it is natural to use Bayesian methods. Under this approach, the prior distribution for $b_i$ is taken to be $N(0, G)$. The posterior density $f(b_i \mid y_i)$ is then given by

\[
\begin{aligned}
f(b_i \mid y_i) &\equiv f(b_i \mid Y_i = y_i) \\
&= \frac{f(y_i \mid b_i)\, f(b_i)}{\int f(y_i \mid b_i)\, f(b_i)\, db_i} \\
&\propto f(y_i \mid b_i)\, f(b_i) \\
&\propto \ldots \\
&\propto \exp\left\{ -\tfrac{1}{2} \bigl(b_i - G Z_i' W_i (y_i - X_i\beta)\bigr)' \Lambda_i^{-1} \bigl(b_i - G Z_i' W_i (y_i - X_i\beta)\bigr) \right\}
\end{aligned}
\]

for some positive definite matrix $\Lambda_i$, where $W_i = W_i(\alpha) = V_i^{-1}$. It follows that the posterior distribution of $b_i$ is given by

\[
b_i \mid y_i \sim N\bigl(G Z_i' W_i (y_i - X_i\beta),\, \Lambda_i\bigr).
\]

Thus a logical estimate of $b_i$ is its posterior mean,

\[
\hat{b}_i(\theta) = E[b_i \mid Y_i = y_i] = \int b_i\, f(b_i \mid y_i)\, db_i = G Z_i' W_i(\alpha)\, (y_i - X_i\beta), \qquad (3.17)
\]

which depends on the vector $\theta$ of parameters in the marginal model; hence the notation $\hat{b}_i(\theta)$. It is clear from the above that $\hat{b}_i(\theta)$ is normally distributed with covariance matrix

\[
\mathrm{Var}\bigl(\hat{b}_i(\theta)\bigr) = G Z_i' \left\{ W_i - W_i X_i \left( \sum_{i=1}^{N} X_i' W_i X_i \right)^{-1} X_i' W_i \right\} Z_i G.
\]

Note that this expression ignores the variability of $b_i$ itself, so inference about $b_i$ should account for that variability as well. For this reason, inference for $b_i$ is based on

\[
\mathrm{Var}\bigl(\hat{b}_i(\theta) - b_i\bigr) = G - \mathrm{Var}\bigl(\hat{b}_i(\theta)\bigr).
\]

Hence, just as for the fixed-effects inference discussed in Section 3.7, Wald tests can be derived to test hypotheses about $b_i$. The parameters in $\theta$ are replaced by their ML or REML estimates, obtained from fitting the marginal model, and the resulting estimate $\hat{b}_i = \hat{b}_i(\hat{\theta})$ is called the 'Empirical Bayes' estimate of $b_i$. As in the fixed-effects case, approximate $t$-tests and $F$-tests can be derived to account for the variability introduced by replacing $\theta$ by $\hat{\theta}$.
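As a numerical illustration of these formulas, the following is a minimal sketch in Python/NumPy (not part of the original SAS-based material; the dimensions, design and parameter values are purely hypothetical, and $\theta$ is treated as known) that computes the EB estimates $\hat{b}_i = G Z_i' W_i (y_i - X_i\beta)$ together with $\mathrm{Var}(\hat{b}_i)$ and $\mathrm{Var}(\hat{b}_i - b_i)$ for simulated data sharing a common design.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: N subjects, n_i = 5 measurements each,
# random intercept and slope, fixed intercept and slope.
N, n_i = 50, 5
t = np.arange(n_i, dtype=float)
X_i = np.column_stack([np.ones(n_i), t])   # fixed-effects design (intercept, time)
Z_i = X_i.copy()                           # random intercept and slope
beta = np.array([10.0, 2.0])               # assumed fixed effects
G = np.array([[4.0, 0.5], [0.5, 1.0]])     # random-effects covariance
sigma2 = 2.0                               # residual variance
Sigma_i = sigma2 * np.eye(n_i)

V_i = Z_i @ G @ Z_i.T + Sigma_i            # marginal covariance
W_i = np.linalg.inv(V_i)                   # W_i = V_i^{-1}

# Simulate data and compute b_hat_i = G Z_i' W_i (y_i - X_i beta) for every subject
b = rng.multivariate_normal(np.zeros(2), G, size=N)
Y = beta @ X_i.T + b @ Z_i.T + rng.normal(0.0, np.sqrt(sigma2), size=(N, n_i))
b_hat = (Y - beta @ X_i.T) @ (G @ Z_i.T @ W_i).T     # row i holds b_hat_i'

# Var(b_hat_i) = G Z_i' {W_i - W_i X_i (sum_i X_i' W_i X_i)^{-1} X_i' W_i} Z_i G
A = N * (X_i.T @ W_i @ X_i)                # all subjects share the same design here
M = W_i - W_i @ X_i @ np.linalg.inv(A) @ X_i.T @ W_i
var_b_hat = G @ Z_i.T @ M @ Z_i @ G
print(var_b_hat)
print(G - var_b_hat)                       # Var(b_hat_i - b_i), the basis for inference
```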

3.10.2 Best Linear Unbiased Prediction

Often the parameters of interest are linear combinations of the fixed effects in $\beta$ and the random effects in $b_i$. For example, a subject-specific slope is the sum of the average slope for subjects with the same covariate values and the subject-specific random slope for that subject. Thus, in general, suppose that

\[
u = \lambda_\beta' \beta + \lambda_b' b_i
\]

is of interest. Conditionally on $\alpha$,

\[
\hat{u} = \lambda_\beta' \hat{\beta} + \lambda_b' \hat{b}_i
\]

is the best linear unbiased predictor (BLUP) of $u$. In fact, from the theory of linear models, $\hat{u}$ is linear in the observations $Y_i$, unbiased for $u$, and has minimum variance among all linear unbiased predictors.
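For instance, in a hypothetical model with fixed intercept and slope, $\beta = (\beta_0, \beta_1)'$, and a random intercept and slope, $b_i = (b_{i0}, b_{i1})'$, the choice $\lambda_\beta = \lambda_b = (0, 1)'$ gives $u = \beta_1 + b_{i1}$, the subject-specific slope, with BLUP $\hat{u} = \hat{\beta}_1 + \hat{b}_{i1}$.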

3.10.3 Shrinkage estimators

Consider the prediction of the evolution of the $i$th subject, that is,

\[
\hat{Y}_i \equiv X_i\hat{\beta} + Z_i\hat{b}_i = X_i\hat{\beta} + Z_i G Z_i' V_i^{-1} (y_i - X_i\hat{\beta}),
\]

because

\[
\hat{b}_i = G Z_i' V_i^{-1} (y_i - X_i\hat{\beta}).
\]

Now, since

\[
V_i = Z_i G Z_i' + \Sigma_i,
\]

it follows that $V_i - \Sigma_i = Z_i G Z_i'$, so that making this substitution gives

\[
\begin{aligned}
\hat{Y}_i &= X_i\hat{\beta} + (V_i - \Sigma_i) V_i^{-1} (y_i - X_i\hat{\beta}) \\
&= X_i\hat{\beta} - (V_i - \Sigma_i) V_i^{-1} X_i\hat{\beta} + (V_i - \Sigma_i) V_i^{-1} y_i \\
&= X_i\hat{\beta} - X_i\hat{\beta} + \Sigma_i V_i^{-1} X_i\hat{\beta} + (I_{n_i} - \Sigma_i V_i^{-1})\, y_i \\
&= \Sigma_i V_i^{-1} X_i\hat{\beta} + (I_{n_i} - \Sigma_i V_i^{-1})\, y_i. \qquad (3.18)
\end{aligned}
\]

Hence, $\hat{Y}_i$ is a weighted mean of the population-averaged profile $X_i\hat{\beta}$ and the observed data $y_i$, with weights $\hat{\Sigma}_i\hat{V}_i^{-1}$ and $I_{n_i} - \hat{\Sigma}_i\hat{V}_i^{-1}$, respectively.

Note that $X_i\hat{\beta}$ gets much higher weight when the residual variability is large in comparison to the total variability contained in $V_i$. This phenomenon is called 'shrinkage': the observed data are shrunk towards the prior average profile $X_i\beta$. This is also reflected in the fact that, for any linear combination $\lambda' b_i$ of the random effects,

\[
\mathrm{Var}(\lambda'\hat{b}_i) \le \mathrm{Var}(\lambda' b_i).
\]
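The equivalence of the two expressions for $\hat{Y}_i$ is easy to check numerically. The following minimal sketch (Python/NumPy, with hypothetical dimensions and parameter values) verifies that the direct prediction $X_i\hat{\beta} + Z_i\hat{b}_i$ coincides with the weighted-mean form (3.18).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical single-subject setup: n_i = 4 measurements, random intercept and slope.
n_i = 4
t = np.arange(n_i, dtype=float)
X_i = np.column_stack([np.ones(n_i), t])
Z_i = X_i.copy()
beta_hat = np.array([10.0, 2.0])           # plug-in fixed-effects estimate
G = np.array([[4.0, 0.5], [0.5, 1.0]])
Sigma_i = 2.0 * np.eye(n_i)
V_i = Z_i @ G @ Z_i.T + Sigma_i
y_i = rng.normal(X_i @ beta_hat, 3.0)      # some observed profile

V_inv = np.linalg.inv(V_i)
b_hat_i = G @ Z_i.T @ V_inv @ (y_i - X_i @ beta_hat)

# Direct prediction versus the weighted-mean form of Eq. (3.18)
Y_hat_direct = X_i @ beta_hat + Z_i @ b_hat_i
W1 = Sigma_i @ V_inv                       # weight on the population profile X_i beta_hat
Y_hat_shrunk = W1 @ (X_i @ beta_hat) + (np.eye(n_i) - W1) @ y_i
print(np.allclose(Y_hat_direct, Y_hat_shrunk))   # True: the two forms agree
```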

3.10.4 The random-intercepts model revisited

Consider the random-intercepts model with $Z_i = 1_{n_i}$, a vector of ones, and a single variance component $\sigma_b^2$, so that $G = \sigma_b^2$. Also assume the absence of serial correlation, such that

\[
\Sigma_i = \sigma^2 I_{n_i},
\]

a diagonal $n_i \times n_i$ matrix,

so that, from Eq. (3.17), the Empirical Bayes estimate of the random intercept $b_i$ equals

\[
\begin{aligned}
\hat{b}_i &= \sigma_b^2\, 1_{n_i}' \bigl(\sigma_b^2\, 1_{n_i} 1_{n_i}' + \sigma^2 I_{n_i}\bigr)^{-1} (y_i - X_i\beta) \\
&= \frac{\sigma_b^2}{\sigma^2}\, 1_{n_i}' \left( I_{n_i} - \frac{\sigma_b^2}{\sigma^2 + n_i\sigma_b^2}\, 1_{n_i} 1_{n_i}' \right) (y_i - X_i\beta) \\
&= \frac{n_i\sigma_b^2}{\sigma^2 + n_i\sigma_b^2}\, \frac{1}{n_i} \sum_{j=1}^{n_i} \bigl(y_{ij} - X_{i[j]}\beta\bigr),
\end{aligned}
\]

where $X_{i[j]}$ denotes the $j$th row of $X_i$.

Note that $\hat{b}_i$ is a weighted average of $0$, the prior mean, and the average residual for subject $i$. The larger $n_i$ and the smaller $\sigma^2$ relative to $\sigma_b^2$, the less $\hat{b}_i$ is shrunk towards the prior mean, and vice versa.
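The closed-form expression can be checked against the matrix formula numerically. The sketch below (Python/NumPy; the values of $n_i$, $\sigma_b^2$, $\sigma^2$ and the residuals are hypothetical) compares $\sigma_b^2\, 1_{n_i}' V_i^{-1} (y_i - X_i\beta)$ with the shrunken average residual derived above.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical random-intercepts setup: n_i = 6, sigma_b^2 = 4, sigma^2 = 1.
n_i, sigma_b2, sigma2 = 6, 4.0, 1.0
ones = np.ones(n_i)
resid = rng.normal(0.0, 2.0, n_i)          # plays the role of y_i - X_i beta for one subject

V_i = sigma_b2 * np.outer(ones, ones) + sigma2 * np.eye(n_i)
b_hat_matrix = sigma_b2 * ones @ np.linalg.inv(V_i) @ resid   # G Z_i' V_i^{-1} (y_i - X_i beta)

shrink = n_i * sigma_b2 / (sigma2 + n_i * sigma_b2)           # shrinkage weight
b_hat_closed = shrink * resid.mean()                          # closed-form version
print(b_hat_matrix, b_hat_closed, np.isclose(b_hat_matrix, b_hat_closed))
```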

3.10.5 The normality assumption for Random Effects

In practice, histograms of Empirical Bayes (EB) estimates are often used to check the normality assumption for the random effects. However, since

\[
\hat{b}_i = G Z_i' W_i (y_i - X_i\beta)
\qquad \text{and} \qquad
\mathrm{Var}(\hat{b}_i) = G Z_i' \left\{ W_i - W_i X_i \left( \sum_{i=1}^{N} X_i' W_i X_i \right)^{-1} X_i' W_i \right\} Z_i G,
\]

the EB estimates do not all have the same distribution, since $\mathrm{Var}(\hat{b}_i)$ depends on the covariates $X_i$; one should therefore at least first standardize the EB estimates. Further, due to the shrinkage property, the EB estimates do not fully reflect the heterogeneity in the data. Therefore, EB estimates obtained under the normality assumption cannot be used to check that assumption. This suggests that the only possibility to check the normality assumption is to fit a more general model, with the classical linear mixed model as a special case, and to compare both models using likelihood ratio methods.

3.10.6 The heterogeneity model

One possible extension of the linear mixed model is to assume a finite mixture as the random-effects distribution, namely

\[
b_i \sim \sum_{j=1}^{g} p_j\, N(\mu_j, G), \qquad \text{with} \quad \sum_{j=1}^{g} p_j = 1 \quad \text{and} \quad \sum_{j=1}^{g} p_j \mu_j = 0.
\]

The interpretation of this assumption is as follows: the population consists of $g$ sub-populations, each containing a fraction $p_j$ of the total population, and within each sub-population a linear mixed model holds. This yields a very flexible parametric class of random-effects distributions, with the classical model corresponding to the special case $g = 1$. The model is fitted using an EM algorithm, for which a SAS macro is available, and EB estimates can also be calculated under the heterogeneity model.
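To make the mixture assumption concrete, the following small sketch (Python/NumPy; the number of components, mixing proportions and component means are purely hypothetical) draws scalar random intercepts from a two-component heterogeneity model that satisfies the constraint $\sum_j p_j \mu_j = 0$.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical two-component heterogeneity model for a scalar random intercept:
# b_i ~ p1 N(mu1, G) + p2 N(mu2, G), with p1*mu1 + p2*mu2 = 0.
p = np.array([0.3, 0.7])
mu = np.array([7.0, -3.0])                 # 0.3 * 7 + 0.7 * (-3) = 0
G = 1.0                                    # common within-component variance

N = 1000
comp = rng.choice(2, size=N, p=p)          # sub-population membership
b = rng.normal(mu[comp], np.sqrt(G))       # mixture draws of b_i

print(b.mean())                            # close to 0, as the constraint requires
```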

3.10.7 Power analyses under the linear mixed model

In any statistical test, no matter how simple or complex, the statistician is interested in the power of the test. In this section the $F$-test for fixed effects is considered. Thus, consider the general linear hypothesis

\[
H_0: L\beta = 0 \qquad \text{versus} \qquad H_A: L\beta \neq 0.
\]

Recall that the $F$-test statistic is given by

\[
F_T = \frac{\hat{\beta}' L' \left[ L \left( \sum_{i=1}^{N} X_i' V_i^{-1}(\hat{\alpha}) X_i \right)^{-1} L' \right]^{-1} L \hat{\beta}}{\mathrm{rank}(L)}.
\]

The approximate null distribution of $F_T$ is $F$ with numerator degrees of freedom equal to $\mathrm{rank}(L)$. The denominator degrees of freedom need to be estimated from the data. This can be done using three possible methods, namely the:

1. Containment method

2. Satterthwaite approximation

3. Kenward and Roger approximation

In general, that is, not necessarily under $H_0$, $F_T$ is approximately $F$-distributed with the same numbers of degrees of freedom but with non-centrality parameter

\[
\phi = \beta' L' \left[ L \left( \sum_{i=1}^{N} X_i' V_i^{-1}(\hat{\alpha}) X_i \right)^{-1} L' \right]^{-1} L \beta,
\]

which equals $0$ under $H_0$. This can be used to calculate power under a variety of models and a variety of alternative hypotheses. Note that $\phi$ is equal to $\mathrm{rank}(L) \times F_T$ with $\hat{\beta}$ replaced by $\beta$. The SAS procedure MIXED can therefore be used for the calculation of $\phi$ and the related numbers of degrees of freedom.
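As an illustration of how $\phi$ can be evaluated outside SAS, the sketch below (Python/NumPy; the design, covariance structure, parameter values and contrast matrix $L$ are all hypothetical) computes the non-centrality parameter for a test of a group-by-time interaction.

```python
import numpy as np

# Hypothetical design: N subjects on a common time grid, half in each group,
# compound-symmetric V_i (random intercept + measurement error), and
# H0: no group-by-time interaction, i.e. L beta = 0 with L selecting the third coefficient.
N, n_i = 40, 4
t = np.arange(n_i, dtype=float)
beta = np.array([10.0, 1.0, 0.5])          # assumed values under the alternative

XtVX = np.zeros((3, 3))
for i in range(N):
    group = 1.0 if i < N // 2 else 0.0
    X_i = np.column_stack([np.ones(n_i), t, group * t])
    V_i = 4.0 * np.ones((n_i, n_i)) + 1.0 * np.eye(n_i)
    XtVX += X_i.T @ np.linalg.inv(V_i) @ X_i

L = np.array([[0.0, 0.0, 1.0]])
Lb = L @ beta
phi = float(Lb @ np.linalg.inv(L @ np.linalg.inv(XtVX) @ L.T) @ Lb)
print(phi)                                  # non-centrality parameter
```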

Calculation in SAS

The following is an outline of the steps involved in the calculation of the power of the test.

1. Construct a data set of the same dimension and with the same co- variates and factor values as the design for which the power is to be calculated.

2. Use as responses $y_i$ the average values $X_i\beta$ under the alternative model.

3. The fixed-effects estimate will then be equal to

\[
\hat{\beta}(\alpha) = \left( \sum_{i=1}^{N} X_i' W_i(\alpha) X_i \right)^{-1} \sum_{i=1}^{N} X_i' W_i(\alpha)\, y_i
= \left( \sum_{i=1}^{N} X_i' W_i(\alpha) X_i \right)^{-1} \sum_{i=1}^{N} X_i' W_i(\alpha)\, X_i\beta
= \beta.
\]

4. Hence the $F$ statistic reported by SAS will be equal to $\phi/\mathrm{rank}(L)$.

5. This calculated $F$ value and the associated numbers of degrees of freedom can be saved and used afterwards for the calculation of the power.

6. Note that this requires keeping the variance components in $\alpha$ fixed, equal to the assumed population values.

7. The steps in the calculation are as follows:

• Use PROC MIXED to calculate $\phi$ and the degrees of freedom $\nu_1$ and $\nu_2$.

• Calculate the critical value $F_c$ from the central $F$ distribution, such that $P(F_{\nu_1,\nu_2,0} > F_c)$ equals the level of significance.

• Calculate the power: $\text{power} = P(F_{\nu_1,\nu_2,\phi} > F_c)$, where $\phi$ is the non-centrality parameter.

8. The SAS functions 'finv' and 'probf' are used to calculate $F_c$ and the power.
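The same critical-value and power computation that SAS performs with 'finv' and 'probf' can be sketched with the central and non-central $F$ distributions in SciPy. In the snippet below, the values of $\phi$, $\nu_1$, $\nu_2$ and the significance level are hypothetical placeholders.

```python
from scipy.stats import f, ncf

# Hypothetical inputs: phi from PROC MIXED (or a computation like the one above),
# nu1 = rank(L), nu2 from e.g. the Satterthwaite approximation, alpha = 0.05.
phi, nu1, nu2, alpha = 9.6, 1, 38, 0.05

F_c = f.ppf(1 - alpha, nu1, nu2)           # critical value: P(F_{nu1,nu2,0} > F_c) = alpha
power = ncf.sf(F_c, nu1, nu2, phi)         # power = P(F_{nu1,nu2,phi} > F_c)
print(F_c, power)
```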

Using the above procedure, it is clear that within-subject correlation will increase the power for inferences on within-subject effects but decrease the power for inferences on between-subject effects.