Bangladesh
ANALYSIS OF ERROR-CONTAMINATED SURVIVAL DATA UNDER THE PROPORTIONAL ODDS MODEL
WENQINGHE
Department of Statistical and Actuarial Sciences University of Western Ontario. London, Ontario, Canada
Email: [email protected]
JUANXIONG
Department of Statistical and Actuarial Sciences University of Western Ontario, London, Ontario, Canada and
and Department of Statistics and Actuarial Science University of Waterloo Waterloo, Ontario, Canada
Email: [email protected]
GRACEY. YI
Department of Statistics and Actuarial Science University of Waterloo, Waterloo, Ontario, Canada
Email: [email protected]
SUMMARY
There has been extensive research interest in analysis of survival data with covariates sub- ject to measurement error. The focus of the most discussions is on the proportional hazards (PH) model, although there are some work concerning the accelerated failure time (AFT) model and the additive hazards (AH) model. Relatively little attention has been directed to studying the impact of measurement error on other models, such as the proportional odds (PO) model. The proportional odds model is an important alternative when PH, AFT or AH models are not appropriate to fit data. In this paper we discuss two inference methods to accommodate measurement error effects under the PO model, in contrast to the naive analysis that ignores the covariate measurement error. Numerical studies are conducted to evaluate the performance of the proposed methods.
Keywords and phrases: Censoring; Likelihood; Measurement error; Regression calibra- tion; Proportional odds model; Survival data.
c Institute of Statistical Research and Training (ISRT), University of Dhaka, Dhaka 1000, Bangladesh.
1 Introduction
Covariate measurement error is a typical feature that is present with survival data. There has been extensive research on addressing covariate measurement error in survival analysis, where the focus is on proportional hazards (PH) models. Prentice (1982) pioneered the so-called regression calibration method to correct for measurement error effects under the rare event assumption. A large body of various inference methods have since then been developed by many authors, including Nakamura (1992), Hu, Tsiatis and Davidian (1998), Zhou and Wang (2000), Huang and Wang (2000), Xie, Wang and Prentice (2001), Song and Huang (2005), Li and Ryan (2004, 2006), and Yi and Lawless (2007), among many others.
Recently, research attention has been extended to alternative models when the proportionality assumption attached to the PH model fails. For instance, Gim´enez, Bolfarine and Colosimo (1999, 2006), Yi and He (2006), and He, Yi and Xiong (2007), among others, considered covariate measure- ment error problems under the accelerated failure time (AFT) models; while with additive hazards (AH) regression Kulich and Lin (2000), Sun, Zhang and Sun (2006), and Sun and Zhou (2008) developed inference methods to accommodate covariate measurement error.
Relatively little attention has been directed to studying the impact of measurement error on other models, such as the proportional odds (PO) model (Bennett 1983). The proportional odds model is an important alternative when PH, AFT or AH models are not appropriate to fit data (Yang and Prentice 1999). In this paper, in contrast to the naive analysis that ignores the covariate measurement error, we discuss two inference methods to accommodate measurement error effects under the PO model.
The remainder is organized as follows. Notation and model formulation are introduced in Sec- tion 2. In Section 3 we describe two inference methods to account for measurement error in co- variates. The performance of the proposed methods is assessed empirically in Section 4. General discussion is included in Section 5.
2 Notation and Model Formulation
For the survival process, letTiandCibe the survival and censoring times for subjecti, respectively, andδi be the censoring indicator variable taking 1 ifTi ≤Ciand 0 otherwise,i= 1,2, ..., n. Let
Yi = min(Ti, Ci)fori = 1, ..., n. LetXibe the vector of covariates subject to possible measure-
ment error, andZi be the vector of covariates free of error. As a common practice, independent censoring is assumed here. That is, conditional on covariates,TiandCiare independent.
2.1 Proportional Odds Model
LetF(t)be the distribution function ofTi. Response variableTiis characterized by the proportional odds model, given by
F(Ti)
1−F(Ti) = F0(Ti)
1−F0(Ti)exp{βTxXi+βTzZi}, (2.1)
whereβ = (βTx,βTz)T is the vector of regression parameters, and F0(t) represents the baseline distribution function that is known up to unknown parameterα, say. IfR0(t) =F0(t)/(1−F0(t)), then model (2.1) becomes
F(Ti) = R0(Ti) exp{βTxXi+βTzZi}
1 +R0(Ti) exp{βTxXi+βTzZi}, (2.2)
yielding, equivalently, the density function
f(Ti) = r0(Ti) exp{βTxXi+βTzZi}
[1 +R0(Ti) exp{βTxXi+βTzZi}]2, (2.3)
wherer0(t) =R00(t) =f0(t)/{1−F0(t)}2withf0(t) =F00(t).
2.2 Measurement Error Model
LetX∗i be an observed version of covariateXi.XiandX∗i are assumed to follow a classical additive measurement error model, a model which is perhaps the most widely used in practice (e.g., Li and Lin 2003; Greene and Cai 2004; Carroll et al. 2006). That is, conditional on covariatesXiandZi, responseTiand censoring timeCi,
X∗i =Xi+ei, (2.4)
whereeifollows, independent of other variables, a normal distribution with mean0and covariance matrixΣe= [σjk]. To emphasize estimation on the response parameters which are of prime interest, here we treat the parameters inΣeas known or estimated from an independent sample. Discussion on this treatment is given in Section 5.
2.3 Covariate Process
Assume thatXiis modeled by the linear mixed model
Xi=γZi+Siui, (2.5)
whereγis an array of regression parameters,Siis an array of covariates, andui= (ui1, ..., uir)Tis a vector of random effects that are independent of the errorsei. The model is flexible to cover a broad class of random effects models. For instance, withγset to be zero, settingSi= (1, t)Tleads to the linear growth-curve model that is often discussed in the literature (e.g., Wulfsohn and Tsiatis 1997;
Tseng, Heieh and Wang 2005), usually together with a multivariate normal distributional assumption Nr(µu,Σu)for random effectsui. Other specifications ofSican describe more complex nonlinear form ofXi.
3 Inference Methods
3.1 Regression Calibration
Letη = (βT,αT,γT)Tbe the vector containing all the related parameters, whereβis the vector of regression parameters,αis the vector of parameters characterizing the baseline distribution function
F0(t), andγis comprised of the parameters associated with the random effect model in the covariate process. The likelihood function ofηcontributed from subjectiis then given by
Li(η) ={f(Yi|Xi,Zi)}δi{1−F(Yi|Xi,Zi)}1−δi, (3.1)
and the likelihood function forηis
L(η) =
n
Y
i=1
Li(η). (3.2)
If covariatesXiare free of measurement error, inference about theηparameter can be carried out by the maximum likelihood method. That is, maximizingL(η)with respect toηleads to the maximum likelihood estimator ofη.
However, whenXiis subject to measurement error, the observed surrogateX∗i may differ from Xi, and replacingXiwithX∗i in the likelihood function to perform inference about the parameter ηmay lead to biased results. One strategy to correct the resulting bias is to employ the so-called regression calibrationmethod (Prentice 1982, Carroll et al. 2006). That is, we first replaceXiwith its conditional expectationE(Xi|X∗i,Zi)in the likelihood functionL(η), and then maximize the resulting function with respect toη. The bootstrap method can be used to determine the standard error of the resulting estimator ofη.
To implement the regression calibration method, we need to determineE(Xi|X∗i,Zi), which is a direct result from the availability of the conditional distributionf(Xi|X∗i,Zi). Given the model setup described in the previous section, the condition distribution is given by
f(Xi|X∗i,Zi) = f(X∗i|Xi,Zi)f(Xi|Zi) Rf(X∗i|Xi,Zi)f(Xi|Zi)dXi,
wheref(X∗i|Xi,Zi)is determined by (2.4) and (2.5), andf(Xi|Zi) is determined by (2.5). In particular, whenui ∼N(µu,Σu), we obtain thatXi|Zi ∼N(Siµu+γZi,SiΣuSTi). By the as- sumption of additive measurement error model thatX∗i|(Xi,Zi)∼N(Xi,Σe), the conditional dis- tribution ofXigiven(X∗i,Zi)is a normal distribution with meanSiµu+γZi+SiΣuSTi(SiΣuSTi+ Σe)−1(X∗i −Siµu−γZi).
3.2 Observed Likelihood
Under the layout of these models, one may carry out the likelihood-based analysis to perform esti- mation of the parameters. The likelihood function ofηcontributed from subjectiis given by
Li(η) ={f(Yi|X∗i,Zi)}δi{1−F(Yi|X∗i,Zi)}1−δi, where the probability density function is determined by
f(ti|X∗i,Zi) = Z
f(ti|Xi,Zi)f(Xi|X∗i,Zi)dXi (3.3)
and the observed cumulative distribution function is determined accordingly by F(ti|X∗i,Zi) =
Z ti
0
f(v|X∗i,Zi)dv
= Z ti
0
Z
f(v|Xi,Zi)f(Xi|X∗i,Zi)dXidv
= Z (
f(Xi|X∗i,Zi)· Z ti
0
f(v|Xi,Zi)dv )
dXi
= Z
f(Xi|X∗i,Zi)F(ti|Xi,Zi)dXi with the functionF(.)being determined by (2.2).
In the formulation here we assume nondifferential measurement error mechanism, i.e.,f(ti|X∗i,Xi,Zi) =
f(ti|Xi,Zi). Consequently, the observed likelihood function is
L∗(η) =
n
Y
i=1
L∗i(η), (3.4)
where L∗i(η) =
(Z
f(ti|Xi,Zi)f(Xi|X∗i,Zi)dXi
)δi(
1−
Z
f(Xi|X∗i,Zi)F(ti|Xi,Zi)dXi
)1−δi
. Inference aboutη can be performed based on the observed likelihood functionL∗(η). Max- imizingL∗(η)gives the maximum likelihood estimator ofη, denoted byη. Under suitable con-b ditions, √
n(ηb −η) has an asymptotic normal distribution with mean 0and covariance matrix hE{(∂`∗i(η)/∂ηT)(∂`∗i(η)/∂η)}i−1
, where`∗i(η) = logL∗i(η). Some details are included in the appendix.
Directly maximizingL∗(η)however is not possible, since (3.4) does not have a closed form due to the involvement of nonlinear functions in the integrations. A remedy to this complication is to invoke numerical approximations such as the Monte Carlo algorithm. To be specific, let`∗(η) = Pn
i=1`∗i(η). Note that
`∗i(η) = logL∗i(η)
= δilog
(Z
f(ti|Xi,Zi)f(Xi|X∗i,Zi)dXi
)
+(1−δi) log (
1−
Z
f(Xi|X∗i,Zi)F(ti|Xi,Zi)dXi )
.
Specify a large integerN. We generatex(1)i , ...,x(Ni )from the distributionf(Xi|X∗i,Zi)for i= 1, ..., n. Define
`e∗i(η) =δilog (1
N
N
X
j=1
f(ti|x(j)i ,Zi) )
+ (1−δi) log (
1− 1
N
N
X
j=1
F(ti|x(j)i ,Zi) )
,
ande`∗(η) =Pn
i=1`e∗i(η). Then we use`e∗(η)to approximate`∗(η)and maximizee`∗(η)to obtain an estimator ofη. An estimate of the covariance matrix ofbηis given byh
Pn
i=1(∂`e∗i/∂ηT)(∂e`∗i/∂η)i−1
.
4 Empirical Studies
In this section we conduct simulation studies to assess the performance of the proposed approaches, as opposed to the naive analysis which ignores covariate measurement error.
4.1 Design of Simulation
We consider a single covariateXifor simplicity. Sample size is set to be 200, and 500 simulations are run for each of the parameter configurations. For each subjecti, the covariateZiis generated from a binomial distributionBin(1,0.5), representing each subject is randomly assigned to either treatment or control arms, for instance. The covariateXiis generated from the linear mixed model
Xi =ui0+ui1Zi,
where we setui= (ui0, ui1)Tto follow a bivariate normal distribution with meanµ= (4.173,−0.0103) and covariance matrix
Σ=
1.24 −0.0114
−0.0114 0.003
,
to mimic an HIV clinical trial as described in Tsiatis and Davidian (2004). SurrogateXi∗is generated from the error modelXi∗=Xi+eiwithei∼N(0, σ2).
The survival times are generated based on the proportional odds model F(Ti)
1−F(Ti)= F0(Ti)
1−F0(Ti)exp(βxXi+βzZi), i= 1,2, ..., n
whereF0(t)is the baseline distribution function. We particularly consider two parametric modeling forF0(t)by using (1) a log-logistic distribution and (2) a Weibull distribution.
To be specific, in the first case, we setF0(t)to be a log-logistic distribution F0(t) = 1− {1 + (α1t)α2}−1
withα1= 5andα2= 2. That is, for theith subject, we first generate a random variatevi∼U[0,1],
and calculate the survival timeTiby
Ti=h vi
αα12(1−vi)exp{−(βxXi+βzZi)}i1/α2
.
In the second case, we setF0(t)to be a Weibull distribution
F0(t) = 1−exp{−(α1t)α2}
withα1= 1andα2= 1.5. The survival times can be generated similarly like the log-logistic case:
Ti= 1 α1
n
logh vi
1−vi
exp{−(βxXi+βzZi)}+ 1io1/α2
, wherevi ∼U[0,1].
The censoring times are generated independently from an exponential distribution with a fixed mean to produce roughly 20% and 40% censoring rates. We assess the performance of the proposed methods under a variety of situations. Different configurations for the magnitudeσ2of measurement error are considered, withσ2= 0,0.25, and0.6to feature increasing degrees of measurement error in covariateXi. The numberN of the Monte Carlo simulations is chosen as 5000.
4.2 Simulation Results
We compare the performance of the three methods: (1) the naive method with the true covariate Xidirectly replaced by its surrogateXi∗, (2) the regression calibration approach which replaces the true covariateXi with its conditional expectationE(Xi|Xi∗, Zi), and (3) the observed likelihood method. Specifically, we report on the results of the bias of the estimates (Bias), the empirical standard error (SEE), the model based standard error (SEM), and the coverage rate (CR) for95%
confidence intervals of the parameters. In particular, for the case that the baseline distribution is modeled by a log-logistic distribution, Tables 1.1 and 1.2 report the results for theβparameters by varying the magnitude of the covariate effects, and Tables 1.3 and 1.4 report the results for theα parameter related to the baseline distribution. Similarly, Tables 2.1 to 2.4 display the results for the case that the baseline distribution is set as a Weibull distribution.
The impact of ignoring measurement error depends on the magnitude of error as well as the covariate effects, and this is evident from the results obtained from the naive analysis. If the true error-prone covariate effect is not zero, it is seen that as the magnitude of the measurement error becomes severe, the bias of the estimate for the error prone covariate effect increases while the associated standard error tends to get smaller; consequently, the corresponding coverage rate for the 95% confidence intervals becomes farther off the nominal level 95%. When the covariate effect of the error prone covariate is zero, then ignoring measurement error does not appear to have impact on estimation. The point estimate, standard error and the coverage rate all seem to be satisfactory, and they are not shown to be obviously affected by varying degrees of measurement error. The impact on estimation of the error-free covariate effect is not apparent. The finite sample bias and the associated standard error for the estimate ofβzare fairly reasonable and stable as the degree in error changes, and the coverage rate for the confidence intervals agrees well with the nominal level 95%.
On the other hand, it is interesting to see that ignoring covariate measurement error has an effect on estimation of the parameters of the baseline distribution function. If the true error-prone covariate effect exists, then the finite sample bias tends to increase as the magnitude of error increases, whereas the standard error seems to be relatively stable to the change of the error degree. The joint impact is also evident from the discrepancy between the coverage rate of the 95% confidence intervals and the nominal level. However, when the true error-prone covariate effect does not exist, the naive analysis does not appear to influence estimation of theαparameter. When the censoring percentage and the
baseline distribution change, we observe the similar impact of measurement error on estimation of the parameters.
In implementing the regression calibration approach, we employ the bootstrap method with 1000 runs for each setting to calculate the model-based standard error (SEM). As the error effect is par- tially adjusted through the replacement of the error contaminated covariate by its expected value given the observed covariate, the performance of the regression calibration method on estimation of theβparameters is greatly improved in contrast to that of the naive method. In the case thatβxis not zero, the regression calibration produces much smaller finite sample biases for the estimates of both theβxandβzparameters than the naive method does; ifβxis zero, the estimates from both methods are quite comparable. The standard errors obtained from the regression calibration are greater than those obtained from the naive analysis, but the corresponding coverage rates are reasonably close to the nominal value, regardless of the value of βx. The effect of the censoring percentage and the magnitude of measurement error on the performance of the regression calibration seems quite similar to that on the naive analysis. Although it is observed that the regression calibration method can correct or partially correct for the bias induced by measurement error when estimating theβ parameters, it is interesting to see that it does not necessarily do this for estimation of the parameters of the baseline distribution. See the settings withσ2= 0.6in Table 1.3, for example.
Finally, we examine the performance of the observed likelihood approach which fully accom- modates the measurement error effects in inferential procedures. In actual implementation, 5000 simulations are run for each configuration when using the Monte Carlo integration algorithm. It is seen that the performance of this method is satisfactory in all the settings. This method further im- proves the results obtained from the regression calibration method. Unlike the regression calibration approach, the observed likelihood method gives good estimates for the parameters of the baseline distribution functions. The observed likelihood approach produces reasonably small finite sample biases for both theβandαparameters and good coverage rates for the 95% confidence intervals.
The standard errors resulted from the observed likelihood and the regression calibration methods are fairly close in most settings. The censoring percentage and the degree of measurement error have the similar effects on the performance of the observed likelihood method to those for the regression calibration approach.
In summary, the naive analysis with covariate measurement error ignored would often lead to biased results, whereas the regression calibration approach improves the results a lot, and its per- formance is reasonably satisfactory for many settings but not all. The observed likelihood method behaves the best and outperforms the other two methods. It can not only adjust for the measurement error effects on estimation of the covariate effects, but also produce reasonable estimates for the parameters of the baseline distribution function.
5 Discussion
The impact of covariate measurement error is well documented for survival models such as propor- tional hazards, accelerated failure time and additive hazards models. Various methods have been developed to correct for effects induced by mismeasured covariates. In this paper, we consider an
alternative, the proportional odds model, which is useful for survival analysis. Our numerical stud- ies demonstrate that the naive analysis with covariate error ignored would lead to biased results. To correct for error effects, we describe two methods: the regression calibration and the observed like- lihood methods. Empirical studies demonstrate satisfactory performance of the observed likelihood method. They also suggest that the regression calibration approach dramatically outperforms the naive method, and the performance of the regression calibration is quite comparable to the observed likelihood in most of the settings we consider.
Covariate measurement error is a common feature in many applications. In specific applications, often it is known that some covariates are subject to measurement error, but there is a lack of addi- tional data sources, such as a validation subsample or replicated measurements for those covariates, that can be used feasibly to facilitate the estimation of the measurement error parameters. Under such a circumstance, a viable strategy is to conduct sensitivity analyses to evaluate how sensitive the estimates of the response parameters are affected by different degrees of measurement error. Typi- cally, one may choose a set of given values of the parameters for the measurement error model, and repeatedly conduct estimation on the response parameters to see what patterns may be unfolded.
This strategy can enhance our understanding of the measurement error effects on individual data analysis.
Acknowledgement
This research was supported by the Natural Sciences and Engineering Research Council of Canada.
References
[1] Bennett, S. (1983). Log-logistic regression models for survival data. Applied Statistics,32, 165-171.
[2] Carroll, R. J., Ruppert, D., Stefanski, L. A. and Crainiceanu, C. M. (2006).Measurement Error in Nonlinear Models: A Modern Perspective, Second Edition. Chapman and Hall CRC Press.
[3] Gim´enez, P., Bolfarine, H. and Colosimo, E. A. (1999). Estimation in Weibull regression model with measurement error.Communications in Statistics - Theory and Method,28, 495-510.
[4] Gim´enez, P., Bolfarine, H. and Colosimo, E. A. (2006). Asymptotic relative efficiency of score tests in Weibull models with measurement errors.Statistical Papers,47, 461-470.
[5] Greene, W. F. and Cai, J. (2004). Measurement error in covariates in the marginal hazards model for multivariate failure time data.Biometrics,60, 987-996.
[6] He, W., Yi, G. Y. and Xiong, J. (2007). Accelerated failure time models with covariates subject to measurement error.Statistics in Medicine,26, 4817-4832.
[7] Hu, P., Tsiatis, A. A. and Davidian, M. (1998). Estimating the parameters in the Cox model when covariate variables are measured with error.Biometrics,54, 1407-1419.
[8] Huang, Y. and Wang, C. Y. (2000). Cox regression with accurate covariates unascertainable: A nonparametric correction approach.Journal of the American Statistical Association,95, 1209- 1219.
[9] Kulich, M. and Lin, D. Y. (2000). Additive hazards regression with covariate measurement error.Journal of the American Statistical Association,95, 238-248.
[10] Lehmann, E. L. (1983).Theory of Point Estimation. New York: John Wiley.
[11] Li, Y. and Lin, X. (2003). Functional inference in frailty measurement error models for clus- tered survival data using the SIMEX approach.Journal of the American Statistical Association, 98, 191-203.
[12] Li, Y. and Ryan, L. (2004). Survival analysis with heterogeneous covariate measurement Error.
Journal of the American Statistical Association,99, 724-735.
[13] Li, Y. and Ryan, L. (2006). Inference on survival data with covariate measurement error - A imputation-based approach.The Scandinavian Journal of Statistics,33, 169-190.
[14] Nakamura, T. (1992). Proportional hazards model with covariates subject to measurement er- ror.Biometrics,48, 829-838.
[15] Prentice, R. L. (1982). Covariate measurement errors and parameter estimation in a failure time regression model.Biometrika,69, 331-342.
[16] Song, X. and Huang, Y. (2005). On corrected score approach for proportional hazards model with covariate measurement error.Biometrics,61, 702-714.
[17] Sun, L. Zhang, Z. and Sun, J. (2006). Additive hazards regression of failure time data with covariate measurement errors.Statistica Neerlandica,60, 497-509.
[18] Sun, L. and Zhou, X. (2008). Inference in the additive risk model with time-varying covariates subject to measurement errors.Statistics and Probability Letters,78, 2559-2566.
[19] Tseng, Y. K., Hsieh, F. and Wang, J. L. (2005). Joint modeling of accelerated failure time and longitudinal data.Biometrika,92, 587-603.
[20] Tsiatis, A. A. and Davidian, M. (2004). Joint modeling of longitudinal and time-to-even data:
An overview.Statistica Sinica,14, 809-834.
[21] Wulfsohn, M. S. and Tsiatis, A. A. (1997). A joint model for survival and longitudinal data measured with error.Biometrics,53, 330-339.
[22] Xie, S. H., Wang, C. Y. and Prentice, R. L. (2001). A risk set calibration method for failure time regression by using a covariate reliability sample.Journal of the Royal Statistical Society B,63, 855-870.
[23] Yang, S. and Prentice, R. L. (1999). Semiparametric inference in the proportional odds model.
Journal of the American Statistical Association,94, 125-136.
[24] Yi, G. Y. and He, W. (2006). Methods for bivariate survival data with mismeasured covariates under an accelerated failure time model.Communications in Statistics - Theory and Methods, 35, 1539-1554.
[25] Yi, G. Y. and Lawless, J. F. (2007). A corrected likelihood method for the proportional haz- ards model with covariates subject to measurement error.Journal of Statistical Planning and Inference,137, 1816-1828.
[26] Zhou, H. and Wang C. Y. (2000). Failure time regression with continuous covariates measured with error.Journal of the Royal Statistical Society B,62, 657-665.
Table1.1:SimulationResultsontheRegressionCoefficientsObtainedfromtheParametricApproach.TheTrueBaselineDistributionFollowsALog-logisticDistribution,andtheError-ProneCovariatehasAnEffectwithβx=−1.
CensoringMeasurementNaiveMethodRegressionCalibrationLikelihoodMethod
RateErrorParameterBias aSEMSEECRBiasSEMSEECRBiasSEMSEECR 20% σ 2=0 βx=−1-0.0120.1340.1330.952-0.0120.1390.1330.962-0.0120.1340.1330.952
βz=0.50.0160.2620.2590.9500.0160.2700.2590.9560.0160.2620.2590.950 σ2=0.25 βx=−10.1870.1190.1210.6140.0210.1480.1450.948-0.0170.1600.1610.954
βz=0.50.0190.2620.2680.9520.0180.2710.2700.9580.0370.2730.2750.952 σ2=0.6 βx=−10.3630.1040.1050.0860.0510.1610.1570.946-0.0230.1920.1940.964
βz=0.50.0040.2610.2710.9440.0020.2710.2700.9500.0400.2830.2860.946 40% σ 2=0 βx=−1-0.0200.1470.1460.948-0.0200.1540.1450.962-0.0200.1470.1460.948
βz=0.50.0170.2850.2780.9560.0160.2950.2780.9580.0170.2850.2780.956 σ 2=0.25 βx=−10.1870.1300.1320.6680.0210.1620.1570.950-0.0170.1750.1760.954
βz=0.50.0090.2840.2840.9580.0050.2950.2860.9620.0260.2960.2950.956 σ2=0.6 βx=−10.3620.1140.1150.1300.0500.1750.1700.940-0.0280.2110.2110.966
βz=0.5-0.0060.2830.2850.960-0.0080.2940.2860.9620.0320.3070.3050.964
aBias:biasoftheestimate;SEM:model-basedstandarderror;SEE:empiricalstandarderror;CR:95%coveragerate.TheSEMfromregressioncalibrationapproachiscalculatedbythebootstrapmethodwithB=1000.ThelikelihoodmethodusesMonteCarlointegrationalgorithmwithN=5000.
Table1.2:SimulationResultsontheRegressionCoefficientsObtainedfromtheParametricApproach.TheTrueBaselineDistribution FollowsALog-logisticDistribution,andtheError-ProneCovariatehasNoEffectwithβx=0. CensoringMeasurementNaiveMethodRegressionCalibrationLikelihoodMethod RateErrorParameterBiasa SEMSEECRBiasSEMSEECRBiasSEMSEECR 20%
σ2 =0βx=00.0030.1170.1190.9500.0030.1220.1190.9560.0030.1170.1190.950 βz=0.50.0160.2610.2590.9460.0160.2690.2590.9520.0160.2610.2590.946 σ2=0.25βx=00.0020.1070.1090.9520.0040.1330.1320.9460.0030.1290.1300.954 βz=0.50.0330.2620.2600.9580.0310.2700.2630.9640.0260.2620.2600.958 σ2=0.6βx=00.0030.0960.0980.9480.0060.1470.1470.9520.0050.1440.1480.944 βz=0.50.0330.2620.2630.9560.0330.2700.2620.9640.0360.2620.2630.956 40%
σ2 =0βx=00.0000.1270.1300.9480.0000.1330.1300.9560.0000.1270.1300.948 βz=0.50.0140.2830.2700.9620.0140.2930.2700.9660.0140.2830.2700.962 σ2 =0.25βx=00.0010.1150.1160.9500.0040.1440.1390.9580.0040.1390.1400.948 βz=0.50.0250.2830.2880.9480.0320.2930.2850.9640.0370.2830.2870.946 σ2=0.6βx=00.0020.1040.1040.9500.0020.1610.1560.9560.0040.1550.1570.952 βz=0.50.0300.2830.2860.9480.0320.2930.2840.9660.0320.2830.2850.948 aBias:biasoftheestimate;SEM:model-basedstandarderror;SEE:empiricalstandarderror;CR:95%coveragerate.TheSEMfromregressioncalibrationapproach iscalculatedbythebootstrapmethodwithB=1000.ThelikelihoodmethodusesMonteCarlointegrationalgorithmwithN=5000.
Table1.3:SimulationResultsonTheBaselineFunctionParametersObtainedfromTheParametricApproach.TheTrueBaselineDistributionFollowsALog-logisticDistribution.TheTrueValuesareβx=−1,βz=0.5,α1=5andα2=2.
CensoringMeasurementNaiveMethodRegressionCalibrationLikelihoodMethod
RateErrorParameterBias aSEMSEECRBiasSEMSEECRBiasSEMSEECR 20% σ 2=0 log(α1)-0.0040.2600.2540.954-0.0040.2660.2540.952-0.0040.2600.2540.954
log(α2)0.0100.0650.0620.9720.0100.0660.0620.9640.0100.0650.0620.972 σ2=0.25 log(α1)-0.3620.2460.2540.666-0.0110.2960.3000.944-0.0120.2910.2980.940
log(α2)-0.0210.0650.0660.938-0.0210.0650.0660.9320.0170.0690.0710.938 σ2=0.6 log(α1)-0.6950.2320.2380.144-0.0100.3370.3380.944-0.0110.3300.3390.946
log(α2)-0.0530.0650.0660.874-0.0520.0640.0660.8700.0190.0760.0770.942 40% σ 2=0 log(α1)0.0070.2810.2750.9520.0080.2900.2750.9520.0080.2810.2750.952
log(α2)0.0120.0740.0740.9520.0120.0750.0740.9520.0110.0740.0740.952 σ 2=0.25 log(α1)-0.3610.2660.2730.706-0.0100.3200.3200.940-0.0130.3140.3190.936
log(α2)-0.0190.0740.0760.932-0.0190.0740.0760.9380.0190.0790.0810.940 σ2=0.6 log(α1)-0.6890.2490.2560.204-0.0060.3630.3610.944-0.0050.3560.3610.940
log(α2)-0.0510.0740.0740.904-0.0500.0740.0750.9080.0220.0850.0870.946
aBias:biasoftheestimate;SEM:model-basedstandarderror;SEE:empiricalstandarderror;CR:95%coveragerate.TheSEMfromregressioncalibrationapproachiscalculatedbythebootstrapmethodwithB=1000.ThelikelihoodmethodusesMonteCarlointegrationalgorithmwithN=5000.
Table1.4:SimulationResultsonTheBaselineFunctionParametersObtainedfromTheParametricApproach.TheTrueBaseline DistributionFollowsALog-logisticDistribution.TheTrueValuesareβx=0,βz=0.5,α1=5andα2=2. CensoringMeasurementNaiveMethodRegressionCalibrationLikelihoodMethod RateErrorParameterBiasa SEMSEECRBiasSEMSEECRBiasSEMSEECR 20%
σ2 =0log(α1)-0.0080.2600.2560.946-0.0080.2670.2560.952-0.0080.2600.2560.946 log(α2)0.0110.0650.0640.9600.0110.0660.0640.9600.0110.0650.0640.960 σ2=0.25log(α1)-0.0090.2380.2370.946-0.0130.2850.2850.942-0.0090.2800.2780.950 log(α2)0.0150.0650.0660.9320.0160.0660.0650.9340.0160.0650.0660.938 σ2=0.6log(α1)-0.0100.2180.2200.946-0.0190.3140.3130.952-0.0160.3080.3110.952 log(α2)0.0150.0650.0660.9380.0140.0660.0660.9340.0180.0650.0670.934 40%
σ2 =0log(α1)-0.0050.2820.2810.944-0.0050.2910.2810.942-0.0050.2820.2810.944 log(α2)0.0100.0740.0730.9580.0100.0750.0730.9520.0100.0740.0730.958 σ2 =0.25log(α1)-0.0040.2580.2570.952-0.0140.3110.3040.954-0.0120.3040.3030.948 log(α2)0.0160.0740.0760.9340.0150.0750.0760.9360.0150.0740.0750.942 σ2=0.6log(α1)-0.0060.2370.2340.950-0.0090.3420.3360.944-0.0130.3350.3350.940 log(α2)0.0150.0740.0760.9380.0150.0750.0750.9360.0160.0740.0770.940 aBias:biasoftheestimate;SEM:model-basedstandarderror;SEE:empiricalstandarderror;CR:95%coveragerate.TheSEMfromregressioncalibrationapproach iscalculatedbythebootstrapmethodwithB=1000.ThelikelihoodmethodusesMonteCarlointegrationalgorithmwithN=5000.
Table2.1:SimulationResultsontheRegressionCoefficientsObtainedfromtheParametricApproach.TheTrueBaselineDistributionFollowsAWeibullDistribution,andtheError-ProneCovariatehasAnEffectwithβx=−1.
CensoringMeasurementNaiveMethodRegressionCalibrationLikelihoodMethod
RateErrorParameterBias aSEMSEECRBiasSEMSEECRBiasSEMSEECR 20% σ 2=0 βx=−1-0.0040.1360.1360.944-0.0040.1420.1360.946-0.0040.1360.1360.944
βz=0.5-0.0140.2690.2590.952-0.0140.2770.2590.958-0.0150.2690.2590.952 σ2=0.25 βx=−10.1790.1210.1270.6600.0280.1500.1480.942-0.0180.1620.1650.950
βz=0.5-0.0370.2680.2730.938-0.0180.2780.2750.944-0.0060.2800.2850.936 σ2=0.6 βx=−10.3500.1060.1110.1300.0660.1610.1580.930-0.0210.1930.1980.948
βz=0.5-0.0600.2670.2680.942-0.0290.2790.2750.948-0.0020.2910.2900.948 40% σ 2=0 βx=−10.0010.1520.1470.9560.0010.1610.1470.9620.0000.1520.1470.956
βz=0.5-0.0190.2990.2820.972-0.0190.3110.2820.974-0.0190.2990.2820.972 σ 2=0.25 βx=−10.1730.1350.1430.7100.0240.1700.1650.952-0.0240.1800.1880.934
βz=0.5-0.0350.2980.3020.940-0.0160.3130.3040.950-0.0080.3110.3130.946 σ2=0.6 βx=−10.3450.1170.1260.1900.0580.1820.1790.930-0.0270.2150.2300.934
βz=0.5-0.0660.2960.3000.938-0.0310.3130.3040.9560.0030.3230.3300.946
aBias:biasoftheestimate;SEM:model-basedstandarderror;SEE:empiricalstandarderror;CR:95%coveragerate.TheSEMfromregressioncalibrationapproachiscalculatedbythebootstrapmethodwithB=1000.ThelikelihoodmethodusesMonteCarlointegrationalgorithmwithN=5000.
Table2.2:SimulationResultsontheRegressionCoefficientsObtainedfromtheParametricApproach.TheTrueBaselineDistribution FollowsAWeibullDistribution,andtheError-ProneCovariatehasNoEffectwithβx=0. CensoringMeasurementNaiveMethodRegressionCalibrationLikelihoodMethod RateErrorParameterBiasa SEMSEECRBiasSEMSEECRBiasSEMSEECR 20%
σ2 =0βx=00.0020.0980.0990.9540.0020.1010.0990.9620.0020.0980.0990.954 βz=0.5-0.0160.2600.2540.948-0.0160.2680.2540.960-0.0160.2600.2540.948 σ2=0.25βx=00.0010.0920.0950.9380.0010.1090.1090.9340.0020.1050.1080.936 βz=0.5-0.0150.2600.2670.956-0.0200.2680.2670.950-0.0190.2600.2640.956 σ2=0.6βx=00.0010.0850.0890.9380.0010.1180.1170.9400.0010.1130.1170.940 βz=0.5-0.0150.2590.2670.954-0.0170.2680.2660.954-0.0110.2600.2670.958 40%
σ2 =0βx=0-0.0020.1080.1090.948-0.0020.1120.1090.954-0.0020.1080.1090.948 βz=0.5-0.0230.2800.2730.958-0.0230.2900.2730.964-0.0230.2800.2730.958 σ2 =0.25βx=00.0000.1010.1000.9540.0010.1210.1150.9580.0010.1150.1150.960 βz=0.5-0.0150.2800.2870.954-0.0140.2900.2870.954-0.0130.2810.2910.948 σ2=0.6βx=00.0010.0930.0940.9400.0010.1310.1240.9540.0010.1240.1240.952 βz=0.5-0.0140.2800.2860.954-0.0130.2910.2860.958-0.0130.2810.2870.950 aBias:biasoftheestimate;SEM:model-basedstandarderror;SEE:empiricalstandarderror;CR:95%coveragerate.TheSEMfromregressioncalibrationapproach iscalculatedbythebootstrapmethodwithB=1000.ThelikelihoodmethodusesMonteCarlointegrationalgorithmwithN=5000.
Table2.3:SimulationResultsonTheBaselineFunctionParametersObtainedfromTheParametricApproach.TheTrueBaselineDistributionFollowsAWeibullDistribution.TheTrueValuesareβx=−1,βz=0.5,α1=1andα2=1.5.
CensoringMeasurementNaiveMethodRegressionCalibrationLikelihoodMethod
RateErrorParameterBias aSEMSEECRBiasSEMSEECRBiasSEMSEECR 20% σ 2=0 log(α1)0.0040.1950.1930.9400.0040.2010.1930.9420.0050.1950.1930.940
log(α2)0.0090.1140.1130.9380.0090.1180.1130.9480.0090.1140.1140.938 σ2=0.25 log(α1)-0.2330.1640.1700.630-0.0150.2180.2180.9440.0160.2230.2270.942
log(α2)0.1460.1140.1160.7600.0090.1290.1270.9420.0070.1250.1260.940 σ2=0.6 log(α1)-0.4340.1330.1430.164-0.0390.2410.2410.9300.0190.2570.2640.942
log(α2)0.2800.1090.1160.2860.0120.1440.1420.9440.0100.1400.1430.940 40% σ 2=0 log(α1)0.0000.2180.2130.9460.0000.2270.2130.9500.0010.2180.2130.946
log(α2)0.0140.1290.1260.9560.0140.1340.1260.9580.0130.1290.1260.956 σ 2=0.25 log(α1)-0.2280.1810.1950.686-0.0080.2470.2450.9340.0240.2480.2610.942
log(α2)0.1470.1260.1330.7800.0080.1460.1420.9460.0070.1400.1440.940 σ2=0.6 log(α1)-0.4310.1460.1600.222-0.0250.2730.2760.9260.0230.2860.3040.922
log(α2)0.2820.1200.1290.3620.0070.1620.1610.9420.0130.1560.1610.936
aBias:biasoftheestimate;SEM:model-basedstandarderror;SEE:empiricalstandarderror;CR:95%coveragerate.TheSEMfromregressioncalibrationapproachiscalculatedbythebootstrapmethodwithB=1000.ThelikelihoodmethodusesMonteCarlointegrationalgorithmwithN=5000.
Table2.4:SimulationResultsonTheBaselineFunctionParametersObtainedfromTheParametricApproach.TheTrueBaseline DistributionFollowsAWeibullDistribution.TheTrueValuesareβx=0,βz=0.5,α1=1andα2=1.5. CensoringMeasurementNaiveMethodRegressionCalibrationLikelihoodMethod RateErrorParameterBiasa SEMSEECRBiasSEMSEECRBiasSEMSEECR 20%
σ2 =0log(α1)0.0100.1760.1750.9480.0100.1830.1750.9540.0100.1760.1750.948 log(α2)0.0040.1000.0960.9620.0040.1010.0960.9620.0040.1000.0960.962 σ2=0.25log(α1)0.0100.1660.1720.9500.0130.1950.1930.9460.0100.1860.1930.952 log(α2)0.0110.0960.0970.9480.0080.1070.1050.9460.0100.1040.1050.946 σ2=0.6log(α1)0.0100.1550.1640.9420.0130.2130.2080.9500.0120.1980.2070.948 log(α2)0.0110.0930.0930.9500.0090.1130.1110.9460.0100.1080.1110.948 40%
σ2 =0log(α1)0.0190.1990.2000.9580.0190.2070.2000.9660.0190.1990.2000.958 log(α2)0.0020.1050.1040.9480.0020.1070.1040.9520.0020.1050.1040.948 σ2 =0.25log(α1)0.0090.1870.1880.9520.0090.2200.2110.9540.0080.2090.2110.956 log(α2)0.0100.1020.0990.9540.0100.1120.1060.9540.0110.1090.1060.954 σ2=0.6log(α1)0.0080.1740.1800.9480.0100.2420.2250.9640.0110.2220.2250.960 log(α2)0.0110.0980.0960.9600.0100.1190.1110.9600.0100.1140.1100.960 aBias:biasoftheestimate;SEM:model-basedstandarderror;SEE:empiricalstandarderror;CR:95%coveragerate.TheSEMfromregressioncalibrationapproach iscalculatedbythebootstrapmethodwithB=1000.ThelikelihoodmethodusesMonteCarlointegrationalgorithmwithN=5000.
A Appendix
To establish the asymptotic distribution, we need the following conditions, including regularity con- ditions in Lehmann (1993):
• The measurement errors are independent and identically distributed.
• Given covariates, the survival process is independent of censoring process.
• Let Ωbe the parameter space for parameterη. The true parameter value η0 belongs to a compact set withinΩ.
• There exists an open subset ω of parameter spaceΩthat contains the true parameter value η0such that almost all observations with log likelihood`∗(η)have all third derivatives for η∈ω. There exist a functionMwhich may depend onη0such that
∂3
∂η3`∗(η)
≤M for all η∈ω
andEη0
M
<∞.
• The log likelihood`∗(η)can be differentiated twice under the integral sign.
• The covariance matrixI(η)is positive definite for allη∈ω.
I(η) =Eh∂`∗(η)
∂ηT
∂`∗(η)
∂η i
=Eh
−∂2`∗(η)
∂ηT∂η i
.
Following the arguments in Lehman (1983, p.409), it can be shown that the maximum likelihood estimatorbηexists and is unique with probability 1. In addition, √
n(ηb−η0)has an asymptotic normal distribution with mean0and covariance matrix[I(η0)]−1.