A marginalized two-part joint model for a longitudinal biomarker and a terminal event with
application to advanced head and neck cancers
Item Type Article
Authors Rustand, Denis; Briollais, Laurent; Rondeau, Virginie
Citation Rustand, D., Briollais, L., & Rondeau, V. (2023). A marginalized two-part joint model for a longitudinal biomarker and a terminal event with application to advanced head and neck cancers. Pharmaceutical Statistics. Portico. https://doi.org/10.1002/pst.2338
Eprint version Publisher's Version/PDF
DOI 10.1002/pst.2338
Publisher Wiley
Journal Pharmaceutical Statistics
Rights Archived with thanks to Pharmaceutical Statistics under a Creative Commons license, details at: http://creativecommons.org/licenses/by/4.0/
Download date 23/09/2023 06:23:07
Item License http://creativecommons.org/licenses/by/4.0/
Link to Item http://hdl.handle.net/10754/694566
Supplementary Material for “A marginalized two-part joint model for a longitudinal biomarker and a terminal event with application to advanced head and neck cancers”
Details on the likelihood of the model
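The full likelihood of the model can be expressed as
$$L_i(\cdot) = \int_{a_i}\int_{b_i} L_{A_i}(\cdot)\, L_{B_i}(\cdot)\, L_{S_i}(\cdot)\, p(a_i, b_i)\, db_i\, da_i,$$
where $L_{A_i}(\cdot)$, $L_{B_i}(\cdot)$ and $L_{S_i}(\cdot)$ correspond to the likelihood contributions from the binary, continuous and survival parts of the two-part joint model, respectively, and $a_i$ and $b_i$ are the two vectors of random effects, which follow a multivariate normal distribution:
$$\begin{bmatrix} a_i \\ b_i \end{bmatrix} \sim MVN(0, B), \quad \text{with } B = \begin{bmatrix} \Sigma_a^2 & \Sigma_{ab} \\ \Sigma_{ab} & \Sigma_b^2 \end{bmatrix}.$$
The set of parameters to estimate is $\Theta = (\alpha, \beta, B, \lambda_0(t), \gamma, \varphi)$.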
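Noting that
$$\mathrm{Prob}(Y_{ij} > 0) = \frac{\exp(X_{Aij}^\top \alpha + Z_{Aij}^\top a_i)}{1 + \exp(X_{Aij}^\top \alpha + Z_{Aij}^\top a_i)},$$
we can deduce
$$\log(\mathrm{Prob}(Y_{ij} > 0)) = X_{Aij}^\top \alpha + Z_{Aij}^\top a_i - \log(1 + \exp(X_{Aij}^\top \alpha + Z_{Aij}^\top a_i)).$$
Finally,
$$\log(1 - \mathrm{Prob}(Y_{ij} > 0)) = -\log(1 + \exp(X_{Aij}^\top \alpha + Z_{Aij}^\top a_i)).$$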
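We introduce $U_{ij} = I[Y_{ij} > 0]$; the likelihood contribution from the binary part can be expressed as
$$
\begin{aligned}
L_{A_i}(\cdot) &= \prod_{j=1}^{n_i} P(U_{ij} \mid a_i) \\
&= \prod_{j=1}^{n_i} \mathrm{Prob}(Y_{ij} > 0)^{U_{ij}} \left(1 - \mathrm{Prob}(Y_{ij} > 0)\right)^{(1 - U_{ij})} \\
&= \prod_{j=1}^{n_i} \left(\frac{\mathrm{Prob}(Y_{ij} > 0)}{1 - \mathrm{Prob}(Y_{ij} > 0)}\right)^{U_{ij}} \left(1 - \mathrm{Prob}(Y_{ij} > 0)\right) \\
&= \prod_{j=1}^{n_i} \exp\left(X_{Aij}^\top \alpha + Z_{Aij}^\top a_i\right)^{U_{ij}} \left(1 - \frac{\exp(X_{Aij}^\top \alpha + Z_{Aij}^\top a_i)}{1 + \exp(X_{Aij}^\top \alpha + Z_{Aij}^\top a_i)}\right).
\end{aligned}
$$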
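As a minimal numerical sketch of the binary-part contribution, the last line of the derivation gives, on the log scale, $\sum_j [U_{ij}\eta_{ij} - \log(1 + \exp(\eta_{ij}))]$ with $\eta_{ij} = X_{Aij}^\top \alpha + Z_{Aij}^\top a_i$. The function name `binary_loglik` and the numerical values below are hypothetical and serve only to illustrate that this form agrees with the direct Bernoulli log-likelihood.

```python
import numpy as np

def binary_loglik(u, eta):
    """Log of the binary-part contribution for one individual.

    u   : indicators U_ij = I[Y_ij > 0]
    eta : linear predictors X_Aij' alpha + Z_Aij' a_i (same shape as u)
    """
    u = np.asarray(u, dtype=float)
    eta = np.asarray(eta, dtype=float)
    # log(1 + exp(eta)) evaluated stably as log(exp(0) + exp(eta))
    log1pexp = np.logaddexp(0.0, eta)
    # sum_j [ U_ij * eta_ij - log(1 + exp(eta_ij)) ]
    return float(np.sum(u * eta - log1pexp))

# Hypothetical individual with three visits (values chosen for illustration)
u = np.array([1, 1, 0])
eta = np.array([2.0, 0.5, -1.0])

# Direct Bernoulli form, for comparison
p = 1.0 / (1.0 + np.exp(-eta))
direct = float(np.sum(u * np.log(p) + (1 - u) * np.log(1 - p)))
print(binary_loglik(u, eta), direct)
```

Working with `np.logaddexp` avoids overflow when the linear predictor is large, which can matter when the random effects are sampled far from zero during numerical integration.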
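The continuous part contribution to the likelihood has a log-normal density:
$$L_{B_i}(\cdot) = \prod_{j=1}^{n_i} \left\{ \frac{1}{Y_{ij}\sqrt{2\pi\sigma_\epsilon^2}} \exp\left(-\frac{(\log(Y_{ij}) - \mu_{ij})^2}{2\sigma_\epsilon^2}\right) \right\}^{U_{ij}}.$$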
We have defined the continuous part of the marginalized TPJM as
$$\log(E[Y_{ij} \mid b_i^M]) = X_{Bij}^\top \beta^M + Z_{Bij}^\top b_i^M,$$
and the continuous part of the conditional TPJM as
$$\log(E[Y_{ij} \mid Y_{ij} > 0, b_i^C]) = X_{Bij}^\top \beta^C + Z_{Bij}^\top b_i^C.$$
To express these models in terms of the mean of the outcome, i.e., so that the expected value depends on the linear predictor and not on the residual variance, the location parameter of the log-normal distribution is defined such that the dependence on the variance is taken into account internally during the fitting procedure (i.e., in the log-normal likelihood formulation). From these definitions, we can derive either the location parameter of the marginalized TPJM,
$$
\begin{aligned}
\mu_{ij} &= X_{Bij}^\top \beta^M + Z_{Bij}^\top b_i^M - \log(\mathrm{Prob}(Y_{ij} > 0)) - \frac{(\sigma_\epsilon^M)^2}{2} \\
&= X_{Bij}^\top \beta^M + Z_{Bij}^\top b_i^M - \left(X_{Aij}^\top \alpha^M + Z_{Aij}^\top a_i^M\right) + \log(1 + \exp(X_{Aij}^\top \alpha^M + Z_{Aij}^\top a_i^M)) - \frac{(\sigma_\epsilon^M)^2}{2},
\end{aligned}
$$
or the location parameter of the conditional TPJM,
$$\mu_{ij} = X_{Bij}^\top \beta^C + Z_{Bij}^\top b_i^C - \frac{(\sigma_\epsilon^C)^2}{2}.$$
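The role of the two location parameters can be checked numerically. The sketch below uses hypothetical function names and parameter values and relies only on the log-normal mean $E[Y \mid Y > 0] = \exp(\mu + \sigma_\epsilon^2/2)$. It illustrates that, with $\mu_{ij} = X_{Bij}^\top \beta^M + Z_{Bij}^\top b_i^M - \log(\mathrm{Prob}(Y_{ij} > 0)) - (\sigma_\epsilon^M)^2/2$, the marginal mean $\mathrm{Prob}(Y_{ij} > 0)\, E[Y_{ij} \mid Y_{ij} > 0]$ equals the exponential of the continuous-part linear predictor, whereas under the C-TPJM it is the mean among positive values that equals this quantity.

```python
import numpy as np

def mu_marginalized(lp_b, lp_a, sigma):
    """Log-normal location under the M-TPJM.

    lp_b : X_B' beta^M + Z_B' b^M, i.e. log E[Y | b]
    lp_a : X_A' alpha^M + Z_A' a^M, i.e. logit of Prob(Y > 0)
    """
    log_prob = lp_a - np.logaddexp(0.0, lp_a)  # log Prob(Y > 0)
    return lp_b - log_prob - sigma**2 / 2.0

def mu_conditional(lp_b, sigma):
    """Log-normal location under the C-TPJM, lp_b = log E[Y | Y > 0, b]."""
    return lp_b - sigma**2 / 2.0

# Hypothetical linear predictors and residual standard deviation
lp_a, lp_b, sigma = 1.2, 0.7, 0.3
prob = 1.0 / (1.0 + np.exp(-lp_a))

# M-TPJM: Prob(Y > 0) * E[Y | Y > 0] recovers exp(lp_b)
mean_pos_m = np.exp(mu_marginalized(lp_b, lp_a, sigma) + sigma**2 / 2.0)
marginal_mean = prob * mean_pos_m

# C-TPJM: E[Y | Y > 0] itself equals exp(lp_b)
mean_pos_c = np.exp(mu_conditional(lp_b, sigma) + sigma**2 / 2.0)
print(marginal_mean, mean_pos_c, np.exp(lp_b))
```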
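The contribution to the likelihood from the survival part corresponds to a Cox proportional hazards model, with a spline approximation of the baseline hazard:
$$
\begin{aligned}
L_{S_i}(\cdot) &= \lambda_i(T_i \mid a_i, b_i)^{\delta_i}\, S(T_i \mid a_i, b_i) \\
&= \lambda_i(T_i \mid a_i, b_i)^{\delta_i} \exp\left(-\int_0^{T_i} \lambda_i(t \mid a_i, b_i)\, dt\right),
\end{aligned}
$$
where $\lambda_i(t) = \lambda_0(t) \exp\{X_{Si}(t)^\top \gamma + h(\cdot)\varphi\}$.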
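The cumulative hazard in the survival contribution generally has no closed form when the linear predictor depends on time, so it is evaluated numerically. The following sketch uses Gauss-Legendre quadrature; this choice, the function name `survival_loglik`, and the baseline hazard $\lambda_0(t) = 0.2t$ with a time-constant linear predictor are assumptions made for illustration (the model above uses splines for $\lambda_0(t)$). The hypothetical hazard is chosen so that the integral is known exactly.

```python
import numpy as np

def survival_loglik(T, delta, hazard, n_nodes=15):
    """delta * log(lambda_i(T)) - int_0^T lambda_i(t) dt for one individual.

    hazard : callable returning lambda_i(t) for scalar or array t
    """
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)
    t = 0.5 * T * (nodes + 1.0)  # map quadrature nodes from [-1, 1] to [0, T]
    cum_hazard = 0.5 * T * np.sum(weights * hazard(t))
    return float(delta * np.log(hazard(T)) - cum_hazard)

# Hypothetical hazard: lambda_0(t) = 0.2 * t, linear predictor fixed at 0.3
lp = 0.3
hazard = lambda t: 0.2 * t * np.exp(lp)

T, delta = 1.5, 1
approx = survival_loglik(T, delta, hazard)
# Closed form for this hazard: cumulative hazard = 0.1 * T^2 * exp(lp)
exact = delta * np.log(0.2 * T * np.exp(lp)) - 0.1 * T**2 * np.exp(lp)
print(approx, exact)
```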
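The full likelihood of the M-TPJM is therefore given by
$$
\begin{aligned}
L_i(\cdot) = \int_{a_i^M}\int_{b_i^M} & \prod_{j=1}^{n_i} \left\{ \left[ \frac{\exp(X_{Aij}^\top \alpha^M + Z_{Aij}^\top a_i^M)}{\sqrt{2\pi(\sigma_\epsilon^M)^2}}\, Y_{ij}^{-1} \exp\left(-\frac{(\log(Y_{ij}) - \mu_{ij}^M)^2}{2(\sigma_\epsilon^M)^2}\right) \right]^{U_{ij}} \left(1 - \frac{\exp(X_{Aij}^\top \alpha^M + Z_{Aij}^\top a_i^M)}{1 + \exp(X_{Aij}^\top \alpha^M + Z_{Aij}^\top a_i^M)}\right) \right\} \\
& \times \lambda_i(T_i \mid \Theta)^{\delta_i} \exp\left(-\int_0^{T_i} \lambda_i(t \mid \Theta)\, dt\right) p(a_i^M, b_i^M)\, db_i^M\, da_i^M,
\end{aligned}
$$
and the log-likelihood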
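$$
\begin{aligned}
\log(L_i(\Theta)) = \log \int_{a_i^M}\int_{b_i^M} \exp\Bigg\{ & \sum_{j=1}^{n_i} \Bigg[ \Bigg( X_{Aij}^\top \alpha^M + Z_{Aij}^\top a_i^M - \log(Y_{ij}) - \frac{\log(2\pi)}{2} - \log(\sigma_\epsilon^M) \\
& \quad - \frac{1}{2(\sigma_\epsilon^M)^2} \Big( \log(Y_{ij}) + X_{Aij}^\top \alpha^M + Z_{Aij}^\top a_i^M - \log(1 + \exp(X_{Aij}^\top \alpha^M + Z_{Aij}^\top a_i^M)) \\
& \qquad + \frac{(\sigma_\epsilon^M)^2}{2} - \left(X_{Bij}^\top \beta^M + Z_{Bij}^\top b_i^M\right) \Big)^2 \Bigg) U_{ij} - \log(1 + \exp(X_{Aij}^\top \alpha^M + Z_{Aij}^\top a_i^M)) \Bigg] \\
& + \delta_i \Big( \log(\lambda_0(T_i \mid \Theta_{ij})) + X_{Si}(T_i)^\top \gamma^M + h(\cdot)\varphi^M \Big) \\
& - \int_0^{T_i} \lambda_0(t \mid \Theta_{ij}) \exp\left(X_{Si}(t)^\top \gamma^M + h(\cdot)\varphi^M\right) dt \Bigg\}\, p(a_i^M, b_i^M)\, db_i^M\, da_i^M.
\end{aligned}
$$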
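The integral over the random effects has no closed form and must be approximated numerically. One generic option, sketched below, is Monte Carlo integration with draws from the random-effects distribution, combined with a log-sum-exp step for numerical stability. This is an illustration under stated assumptions and not necessarily the integration scheme used for the results reported here; the function `marginal_loglik_mc` and the toy conditional log-likelihood are hypothetical.

```python
import numpy as np

def marginal_loglik_mc(cond_loglik, B, n_draws=1000, seed=0):
    """Monte Carlo approximation of log of the integral of
    exp(cond_loglik(r)) against the N(0, B) density of the random effects r.

    cond_loglik : callable, log-likelihood of one individual given r
    B           : covariance matrix of the random effects
    """
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(np.zeros(B.shape[0]), B, size=n_draws)
    vals = np.array([cond_loglik(r) for r in draws])
    m = vals.max()  # log-sum-exp: factor out the largest term
    return float(m + np.log(np.mean(np.exp(vals - m))))

# Toy check with a known answer: for a scalar a ~ N(0, 1),
# E[exp(-a^2 / 2)] = 1 / sqrt(2), so the log-integral is -0.5 * log(2).
B = np.array([[1.0]])
cond = lambda r: -0.5 * r[0] ** 2
approx = marginal_loglik_mc(cond, B, n_draws=50_000)
exact = -0.5 * np.log(2.0)
print(approx, exact)
```

In an actual fit, `cond_loglik` would sum the binary, continuous and survival log-contributions of one individual for a given draw of the random effects.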
Hazard ratio of treatment effect on the risk of death for the M-TPJM and the C-TPJM with the current level association
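The computation of the hazard ratios of the treatment effect after $t = 1$ year of follow-up for the reference individual (i.e., age < 65 and sex = female), as provided in Section 4.4 of the manuscript, is explained here. The first step consists of computing the difference in the mean of the biomarker between the two treatment arms. For the M-TPJM, this difference is given by
$$\text{M-TPJM}_{trt} = E_{b_i^M}\left[ \exp\left(\beta_0^M + b_{0i}^M + t\,(\beta_1^M + b_{1i}^M + \beta_5^M)\right) - \exp\left(\beta_0^M + b_{0i}^M + t\,(\beta_1^M + b_{1i}^M)\right) \right],$$
and for the C-TPJM it is given by
$$
\begin{aligned}
\text{C-TPJM}_{trt} = E_{a_i^C, b_i^C}\Bigg[ & \frac{\exp\left(\alpha_0^C + a_{0i}^C + t\,(\alpha_1^C + \alpha_5^C)\right)}{1 + \exp\left(\alpha_0^C + a_{0i}^C + t\,(\alpha_1^C + \alpha_5^C)\right)} \exp\left(\beta_0^C + b_{0i}^C + t\,(\beta_1^C + b_{1i}^C + \beta_5^C)\right) \\
& - \frac{\exp\left(\alpha_0^C + a_{0i}^C + t\,\alpha_1^C\right)}{1 + \exp\left(\alpha_0^C + a_{0i}^C + t\,\alpha_1^C\right)} \exp\left(\beta_0^C + b_{0i}^C + t\,(\beta_1^C + b_{1i}^C)\right) \Bigg].
\end{aligned}
$$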
Note that we do not include the treatment difference at baseline (i.e., $\beta_2$ and $\alpha_2$) because these parameters only capture a randomization bias; their inclusion follows trivially. Also note that the random effects have to be included and integrated out to obtain the population mean effect of treatment on the marginal biomarker value.
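The two expectations above can be approximated by Monte Carlo integration over the random effects. The sketch below is an illustration with hypothetical parameter values and function names; it assumes a random intercept in the binary part and a random intercept and slope in the continuous part, as in the expressions above. For the M-TPJM, the Monte Carlo value can be compared with the closed form obtained from the mean of a log-normal variable.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def mtpjm_trt_mc(beta0, beta1, beta5, Sigma_b, t, n=200_000, seed=1):
    """Monte Carlo estimate of M-TPJM_trt; Sigma_b is the covariance of (b0, b1)."""
    rng = np.random.default_rng(seed)
    b = rng.multivariate_normal([0.0, 0.0], Sigma_b, size=n)
    base = beta0 + b[:, 0] + t * (beta1 + b[:, 1])
    return float(np.mean(np.exp(base + t * beta5) - np.exp(base)))

def mtpjm_trt_closed(beta0, beta1, beta5, Sigma_b, t):
    """Closed form: E[exp(c + z'b)] = exp(c + z' Sigma_b z / 2), z = (1, t)."""
    z = np.array([1.0, t])
    half_var = 0.5 * z @ Sigma_b @ z
    return float(np.exp(beta0 + t * beta1 + half_var) * (np.exp(t * beta5) - 1.0))

def ctpjm_trt_mc(alpha0, alpha1, alpha5, beta0, beta1, beta5, Sigma, t,
                 n=200_000, seed=1):
    """Monte Carlo estimate of C-TPJM_trt; Sigma is the covariance of (a0, b0, b1)."""
    rng = np.random.default_rng(seed)
    a0, b0, b1 = rng.multivariate_normal(np.zeros(3), Sigma, size=n).T
    arm_b = (expit(alpha0 + a0 + t * (alpha1 + alpha5))
             * np.exp(beta0 + b0 + t * (beta1 + b1 + beta5)))
    arm_a = (expit(alpha0 + a0 + t * alpha1)
             * np.exp(beta0 + b0 + t * (beta1 + b1)))
    return float(np.mean(arm_b - arm_a))

# Hypothetical values, for illustration only
Sigma_b = np.array([[0.36, 0.036], [0.036, 0.09]])
mc = mtpjm_trt_mc(1.5, -0.5, 0.3, Sigma_b, t=1.0)
closed = mtpjm_trt_closed(1.5, -0.5, 0.3, Sigma_b, t=1.0)

Sigma = np.array([[1.96, 0.42, 0.21], [0.42, 0.36, 0.036], [0.21, 0.036, 0.09]])
c_trt = ctpjm_trt_mc(4.0, -3.0, -2.0, 1.5, -0.5, 0.3, Sigma, t=1.0)
print(mc, closed, c_trt)
```

No closed form is available for the C-TPJM because of the logistic term, which is why the random effects of both parts have to be sampled jointly.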
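We can then compute the hazard ratio of treatment on the risk of death by combining the effect of treatment captured by $\gamma_1$ and the effect captured by the association $\varphi$:
$$\text{M-TPJM}_{\text{overall treatment effect}} = \exp\left(\gamma_1^M + \varphi^M \cdot \text{M-TPJM}_{trt}\right),$$
$$\text{C-TPJM}_{\text{overall treatment effect}} = \exp\left(\gamma_1^C + \varphi^C \cdot \text{C-TPJM}_{trt}\right).$$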
The values of the hazard ratios provided in Section 4.4 and their 95% confidence intervals are obtained by resampling the parameters using the inverse Hessian matrix of the model, with 10000 samples for each model and 10000 sets of random effects for each sample.
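The resampling procedure described above can be sketched as follows for the M-TPJM. This is a simplified illustration and not the exact implementation: the parameter ordering in `theta_hat`, the function name `hr_interval` and all numerical values are hypothetical, the inverse Hessian is treated as the covariance matrix of a multivariate normal distribution centered on the estimates, and the random-effects covariance is held fixed so that the expectation over the random effects can be taken in closed form instead of by sampling sets of random effects.

```python
import numpy as np

def hr_interval(theta_hat, hessian_inv, Sigma_b, t=1.0, n_samples=10_000, seed=2):
    """2.5%, 50% and 97.5% quantiles of the overall treatment hazard ratio.

    theta_hat   : (beta0, beta1, beta5, gamma1, phi), hypothetical ordering
    hessian_inv : inverse Hessian, used as covariance of the estimates
    Sigma_b     : covariance of (b0, b1), held fixed in this sketch
    """
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(theta_hat, hessian_inv, size=n_samples)
    beta0, beta1, beta5, gamma1, phi = draws.T
    z = np.array([1.0, t])
    half_var = 0.5 * z @ Sigma_b @ z
    # Difference in mean biomarker between arms, for each parameter draw
    trt = np.exp(beta0 + t * beta1 + half_var) * (np.exp(t * beta5) - 1.0)
    hr = np.exp(gamma1 + phi * trt)
    return np.quantile(hr, [0.025, 0.5, 0.975])

# Hypothetical estimates and standard errors, for illustration only
theta_hat = np.array([1.5, -0.5, 0.3, -0.2, 0.08])
hessian_inv = np.diag([0.05, 0.07, 0.10, 0.12, 0.02]) ** 2
Sigma_b = np.array([[0.36, 0.036], [0.036, 0.09]])
interval = hr_interval(theta_hat, hessian_inv, Sigma_b)
print(interval)
```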
For the M-TPJM, the population-average marginal mean is expressed as
$$E[Y_{ij}] = E_{b_i^M}\left[E(Y_{ij} \mid b_i^M)\right] = \exp\left(X_{Bij}^\top \beta^M + \frac{1}{2} Z_{Bij}^\top \Sigma_{bb}^M Z_{Bij}\right).$$
It is simple to estimate both subject-specific and population-average means under the M-TPJM, particularly for covariates not included as random effects, as their regression coefficients take both subject-specific and population-average interpretations (see Appendix B of Smith (2015)).
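The closed form of the population-average mean follows from the moment generating function of the normal distribution. A sketch of the intermediate steps, assuming $b_i^M \sim N(0, \Sigma_{bb}^M)$ so that $Z_{Bij}^\top b_i^M \sim N(0, Z_{Bij}^\top \Sigma_{bb}^M Z_{Bij})$, and using $E[\exp(W)] = \exp(\sigma_W^2/2)$ for $W \sim N(0, \sigma_W^2)$:

```latex
\begin{align*}
E[Y_{ij}]
  &= E_{b_i^M}\left[\exp\left(X_{Bij}^\top \beta^M + Z_{Bij}^\top b_i^M\right)\right] \\
  &= \exp\left(X_{Bij}^\top \beta^M\right)
     E_{b_i^M}\left[\exp\left(Z_{Bij}^\top b_i^M\right)\right] \\
  &= \exp\left(X_{Bij}^\top \beta^M
     + \tfrac{1}{2} Z_{Bij}^\top \Sigma_{bb}^M Z_{Bij}\right).
\end{align*}
```

Because the variance term does not involve covariates that have no random effect, it cancels in ratios of means that differ only through such covariates.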
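The multiplicative effect of treatment arm B on the mean of the biomarker compared to treatment arm A is given by
$$\frac{E[Y_{ij} \mid trt = B]}{E[Y_{ij} \mid trt = A]} = \frac{\exp\left(\beta_0^M + t\,(\beta_1^M + \beta_5^M) + \frac{1}{2}(1, t)\, \Sigma_{b_0 b_1}^M\, (1, t)^\top\right)}{\exp\left(\beta_0^M + t\,\beta_1^M + \frac{1}{2}(1, t)\, \Sigma_{b_0 b_1}^M\, (1, t)^\top\right)} = \exp(t\,\beta_5^M),$$
where $\exp(\beta_2^M)$ and $\exp(\beta_5^M)$ correspond to the effect of treatment at baseline and over time, respectively, on the population-average mean biomarker value.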
References
Smith, V. A. (2015). Marginalized two-part models for semicontinuous data with application to medical costs. PhD thesis, The University of North Carolina at Chapel Hill.
[Figure S1 panels. Left: "Baseline hazard function (M-TPJM)", x-axis: Time, y-axis: Hazard, legend (association structure): Shared random effects (SRE), Current level (CL). Right: "Survival curves (M-TPJM current level)", x-axis: Time, y-axis: Survival, legend (treatment): arm A, arm B.]
Figure S1. Baseline risk functions under the SRE and CL association structures (left) and survival curves by treatment arm for the CL association structure (right), both obtained from the M-TPJM in the real data application.
In this figure, the baseline risk is slightly lower over time with the CL association because the individual risk depends on the current value of the biomarker (always positive), while the SRE association assumes that the individual risk depends on the individual deviation from the mean captured by the random effects (with mean equal to zero). The survival curves with the CL association result from a combination of the effect of treatment on the risk of event (hazard ratio) and the effect of treatment captured by the biomarker and shared through the current level association (time-dependent). The same plot for the SRE association structure would be based only on the hazard ratio of treatment on the risk of event (not time-dependent); it is therefore easier to interpret and does not require a graphical representation. The confidence intervals are obtained by resampling the parameters using the inverse Hessian matrix of the model, taking the 2.5% and 97.5% quantiles of 2000 simulated curves.
[Figure S2 plot: "Biomarker mean evolution over time (Scenario 2: C-TPJM)", x-axis: Time, legend: True values; marginal TPJM w/ splines.]
Figure S2. Mean biomarker trajectory captured in the simulation studies from the M-TPJM with natural cubic splines with two degrees of freedom, where the true model is the C-TPJM. The curve is obtained from one single model fit for the purpose of illustrating how the population average biomarker trajectory can be made flexible with the M-TPJM and fit with the true trajectory from the C-TPJM design.
[Figure S3 plot: "Mean biomarker value (SPECTRUM)", x-axis: Time (years), y-axis: log(SLD+1), legend: M-TPJM, C-TPJM, OPJM, Regression LOESS with 1 SD bands.]
Figure S3. Individual biomarker trajectories from the SPECTRUM data with mean value estimated by the left-censoring OPJM (OPJM), the marginalized TPJM (M-TPJM) and the conditional TPJM (C-TPJM). A local regression curve (locally estimated scatterplot smoothing, LOESS) represents the empirical mean biomarker value. Note that the LOESS curve does not take into account the correlation between the repeated measurements within an individual, informative drop-out and the semicontinuous distribution of the biomarker.
Table S1: Summary of the results of simulation scenario 4 (true model: marginalized TPJM), 300 datasets with 400 individuals each and 1000 integration points, 21.34% zeros on average (SD = 1.79). The true values of the parameters estimated in the continuous part of the C-TPJM are unknown; therefore, coverage probabilities are not provided for these parameters.

| Variable | Parameter (true value) | Left-censoring OPJM: Est.∗ (SD†) [CP‡] | C-TPJM: Est. (SD) [CP] | M-TPJM: Est. (SD) [CP] |
|---|---|---|---|---|
| Binary part | | | | |
| intercept | α0 = 4 | | 4.01 (0.35) [95%] | 4.01 (0.31) [94%] |
| time | α1 = -3 | | -2.96 (0.33) [94%] | -2.98 (0.25) [92%] |
| treatment | α2 = 1 | | 0.92 (0.42) [95%] | 0.97 (0.39) [95%] |
| time:treatment | α3 = -2 | | -1.86 (0.53) [94%] | -1.93 (0.41) [94%] |
| Continuous part | | | | |
| intercept | β0 = 1.5 | 1.82 (0.07) [0%] | 1.54 (0.05) | 1.52 (0.05) [91%] |
| time | β1 = -0.5 | -0.80 (0.18) [63%] | -0.07 (0.07) | -0.48 (0.07) [93%] |
| treatment | β2 = 0.3 | 0.39 (0.09) [80%] | 0.26 (0.07) | 0.29 (0.07) [94%] |
| time:treatment | β3 = 0.3 | -0.32 (0.26) [32%] | 0.45 (0.10) | 0.31 (0.11) [94%] |
| residual S.E. | σϵ = 0.3 | 0.84 (0.07) [0%] | 0.33 (0.01) | 0.30 (0.01) [94%] |
| Survival part | | | | |
| treatment | γ = -0.2 | -0.12 (0.13) [87%] | -0.12 (0.12) [89%] | -0.15 (0.12) [93%] |
| association | φ = 0.08 | 0.09 (0.02) [95%] | 0.08 (0.02) [96%] | 0.08 (0.02) [95%] |
| Random effects | | | | |
| intercept (binary part) | σa = 1.4 | | 1.33 (0.20) | 1.33 (0.20) |
| intercept (continuous part) | σb0 = 0.6 | 0.38 (0.07) | 0.61 (0.03) | 0.61 (0.03) |
| slope (continuous part) | σb1 = 0.3 | 1.20 (0.21) | 0.43 (0.08) | 0.26 (0.11) |
| | cor_ab0 = 0.5 | | 0.41 (0.13) | 0.55 (0.12) |
| | cor_ab1 = 0.5 | | -0.23 (0.27) | 0.43 (0.38) |
| | cor_b0b1 = 0.2 | 0.45 (0.24) | -0.37 (0.14) | 0.37 (0.34) |
| Convergence rate | | 100% | 100% | 100% |

∗ Mean of parameter estimates; † Standard deviation from the mean; ‡ Coverage probability
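The three summaries reported in each cell (mean of the estimates, their standard deviation, and the coverage probability) can be computed from the per-dataset results as sketched below. The sketch assumes Wald-type 95% confidence intervals (estimate ± 1.96 standard errors) for the coverage probability; this is an assumption made for illustration, and the function name `summarize` and the input values are hypothetical.

```python
import numpy as np

def summarize(estimates, std_errors, true_value, z=1.959964):
    """Mean, standard deviation and coverage probability across datasets.

    estimates  : parameter estimate from each simulated dataset
    std_errors : corresponding standard errors
    true_value : value used to generate the data
    """
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    lower = estimates - z * std_errors
    upper = estimates + z * std_errors
    covered = (lower <= true_value) & (true_value <= upper)
    return {
        "est": float(estimates.mean()),      # mean of parameter estimates
        "sd": float(estimates.std(ddof=1)),  # spread of estimates around their mean
        "cp": float(covered.mean()),         # share of intervals containing the truth
    }

# Hypothetical results from three datasets with true value 2.0
summary = summarize([1.0, 2.0, 3.0], [0.1, 0.1, 0.1], true_value=2.0)
print(summary)
```

A coverage probability far below 95%, as seen for some misspecified models in the tables, indicates bias in the estimates, underestimated standard errors, or both.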
Table S2: Summary of the results of simulation scenario 5 (true model: conditional TPJM), 300 datasets with 400 individuals each and 1000 integration points, 22.00% zeros on average (SD = 1.78). The true values of the parameters estimated in the continuous part of the left-censoring OPJM and the M-TPJM are unknown; therefore, coverage probabilities are not provided for these parameters.

| Variable | Parameter (true value) | Left-censoring OPJM: Est.∗ (SD†) [CP‡] | C-TPJM: Est. (SD) [CP] | M-TPJM: Est. (SD) [CP] |
|---|---|---|---|---|
| Binary part | | | | |
| intercept | α0 = 4 | | 4.05 (0.35) [96%] | 3.56 (0.30) [60%] |
| time | α1 = -3 | | -3.01 (0.33) [96%] | -2.26 (0.27) [19%] |
| treatment | α2 = 1 | | 0.98 (0.43) [96%] | 0.61 (0.37) [78%] |
| time:treatment | α3 = -2 | | -1.98 (0.54) [95%] | -1.30 (0.43) [45%] |
| Continuous part | | | | |
| intercept | β0 = 1.5 | 1.76 (0.09) | 1.52 (0.05) [89%] | 1.49 (0.05) |
| time | β1 = -0.5 | -1.05 (0.21) | -0.50 (0.06) [97%] | -0.79 (0.08) |
| treatment | β2 = 0.3 | 0.46 (0.08) | 0.30 (0.07) [94%] | 0.34 (0.07) |
| time:treatment | β3 = 0.3 | -0.48 (0.29) | 0.31 (0.09) [93%] | 0.19 (0.12) |
| residual S.E. | σϵ = 0.3 | 0.81 (0.10) | 0.30 (0.01) [95%] | 0.30 (0.01) |
| Survival part | | | | |
| treatment | γ = -0.2 | -0.22 (0.12) [94%] | -0.20 (0.12) [93%] | -0.22 (0.12) [95%] |
| association | φ = 0.08 | 0.10 (0.03) [92%] | 0.08 (0.02) [95%] | 0.08 (0.02) [94%] |
| Random effects | | | | |
| intercept (binary part) | σa = 1.4 | | 1.34 (0.21) | 1.21 (0.18) |
| intercept (continuous part) | σb0 = 0.6 | 0.48 (0.07) | 0.61 (0.03) | 0.63 (0.03) |
| slope (continuous part) | σb1 = 0.3 | 1.32 (0.36) | 0.29 (0.06) | 0.47 (0.09) |
| | cor_ab0 = 0.5 | | 0.52 (0.12) | 0.62 (0.11) |
| | cor_ab1 = 0.5 | | 0.51 (0.25) | 0.74 (0.13) |
| | cor_b0b1 = 0.2 | 0.44 (0.20) | 0.22 (0.23) | 0.49 (0.18) |
| Convergence rate | | 100% | 100% | 100% |

∗ Mean of parameter estimates; † Standard deviation from the mean; ‡ Coverage probability
Table S3: Summary of the results of simulation scenario 6 (true model: left-censoring OPJM), 300 datasets with 400 individuals each and 1000 integration points, 20.03% zeros on average (SD = 0.02). The true values of the parameters estimated in the continuous part of the C-TPJM are unknown; therefore, coverage probabilities are not provided for these parameters.

| Variable | Parameter (true value) | Left-censoring OPJM: Est.∗ (SD†) [CP‡] | C-TPJM: Est. (SD) [CP] | M-TPJM: Est. (SD) [CP] |
|---|---|---|---|---|
| Binary part | | | | |
| intercept | α0 | | 5.34 (0.64) | 3.67 (0.40) |
| time | α1 | | -3.14 (0.52) | -2.09 (0.34) |
| treatment | α2 | | 2.51 (0.77) | 1.38 (0.40) |
| time:treatment | α3 | | 1.19 (0.69) | 0.69 (0.38) |
| Continuous part | | | | |
| intercept | β0 = 1.5 | 1.50 (0.04) [97%] | 1.57 (0.04) | 1.46 (0.05) [89%] |
| time | β1 = -0.5 | -0.52 (0.06) [93%] | -0.37 (0.05) | -0.63 (0.07) [54%] |
| treatment | β2 = 0.3 | 0.31 (0.06) [94%] | 0.28 (0.06) | 0.34 (0.07) [92%] |
| time:treatment | β3 = 0.3 | 0.31 (0.08) [93%] | 0.20 (0.07) | 0.40 (0.10) [75%] |
| residual S.E. | σϵ = 0.3 | 0.29 (0.01) [91%] | 0.28 (0.01) | 0.28 (0.01) [32%] |
| Survival part | | | | |
| treatment | γ = -0.2 | -0.21 (0.13) [95%] | -0.21 (0.13) [94%] | -0.21 (0.13) [94%] |
| association | φ = 0.08 | 0.08 (0.02) [92%] | 0.08 (0.02) [94%] | 0.08 (0.02) [93%] |
| Random effects | | | | |
| intercept (binary part) | σa | | 4.62 (0.51) | 2.54 (0.26) |
| intercept (continuous part) | σb0 = 0.6 | 0.60 (0.03) | 0.56 (0.03) | 0.66 (0.03) |
| slope (continuous part) | σb1 = 0.3 | 0.30 (0.05) | 0.19 (0.05) | 0.30 (0.06) |
| | cor_ab0 | | 0.96 (0.03) | 0.98 (0.01) |
| | cor_ab1 | | 0.16 (0.28) | 0.78 (0.12) |
| | cor_b0b1 = 0.2 | 0.23 (0.17) | -0.11 (0.27) | 0.65 (0.17) |
| Convergence rate | | 99% | 99% | 100% |

∗ Mean of parameter estimates; † Standard deviation from the mean; ‡ Coverage probability