A marginalized two-part joint model for a longitudinal biomarker and a terminal event with application to advanced head and neck cancers


Item Type Article

Authors Rustand, Denis; Briollais, Laurent; Rondeau, Virginie

Citation Rustand, D., Briollais, L., & Rondeau, V. (2023). A marginalized two-part joint model for a longitudinal biomarker and a terminal event with application to advanced head and neck cancers. Pharmaceutical Statistics. Portico. https://doi.org/10.1002/pst.2338

Eprint version Publisher's Version/PDF

DOI 10.1002/pst.2338

Publisher Wiley

Journal Pharmaceutical Statistics

Rights Archived with thanks to Pharmaceutical Statistics under a Creative Commons license, details at: http://creativecommons.org/licenses/by/4.0/

Download date 23/09/2023 06:23:07

Item License http://creativecommons.org/licenses/by/4.0/

Link to Item http://hdl.handle.net/10754/694566


Supplementary Material for “A marginalized two-part joint model for a longitudinal biomarker and a terminal event with application to advanced head and neck cancers”

Details on the likelihood of the model

The full likelihood of the model for individual $i$ can be expressed as

$$L_i(\cdot) = \int_{a_i} \int_{b_i} L_{A_i}(\cdot)\, L_{B_i}(\cdot)\, L_{S_i}(\cdot)\, p(a_i, b_i)\, db_i\, da_i,$$

where $L_{A_i}(\cdot)$, $L_{B_i}(\cdot)$ and $L_{S_i}(\cdot)$ correspond to the likelihood contributions from the binary, continuous and survival parts of the two-part joint model, respectively, and $a_i$ and $b_i$ are the two vectors of random effects, which follow a multivariate normal distribution:

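$$\begin{pmatrix} a_i \\ b_i \end{pmatrix} \sim \mathrm{MVN}(0, B), \qquad B = \begin{pmatrix} \Sigma_a^2 & \Sigma_{ab} \\ \Sigma_{ab} & \Sigma_b^2 \end{pmatrix}.$$

The set of parameters to estimate is $\Theta = (\alpha, \beta, B, \lambda_0(t), \gamma, \varphi)$.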

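Noting that

$$\mathrm{Prob}(Y_{ij} > 0) = \frac{\exp(X_{Aij}^\top \alpha + Z_{Aij}^\top a_i)}{1 + \exp(X_{Aij}^\top \alpha + Z_{Aij}^\top a_i)},$$

we can deduce

$$\log(\mathrm{Prob}(Y_{ij} > 0)) = X_{Aij}^\top \alpha + Z_{Aij}^\top a_i - \log(1 + \exp(X_{Aij}^\top \alpha + Z_{Aij}^\top a_i)).$$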

Finally,

$$\log(1 - \mathrm{Prob}(Y_{ij} > 0)) = -\log(1 + \exp(X_{Aij}^\top \alpha + Z_{Aij}^\top a_i)).$$

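Introducing $U_{ij} = I[Y_{ij} > 0]$, the likelihood contribution from the binary part can be expressed as

$$\begin{aligned}
L_{A_i}(\cdot) &= \prod_{j=1}^{n_i} P(U_{ij} \mid a_i) \\
&= \prod_{j=1}^{n_i} \mathrm{Prob}(Y_{ij} > 0)^{U_{ij}} \left(1 - \mathrm{Prob}(Y_{ij} > 0)\right)^{1 - U_{ij}} \\
&= \prod_{j=1}^{n_i} \left( \frac{\mathrm{Prob}(Y_{ij} > 0)}{1 - \mathrm{Prob}(Y_{ij} > 0)} \right)^{U_{ij}} \left(1 - \mathrm{Prob}(Y_{ij} > 0)\right) \\
&= \prod_{j=1}^{n_i} \exp(X_{Aij}^\top \alpha + Z_{Aij}^\top a_i)^{U_{ij}} \left( 1 - \frac{\exp(X_{Aij}^\top \alpha + Z_{Aij}^\top a_i)}{1 + \exp(X_{Aij}^\top \alpha + Z_{Aij}^\top a_i)} \right).
\end{aligned}$$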
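As a numerical illustration of the binary part (a minimal Python/NumPy sketch, not the authors' implementation), taking the logarithm of $L_{A_i}$ given $a_i$ yields $\sum_j [U_{ij}\eta_{ij} - \log(1 + \exp(\eta_{ij}))]$, where $\eta_{ij} = X_{Aij}^\top \alpha + Z_{Aij}^\top a_i$. The function name `binary_loglik` is hypothetical.

```python
import numpy as np

def binary_loglik(u, eta_a):
    """Log of L_Ai for one individual, given the random effects a_i.

    u     : 0/1 array, U_ij = I[Y_ij > 0]
    eta_a : array of linear predictors X_Aij^T alpha + Z_Aij^T a_i
    Uses log Prob(Y>0) = eta - log(1 + exp(eta)) and
         log(1 - Prob(Y>0)) = -log(1 + exp(eta)),
    so each measurement contributes U_ij * eta_ij - log(1 + exp(eta_ij)).
    """
    u = np.asarray(u, dtype=float)
    eta_a = np.asarray(eta_a, dtype=float)
    # np.logaddexp(0, x) evaluates log(1 + exp(x)) without overflow
    return float(np.sum(u * eta_a - np.logaddexp(0.0, eta_a)))
```

Using `logaddexp` instead of `log(1 + exp(...))` is a numerical-stability choice; it leaves the quantity being computed unchanged.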

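The contribution of the continuous part to the likelihood is based on a log-normal density:

$$L_{B_i}(\cdot) = \prod_{j=1}^{n_i} \left\{ \frac{1}{Y_{ij}\sqrt{2\pi\sigma_\epsilon^2}} \exp\left( -\frac{(\log(Y_{ij}) - \mu_{ij})^2}{2\sigma_\epsilon^2} \right) \right\}^{U_{ij}}.$$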
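The continuous-part contribution can be sketched in the same way (an illustrative Python/NumPy example; the function name is hypothetical). Measurements with $U_{ij} = 0$ contribute a factor of 1, so only the positive values enter the log-normal log-density.

```python
import numpy as np

def continuous_loglik(y, u, mu, sigma):
    """Log of L_Bi: log-normal log-density summed over positive values only.

    y     : observed values (zeros allowed; they contribute nothing)
    u     : 0/1 array, U_ij = I[Y_ij > 0]
    mu    : log-normal location parameters mu_ij
    sigma : residual standard error sigma_eps
    """
    y = np.asarray(y, dtype=float)
    u = np.asarray(u, dtype=bool)
    mu = np.asarray(mu, dtype=float)
    ypos, mpos = y[u], mu[u]  # only U_ij = 1 terms enter the product
    logy = np.log(ypos)
    return float(np.sum(-logy - 0.5 * np.log(2 * np.pi * sigma**2)
                        - (logy - mpos) ** 2 / (2 * sigma**2)))
```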

We have defined the continuous part of the marginalized TPJM as

$$\log(E[Y_{ij} \mid b_i^M]) = X_{Bij}^\top \beta^M + Z_{Bij}^\top b_i^M,$$

and the continuous part of the conditional TPJM as

$$\log(E[Y_{ij} \mid Y_{ij} > 0, b_i^C]) = X_{Bij}^\top \beta^C + Z_{Bij}^\top b_i^C.$$

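To express these models, i.e., so that the expected mean of the outcome depends on the linear predictor and not on the residual variance, the location parameter of the log-normal distribution is defined such that the dependence on the variance is taken into account internally during the fitting procedure (i.e., in the log-normal likelihood formulation). Since a log-normal variable with location $\mu_{ij}$ and scale $\sigma_\epsilon$ has mean $E[Y_{ij} \mid Y_{ij} > 0] = \exp(\mu_{ij} + \sigma_\epsilon^2/2)$, and since $E[Y_{ij} \mid Y_{ij} > 0] = E[Y_{ij}] / \mathrm{Prob}(Y_{ij} > 0)$ for a semicontinuous outcome, we can derive either the location parameter of the marginalized TPJM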

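$$\begin{aligned}
\mu_{ij} &= X_{Bij}^\top \beta^M + Z_{Bij}^\top b_i^M - \log(\mathrm{Prob}(Y_{ij} > 0)) - \frac{(\sigma_\epsilon^M)^2}{2} \\
&= X_{Bij}^\top \beta^M + Z_{Bij}^\top b_i^M - X_{Aij}^\top \alpha^M - Z_{Aij}^\top a_i^M + \log(1 + \exp(X_{Aij}^\top \alpha^M + Z_{Aij}^\top a_i^M)) - \frac{(\sigma_\epsilon^M)^2}{2},
\end{aligned}$$

or the location parameter of the conditional TPJM

$$\mu_{ij} = X_{Bij}^\top \beta^C + Z_{Bij}^\top b_i^C - \frac{(\sigma_\epsilon^C)^2}{2}.$$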
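A quick numerical check of these location parameters (an illustrative Python/NumPy sketch with arbitrary values for the linear predictors, here called `eta_a` and `eta_b`) confirms that, with the M-TPJM location, $\mathrm{Prob}(Y_{ij} > 0)\exp(\mu_{ij} + \sigma_\epsilon^2/2)$ recovers $\exp(X_{Bij}^\top \beta^M + Z_{Bij}^\top b_i^M)$. With the C-TPJM location, $\exp(\mu_{ij} + \sigma_\epsilon^2/2)$ recovers the conditional mean.

```python
import numpy as np

def mu_marginal(eta_b, eta_a, sigma):
    """M-TPJM location: eta_B - log Prob(Y>0) - sigma^2 / 2."""
    log_p = eta_a - np.logaddexp(0.0, eta_a)  # log Prob(Y > 0)
    return eta_b - log_p - sigma**2 / 2

def mu_conditional(eta_b, sigma):
    """C-TPJM location: eta_B - sigma^2 / 2."""
    return eta_b - sigma**2 / 2

eta_a, eta_b, sigma = 0.4, 1.1, 0.3  # arbitrary illustrative values
p = 1.0 / (1.0 + np.exp(-eta_a))
# Marginal mean = Prob(Y>0) * E[Y | Y>0] = p * exp(mu + sigma^2 / 2)
marg_mean = p * np.exp(mu_marginal(eta_b, eta_a, sigma) + sigma**2 / 2)
cond_mean = np.exp(mu_conditional(eta_b, sigma) + sigma**2 / 2)
print(np.isclose(marg_mean, np.exp(eta_b)), np.isclose(cond_mean, np.exp(eta_b)))
```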

The contribution to the likelihood from the survival part corresponds to a Cox proportional hazards model, with a spline approximation of the baseline hazard:

$$L_{S_i}(\cdot) = \lambda_i(T_i \mid a_i, b_i)^{\delta_i}\, S(T_i \mid a_i, b_i) = \lambda_i(T_i \mid a_i, b_i)^{\delta_i} \exp\left( -\int_0^{T_i} \lambda_i(t \mid a_i, b_i)\, dt \right),$$

where $\lambda_i(t) = \lambda_0(t) \exp\{X_{Si}(t)^\top \gamma + h(\cdot)\varphi\}$.
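The survival contribution requires the cumulative hazard $\int_0^{T_i} \lambda_i(t)\,dt$, which generally has no closed form when $h(\cdot)$ is time-dependent. The sketch below (illustrative Python/NumPy) approximates this integral by Gauss-Legendre quadrature. That quadrature choice is our assumption, not necessarily the authors'. The baseline hazard and the term $X_{Si}(t)^\top\gamma + h(\cdot)\varphi$ are passed as plain callables, standing in for the spline baseline hazard and the chosen association structure.

```python
import numpy as np

def survival_loglik(T, delta, baseline_hazard, lin_pred, n_nodes=30):
    """log L_Si = delta * log(lambda_i(T)) - int_0^T lambda_i(t) dt.

    baseline_hazard : callable t -> lambda_0(t) (stand-in for the spline)
    lin_pred        : callable t -> X_Si(t)^T gamma + h(t) * phi
    """
    def hazard(t):
        return baseline_hazard(t) * np.exp(lin_pred(t))

    x, w = np.polynomial.legendre.leggauss(n_nodes)  # nodes/weights on [-1, 1]
    t = 0.5 * T * (x + 1.0)                          # map nodes to [0, T]
    cum_hazard = 0.5 * T * np.sum(w * hazard(t))     # rescale weights accordingly
    return float(delta * np.log(hazard(T)) - cum_hazard)
```

For example, with a constant baseline hazard of 0.5, no covariates and $T_i = 2$, an observed death gives $\log(0.5) - 1$.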

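The full likelihood of the M-TPJM is therefore given by

$$\begin{aligned}
L_i(\Theta) = \int_{a_i^M} \int_{b_i^M} & \prod_{j=1}^{n_i} \left[ \left( \frac{\exp(X_{Aij}^\top \alpha^M + Z_{Aij}^\top a_i^M)}{\sqrt{2\pi (\sigma_\epsilon^M)^2}}\, Y_{ij}^{-1} \exp\left( -\frac{(\log(Y_{ij}) - \mu_{ij}^M)^2}{2(\sigma_\epsilon^M)^2} \right) \right)^{U_{ij}} \left( 1 - \frac{\exp(X_{Aij}^\top \alpha^M + Z_{Aij}^\top a_i^M)}{1 + \exp(X_{Aij}^\top \alpha^M + Z_{Aij}^\top a_i^M)} \right) \right] \\
& \times \lambda_i(T_i \mid \Theta)^{\delta_i} \exp\left( -\int_0^{T_i} \lambda_i(t \mid \Theta)\, dt \right) p(a_i^M, b_i^M)\, db_i^M\, da_i^M,
\end{aligned}$$

and the log-likelihood is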

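$$\log L_i(\Theta) = \log \int_{a_i^M} \int_{b_i^M} \exp\{\ell_i(\Theta \mid a_i^M, b_i^M)\}\, p(a_i^M, b_i^M)\, db_i^M\, da_i^M,$$

where the conditional log-likelihood given the random effects is

$$\begin{aligned}
\ell_i(\Theta \mid a_i^M, b_i^M) = \sum_{j=1}^{n_i} \Bigg\{ & U_{ij} \Bigg[ X_{Aij}^\top \alpha^M + Z_{Aij}^\top a_i^M - \log(Y_{ij}) - \frac{\log(2\pi)}{2} - \log(\sigma_\epsilon^M) \\
& - \frac{1}{2(\sigma_\epsilon^M)^2} \left( \log(Y_{ij}) + X_{Aij}^\top \alpha^M + Z_{Aij}^\top a_i^M - \log(1 + \exp(X_{Aij}^\top \alpha^M + Z_{Aij}^\top a_i^M)) + \frac{(\sigma_\epsilon^M)^2}{2} - (X_{Bij}^\top \beta^M + Z_{Bij}^\top b_i^M) \right)^2 \Bigg] \\
& - \log(1 + \exp(X_{Aij}^\top \alpha^M + Z_{Aij}^\top a_i^M)) \Bigg\} \\
& + \delta_i \left( \log \lambda_0(T_i) + X_{Si}(T_i)^\top \gamma^M + h(\cdot)\varphi^M \right) - \int_0^{T_i} \lambda_0(t) \exp\left( X_{Si}(t)^\top \gamma^M + h(\cdot)\varphi^M \right) dt.
\end{aligned}$$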
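The integral over the random effects is computed numerically; the tables below mention integration points, but this supplement does not state the scheme. The following Python/NumPy sketch therefore assumes plain Monte Carlo integration: it draws the random effects from $\mathrm{MVN}(0, B)$, evaluates a user-supplied conditional log-likelihood $\ell_i$ (for example, the sum of the three contributions above), and returns the log of the average of $\exp(\ell_i)$. The function name `marginal_loglik` is hypothetical.

```python
import numpy as np

def marginal_loglik(cond_loglik, cov, n_draws=1000, seed=0):
    """Monte Carlo approximation of log L_i = log E[exp(l_i(a_i, b_i))].

    cond_loglik : callable mapping an (n_draws, q) array of random-effect draws
                  to the n_draws conditional log-likelihoods l_i(Theta | a_i, b_i)
    cov         : (q, q) covariance matrix B of the stacked random effects
    """
    cov = np.asarray(cov, dtype=float)
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(np.zeros(cov.shape[0]), cov, size=n_draws)
    ll = np.asarray(cond_loglik(draws), dtype=float)
    # log-mean-exp, shifted by the maximum to avoid overflow/underflow
    m = ll.max()
    return float(m + np.log(np.mean(np.exp(ll - m))))
```

The log-mean-exp step reflects the fact that the logarithm is taken after integrating over the random effects, not inside the integral.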

Hazard ratio of treatment effect on the risk of death for the M-TPJM and the C-TPJM with the current level association

This section explains how the hazard ratios of the treatment effect after $t = 1$ year of follow-up for the reference individual (i.e., age < 65 and female sex), reported in Section 4.4 of the manuscript, are computed. The first step consists of computing the difference in the mean of the biomarker between the two treatment arms. For the M-TPJM, this difference is given by

$$\text{M-TPJM}_{trt} = E_{b_i^M}\left[ \exp\{\beta_0^M + b_{0i}^M + t(\beta_1^M + b_{1i}^M + \beta_5^M)\} - \exp\{\beta_0^M + b_{0i}^M + t(\beta_1^M + b_{1i}^M)\} \right],$$

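and for the C-TPJM it is given by

$$\begin{aligned}
\text{C-TPJM}_{trt} = E_{a_i^C, b_i^C}\Bigg[ & \frac{\exp\{\alpha_0^C + a_{0i}^C + t(\alpha_1^C + \alpha_5^C)\}}{1 + \exp\{\alpha_0^C + a_{0i}^C + t(\alpha_1^C + \alpha_5^C)\}} \exp\{\beta_0^C + b_{0i}^C + t(\beta_1^C + b_{1i}^C + \beta_5^C)\} \\
& - \frac{\exp\{\alpha_0^C + a_{0i}^C + t\alpha_1^C\}}{1 + \exp\{\alpha_0^C + a_{0i}^C + t\alpha_1^C\}} \exp\{\beta_0^C + b_{0i}^C + t(\beta_1^C + b_{1i}^C)\} \Bigg].
\end{aligned}$$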

Note that we do not include the treatment difference at baseline (i.e., $\beta_2$ and $\alpha_2$), because these parameters only capture a randomization bias; their inclusion would be straightforward. Also note that the random effects must be included and integrated out to obtain the population mean effect of treatment on the marginal biomarker value.
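The expectations over the random effects can be approximated by averaging over draws of the random effects. The Python/NumPy sketch below evaluates both treatment differences from such draws, which are passed in as arrays (for example, sampled from the estimated random-effects distribution). The function names are hypothetical, and the baseline treatment terms are omitted, as in the formulas above.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def m_tpjm_trt(t, beta0, beta1, beta5, b0, b1):
    """Monte Carlo M-TPJM_trt: average over draws (b0, b1) of b_i^M."""
    arm_b = np.exp(beta0 + b0 + t * (beta1 + b1 + beta5))
    arm_a = np.exp(beta0 + b0 + t * (beta1 + b1))
    return float(np.mean(arm_b - arm_a))

def c_tpjm_trt(t, alpha0, alpha1, alpha5, beta0, beta1, beta5, a0, b0, b1):
    """Monte Carlo C-TPJM_trt: average over draws (a0, b0, b1) of (a_i^C, b_i^C)."""
    # Prob(Y > 0) times the conditional mean E[Y | Y > 0], for each arm
    arm_b = expit(alpha0 + a0 + t * (alpha1 + alpha5)) * np.exp(beta0 + b0 + t * (beta1 + b1 + beta5))
    arm_a = expit(alpha0 + a0 + t * alpha1) * np.exp(beta0 + b0 + t * (beta1 + b1))
    return float(np.mean(arm_b - arm_a))
```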

We can then compute the hazard ratio of treatment on the risk of death by combining the effect of treatment captured by $\gamma_1$ and the effect captured by the association parameter $\varphi$:

$$\text{M-TPJM overall treatment effect} = \exp(\gamma_1^M + \varphi^M \times \text{M-TPJM}_{trt}), \qquad \text{C-TPJM overall treatment effect} = \exp(\gamma_1^C + \varphi^C \times \text{C-TPJM}_{trt}).$$

The values of the hazard ratios provided in Section 4.4 and their 95% confidence intervals are obtained by resampling the parameters using the inverse Hessian matrix of the model, with 10000 samples for each model and 10000 sets of random effects for each sample.
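The resampling procedure can be sketched as follows for the M-TPJM (illustrative Python/NumPy, not the authors' code). Parameters are drawn from a multivariate normal distribution centred at the estimates, with a covariance matrix standing in for the inverse Hessian. For each parameter draw, fresh random effects are sampled, the treatment difference is averaged over them, and the hazard ratio is computed. The layout of `theta`, the use of the median as the point summary, and holding the random-effects covariance fixed are all assumptions made for brevity.

```python
import numpy as np

def hr_with_ci(theta_hat, theta_cov, re_cov, t=1.0, n_param=1000, n_re=1000, seed=1):
    """Median and 95% percentile interval of the M-TPJM overall treatment hazard ratio.

    Hypothetical parameter layout: theta = (gamma1, phi, beta0, beta1, beta5).
    theta_cov stands in for the inverse Hessian restricted to these parameters;
    re_cov (covariance of b0, b1) is held fixed at its estimate for simplicity.
    """
    rng = np.random.default_rng(seed)
    thetas = rng.multivariate_normal(np.asarray(theta_hat, dtype=float),
                                     np.asarray(theta_cov, dtype=float), size=n_param)
    hrs = np.empty(n_param)
    for k, (gamma1, phi, beta0, beta1, beta5) in enumerate(thetas):
        # fresh random-effect draws for each parameter sample, then average over them
        re = rng.multivariate_normal(np.zeros(2), re_cov, size=n_re)
        b0, b1 = re[:, 0], re[:, 1]
        trt = np.mean(np.exp(beta0 + b0 + t * (beta1 + b1 + beta5))
                      - np.exp(beta0 + b0 + t * (beta1 + b1)))
        hrs[k] = np.exp(gamma1 + phi * trt)
    return np.percentile(hrs, [50.0, 2.5, 97.5])
```

The paper uses 10000 parameter samples and 10000 random-effect sets per sample; the smaller defaults here only keep the sketch fast.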

For the M-TPJM, the population-average marginal mean is expressed as

$$E[Y_{ij}] = E_{b_i^M}[E(Y_{ij} \mid b_i^M)] = \exp\left( X_{Bij}^\top \beta^M + \frac{1}{2} Z_{Bij}^\top \Sigma_{bb}^M Z_{Bij} \right).$$

Under the M-TPJM, both subject-specific and population-average means are simple to estimate, particularly for covariates not included as random effects, because their regression coefficients have both a subject-specific and a population-average interpretation (see Appendix B of Smith (2015)).
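The closed form follows from the mean of a log-normal variable, since $Z_{Bij}^\top b_i^M$ is normal with variance $Z_{Bij}^\top \Sigma_{bb}^M Z_{Bij}$. The Python/NumPy sketch below compares it with a Monte Carlo average of $\exp(X_{Bij}^\top \beta^M + Z_{Bij}^\top b_i^M)$. The covariance values are arbitrary illustrative choices, and the function names are hypothetical.

```python
import numpy as np

def pop_mean_closed_form(x_beta, z, sigma_b):
    """exp(X^T beta + 0.5 * Z^T Sigma_b Z) for one design row."""
    z = np.asarray(z, dtype=float)
    return float(np.exp(x_beta + 0.5 * z @ sigma_b @ z))

def pop_mean_monte_carlo(x_beta, z, sigma_b, n_draws=200_000, seed=2):
    """Average of exp(X^T beta + Z^T b) over b ~ N(0, Sigma_b)."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z, dtype=float)
    b = rng.multivariate_normal(np.zeros(len(z)), sigma_b, size=n_draws)
    return float(np.mean(np.exp(x_beta + b @ z)))

# Random intercept SD 0.6, slope SD 0.3, correlation 0.2; evaluated at t = 1
sigma_b = np.array([[0.36, 0.036], [0.036, 0.09]])
closed = pop_mean_closed_form(1.0, [1.0, 1.0], sigma_b)
approx = pop_mean_monte_carlo(1.0, [1.0, 1.0], sigma_b)
```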

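The multiplicative effect of treatment arm B on the mean of the biomarker compared to treatment arm A is given by

$$\frac{E[Y_{ij} \mid trt = B]}{E[Y_{ij} \mid trt = A]} = \frac{\exp\left\{ \beta_0^M + t(\beta_1^M + \beta_5^M) + \frac{1}{2} \begin{pmatrix} 1 & t \end{pmatrix} \Sigma_{b_0 b_1}^M \begin{pmatrix} 1 \\ t \end{pmatrix} \right\}}{\exp\left\{ \beta_0^M + t\beta_1^M + \frac{1}{2} \begin{pmatrix} 1 & t \end{pmatrix} \Sigma_{b_0 b_1}^M \begin{pmatrix} 1 \\ t \end{pmatrix} \right\}} = \exp(t\beta_5^M),$$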

where $\exp(\beta_2^M)$ and $\exp(\beta_5^M)$ correspond to the effect of treatment at baseline and over time, respectively, on the population-average mean biomarker value.
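Because the variance term $\frac{1}{2}(1, t)\Sigma_{b_0 b_1}^M(1, t)^\top$ is identical in both arms, it cancels in the ratio. The short Python/NumPy check below uses arbitrary parameter values and a hypothetical function name. As in the derivation above, it omits the baseline treatment term.

```python
import numpy as np

def arm_ratio(t, beta0, beta1, beta5, sigma_b):
    """Ratio of population-average means, arm B over arm A, at time t (no beta2 term)."""
    z = np.array([1.0, t])
    var_term = 0.5 * z @ sigma_b @ z  # identical in both arms, so it cancels
    mean_b = np.exp(beta0 + t * (beta1 + beta5) + var_term)
    mean_a = np.exp(beta0 + t * beta1 + var_term)
    return float(mean_b / mean_a)
```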

References

Smith, V. A. (2015). Marginalized two-part models for semicontinuous data with application to medical costs. PhD thesis, The University of North Carolina at Chapel Hill.

[Figure S1. Left panel: Baseline hazard function (M-TPJM), hazard versus time, for the shared random effects (SRE) and current level (CL) association structures. Right panel: Survival curves (M-TPJM current level), survival versus time, for treatment arms A and B.]

Figure S1. Baseline risk functions under the SRE and CL association structures (left) and survival curves by treatment arm for the CL association structure (right), both obtained from the M-TPJM in the real data application.

In this figure, the baseline risk is slightly lower over time with the CL association because, under this structure, the individual risk depends on the current value of the biomarker (always positive). By contrast, the SRE association assumes that the individual risk depends on the individual deviation from the mean captured by the random effects (whose mean is zero). The survival curves with the CL association combine the effect of treatment on the risk of event (hazard ratio) with the time-dependent effect of treatment captured by the biomarker and shared through the current level association. The same plot for the SRE association structure would depend only on the hazard ratio of treatment on the risk of event (not time-dependent); it is therefore easier to interpret and does not require a graphical representation. The confidence intervals are obtained by resampling the parameters using the inverse Hessian matrix of the model and taking the 2.5% and 97.5% quantiles of 2000 simulated curves.
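Pointwise intervals of this kind take, at each time point, the 2.5% and 97.5% quantiles across the simulated curves. In the Python/NumPy sketch below, toy exponential survival curves with log-normally perturbed rates stand in for the curves computed from resampled model parameters. The toy curves and the function name are assumptions for illustration only.

```python
import numpy as np

def pointwise_band(curves, level=0.95):
    """Pointwise percentile band from simulated curves (rows = simulations)."""
    alpha = 100 * (1 - level) / 2
    lower, upper = np.percentile(curves, [alpha, 100 - alpha], axis=0)
    return lower, upper

# Toy stand-in for 2000 curves obtained from resampled model parameters
rng = np.random.default_rng(3)
times = np.linspace(0.0, 4.0, 41)
rates = np.exp(np.log(0.3) + 0.1 * rng.standard_normal(2000))
curves = np.exp(-np.outer(rates, times))  # shape (2000, 41)
lower, upper = pointwise_band(curves)
```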

[Figure S2 panel: Biomarker mean evolution over time (Scenario 2: C-TPJM), plotted against time; curves: true values and marginal TPJM with splines.]

Figure S2. Mean biomarker trajectory captured in the simulation studies by the M-TPJM with natural cubic splines with two degrees of freedom, when the true model is the C-TPJM. The curve is obtained from a single model fit, to illustrate how the population-average biomarker trajectory can be made flexible with the M-TPJM and can match the true trajectory from the C-TPJM design.
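A natural cubic spline of time with two degrees of freedom (and no intercept) uses two boundary knots and one interior knot. The Python/NumPy sketch below builds such a basis from the truncated-power construction. We assume the knots are placed like R's `ns(x, df = 2)`, that is, at the range of the data and at the median. Together with an intercept, this basis spans the same natural-spline space as `ns`, although the individual basis functions differ. The function name is hypothetical.

```python
import numpy as np

def natural_spline_basis_df2(x, knots):
    """Natural cubic spline basis with 3 knots (2 boundary + 1 interior), no intercept.

    N_1(x) = x and N_2(x) = d_1(x) - d_2(x), where
    d_k(x) = ((x - xi_k)_+^3 - (x - xi_K)_+^3) / (xi_K - xi_k).
    Both columns are linear beyond the boundary knots.
    """
    x = np.asarray(x, dtype=float)
    xi = np.asarray(knots, dtype=float)  # sorted: [lower boundary, interior, upper boundary]

    def pos3(u):
        return np.maximum(u, 0.0) ** 3

    d = [(pos3(x - xi[k]) - pos3(x - xi[-1])) / (xi[-1] - xi[k]) for k in range(len(xi) - 1)]
    return np.column_stack([x, d[0] - d[1]])
```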

[Figure S3 panel: Mean biomarker value (SPECTRUM); x-axis: time (years); y-axis: log(SLD+1); curves: M-TPJM, C-TPJM, OPJM, and regression LOESS with 1 SD bands.]

Figure S3. Individual biomarker trajectories from the SPECTRUM data, with the mean value estimated by the left-censoring OPJM (OPJM), the marginalized TPJM (M-TPJM) and the conditional TPJM (C-TPJM). A local regression curve (locally estimated scatterplot smoothing, LOESS) represents the empirical mean biomarker value. Note that the LOESS curve does not account for the correlation between repeated measurements within an individual, informative drop-out, or the semicontinuous distribution of the biomarker.

Table S1: Summary of the results of simulation scenario 4 (true model: marginalized TPJM), with 300 datasets of 400 individuals each and 1000 integration points; 21.34% zeros on average (SD = 1.79). The true values of the parameters estimated in the continuous part of the C-TPJM are unknown; therefore, coverage probabilities are not provided for these parameters.

Variable | True value | Left-censoring OPJM, Est.* (SD†) [CP‡] | C-TPJM, Est. (SD) [CP] | M-TPJM, Est. (SD) [CP]

Binary part
intercept | α0 = 4 | n/a | 4.01 (0.35) [95%] | 4.01 (0.31) [94%]
time | α1 = -3 | n/a | -2.96 (0.33) [94%] | -2.98 (0.25) [92%]
treatment | α2 = 1 | n/a | 0.92 (0.42) [95%] | 0.97 (0.39) [95%]
time:treatment | α3 = -2 | n/a | -1.86 (0.53) [94%] | -1.93 (0.41) [94%]

Continuous part
intercept | β0 = 1.5 | 1.82 (0.07) [0%] | 1.54 (0.05) | 1.52 (0.05) [91%]
time | β1 = -0.5 | -0.80 (0.18) [63%] | -0.07 (0.07) | -0.48 (0.07) [93%]
treatment | β2 = 0.3 | 0.39 (0.09) [80%] | 0.26 (0.07) | 0.29 (0.07) [94%]
time:treatment | β3 = 0.3 | -0.32 (0.26) [32%] | 0.45 (0.10) | 0.31 (0.11) [94%]
residual S.E. | σϵ = 0.3 | 0.84 (0.07) [0%] | 0.33 (0.01) | 0.30 (0.01) [94%]

Survival part
treatment | γ = -0.2 | -0.12 (0.13) [87%] | -0.12 (0.12) [89%] | -0.15 (0.12) [93%]
association | φ = 0.08 | 0.09 (0.02) [95%] | 0.08 (0.02) [96%] | 0.08 (0.02) [95%]

Random effects
intercept (binary part) | σa = 1.4 | n/a | 1.33 (0.20) | 1.33 (0.20)
intercept (continuous part) | σb0 = 0.6 | 0.38 (0.07) | 0.61 (0.03) | 0.61 (0.03)
slope (continuous part) | σb1 = 0.3 | 1.20 (0.21) | 0.43 (0.08) | 0.26 (0.11)
cor(a, b0) | 0.5 | n/a | 0.41 (0.13) | 0.55 (0.12)
cor(a, b1) | 0.5 | n/a | -0.23 (0.27) | 0.43 (0.38)
cor(b0, b1) | 0.2 | 0.45 (0.24) | -0.37 (0.14) | 0.37 (0.34)

Convergence rate | | 100% | 100% | 100%

* Mean of the parameter estimates; † standard deviation of the parameter estimates around their mean; ‡ coverage probability.
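The three summaries reported in Tables S1 to S3 can be computed from the per-dataset estimates. The Python/NumPy sketch below assumes that coverage is assessed with Wald 95% intervals (estimate ± 1.96 × standard error). This supplement does not state the interval type, so that choice is an assumption. The function name is hypothetical.

```python
import numpy as np

def summarize(estimates, std_errors, true_value, z=1.96):
    """Mean estimate, empirical SD of the estimates, and coverage (%) of Wald intervals.

    estimates, std_errors : one entry per simulated dataset
    """
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    covered = (est - z * se <= true_value) & (true_value <= est + z * se)
    return est.mean(), est.std(ddof=1), 100.0 * covered.mean()
```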


Table S2: Summary of the results of simulation scenario 5 (true model: conditional TPJM), with 300 datasets of 400 individuals each and 1000 integration points; 22.00% zeros on average (SD = 1.78). The true values of the parameters estimated in the continuous part of the left-censoring OPJM and the M-TPJM are unknown; therefore, coverage probabilities are not provided for these parameters.

Variable | True value | Left-censoring OPJM, Est.* (SD†) [CP‡] | C-TPJM, Est. (SD) [CP] | M-TPJM, Est. (SD) [CP]

Binary part
intercept | α0 = 4 | n/a | 4.05 (0.35) [96%] | 3.56 (0.30) [60%]
time | α1 = -3 | n/a | -3.01 (0.33) [96%] | -2.26 (0.27) [19%]
treatment | α2 = 1 | n/a | 0.98 (0.43) [96%] | 0.61 (0.37) [78%]
time:treatment | α3 = -2 | n/a | -1.98 (0.54) [95%] | -1.30 (0.43) [45%]

Continuous part
intercept | β0 = 1.5 | 1.76 (0.09) | 1.52 (0.05) [89%] | 1.49 (0.05)
time | β1 = -0.5 | -1.05 (0.21) | -0.50 (0.06) [97%] | -0.79 (0.08)
treatment | β2 = 0.3 | 0.46 (0.08) | 0.30 (0.07) [94%] | 0.34 (0.07)
time:treatment | β3 = 0.3 | -0.48 (0.29) | 0.31 (0.09) [93%] | 0.19 (0.12)
residual S.E. | σϵ = 0.3 | 0.81 (0.10) | 0.30 (0.01) [95%] | 0.30 (0.01)

Survival part
treatment | γ = -0.2 | -0.22 (0.12) [94%] | -0.20 (0.12) [93%] | -0.22 (0.12) [95%]
association | φ = 0.08 | 0.10 (0.03) [92%] | 0.08 (0.02) [95%] | 0.08 (0.02) [94%]

Random effects
intercept (binary part) | σa = 1.4 | n/a | 1.34 (0.21) | 1.21 (0.18)
intercept (continuous part) | σb0 = 0.6 | 0.48 (0.07) | 0.61 (0.03) | 0.63 (0.03)
slope (continuous part) | σb1 = 0.3 | 1.32 (0.36) | 0.29 (0.06) | 0.47 (0.09)
cor(a, b0) | 0.5 | n/a | 0.52 (0.12) | 0.62 (0.11)
cor(a, b1) | 0.5 | n/a | 0.51 (0.25) | 0.74 (0.13)
cor(b0, b1) | 0.2 | 0.44 (0.20) | 0.22 (0.23) | 0.49 (0.18)

* Mean of the parameter estimates; † standard deviation of the parameter estimates around their mean; ‡ coverage probability.


Table S3: Summary of the results of simulation scenario 6 (true model: left-censoring OPJM), with 300 datasets of 400 individuals each and 1000 integration points; 20.03% zeros on average (SD = 0.02). The true values of the parameters estimated in the continuous part of the C-TPJM are unknown; therefore, coverage probabilities are not provided for these parameters.

Variable | True value | Left-censoring OPJM, Est.* (SD†) [CP‡] | C-TPJM, Est. (SD) [CP] | M-TPJM, Est. (SD) [CP]

Binary part
intercept | α0 | n/a | 5.34 (0.64) | 3.67 (0.40)
time | α1 | n/a | -3.14 (0.52) | -2.09 (0.34)
treatment | α2 | n/a | 2.51 (0.77) | 1.38 (0.40)
time:treatment | α3 | n/a | 1.19 (0.69) | 0.69 (0.38)

Continuous part
intercept | β0 = 1.5 | 1.50 (0.04) [97%] | 1.57 (0.04) | 1.46 (0.05) [89%]
time | β1 = -0.5 | -0.52 (0.06) [93%] | -0.37 (0.05) | -0.63 (0.07) [54%]
treatment | β2 = 0.3 | 0.31 (0.06) [94%] | 0.28 (0.06) | 0.34 (0.07) [92%]
time:treatment | β3 = 0.3 | 0.31 (0.08) [93%] | 0.20 (0.07) | 0.40 (0.10) [75%]
residual S.E. | σϵ = 0.3 | 0.29 (0.01) [91%] | 0.28 (0.01) | 0.28 (0.01) [32%]

Survival part
treatment | γ = -0.2 | -0.21 (0.13) [95%] | -0.21 (0.13) [94%] | -0.21 (0.13) [94%]
association | φ = 0.08 | 0.08 (0.02) [92%] | 0.08 (0.02) [94%] | 0.08 (0.02) [93%]

Random effects
intercept (binary part) | σa | n/a | 4.62 (0.51) | 2.54 (0.26)
intercept (continuous part) | σb0 = 0.6 | 0.60 (0.03) | 0.56 (0.03) | 0.66 (0.03)
slope (continuous part) | σb1 = 0.3 | 0.30 (0.05) | 0.19 (0.05) | 0.30 (0.06)
cor(a, b0) | n/a | n/a | 0.96 (0.03) | 0.98 (0.01)
cor(a, b1) | n/a | n/a | 0.16 (0.28) | 0.78 (0.12)
cor(b0, b1) | 0.2 | 0.23 (0.17) | -0.11 (0.27) | 0.65 (0.17)

* Mean of the parameter estimates; † standard deviation of the parameter estimates around their mean; ‡ coverage probability.