And Sam, thank you for giving me so many interesting research questions to tackle, and for your mentorship and guidance as I learned how to be an effective collaborator. To my parents, Rob and Andrea Sigworth, thank you for all the ways you have supported and encouraged me.
Abstract
Introduction of Topics Covered
Building a Dose Toxo-Equivalence Model from a Bayesian Meta-Analysis of Pub-
Thus, we found ourselves in need of a method that relied only on publicly available data, but could produce efficient results consistent with a model fit to the individual subject's data. We developed our model in a Bayesian framework and benchmarked its performance against individual subject data via simulation, then applied it to the taxane clinical trial metadata described earlier.
An Empirical Comparison of Methods for Efficiently Estimating Time-Varying Ef-
This analysis looked at the potential for a time-varying effect of metformin exposure on survival in individuals with various tumor types, modeled while controlling for the demographic information, ICD-9 codes, and drug exposures available in the electronic health record.
Efficient Estimation of the Cox Model with Time-Varying Effects Under Two-
We compare the computational and inferential efficiency of our method with existing methods for time-varying coefficient estimation and conclude with the performance of our method on the colorectal cancer data in Chapter 4.
Summary
Introduction
The use of aggregated summary data in study-level meta-analyses may introduce bias. In this article, we use the term individual patient data (IPD) meta-analysis for the regression analysis based on individual patients and refer to the analysis based on aggregated study-level data as study-level meta-analysis (SL).
Motivating Example
Through extensive simulation studies in Section 2.5, we demonstrate the ability of this model to closely approximate the true toxic-equivalent dose relationships for different study designs, different levels of between-study variance, and in the presence of level data. subject (which are non-treatment or dose effect modifiers) in addition to study-level covariates. We apply the method for study-level data described in Section 2.4 to these data in Section 2.6, after exploring its performance against individual patient data.
Methods
Hierarchical Model Structure
- IPD Meta-analysis
- Study-level Meta-analysis
In this model, α1 is the mean score for studies on drug A with a normalized dose of 0 and α1+α2. As before, β1 represents the mean score for studies on drug A with a normalized dose of 0 and β1+β2 is the mean score for studies on drug B with a normalized dose of 0, while µ represents an estimated random intercept component at the study level to measure between- study heterogeneity.
Equivalence Relationship
When including additional covariates in the data-generating model, we consider SL meta-analysis models both with and without Zai, denoted SL-C and SL-NC, respectively. Similarly, we calculate dose pairs (dA,dkB,IPD) from the posterior of our IPD meta-analysis.
Simulation Study
Simulation for a Correctly Specified Model
- Method
- Results
In Figure 2.1 PanelC we present the estimated dose-toxo equivalence curves and 95% credible intervals for the IPD-meta (pink) and SL-meta (blue) model fits without additional covariates and compare them to the actual dose-toxo equivalence relationship (green ), byτ2(columns) and study design (rows). Without additional covariates, the performance of the IPD meta- and SL meta-models in recovering the true dose-toxo equivalence relationship is comparable.
Simulation Under a Misspecified Model
- Misspecified Model Simulation
- Misspecified Model Results
With additional covariates, the SL-C and SL-NC models again lasted one minute, while the IPD model lasted approximately 24 hours. Thus, the SL-C and SL-NC curves provide a reasonable estimate of the relationship between the sexes, but not at a sex-specific level.
Real-World Application: Taxane Treatment and Neuropathy
Compared to the simulated curves, the width and shape of the credible interval around this ratio is consistent with ours. Along the IQR of paclitaxel doses observed in our data, 656−1085mg/m2, the width of the credible interval was approximately equal to the cumulative dose of two treatment cycles of docetaxel, providing useful guidance to clinicians.
Discussion
In the next two, we consider the possibility that the subject-specific covariateZijis a moderator of the dose toxo-equivalence relationship. The SL-C and SL-NC models are fit as in Section 2.5.1 in the context of aggregated covariates (excluding interactions).
Summary
Introduction
An empirical comparison of methods for efficiently estimating time-varying effects in the Cox model. Multiple established methods exist for estimating time-varying coefficients within the Cox regression framework.
Review of Methods
The Cox Model with Time-Varying Effects
In this work, we compared the relative performance of the previously mentioned estimation methods via extensive simulation studies. In section 3.3.2 we describe the five special estimation methods whose performance we have chosen to compare.
Methods of Estimation
- Bernstein Polynomials
- B-splines
- Penalized Splines
- Restricted Cubic Splines
- Local Linear Estimation
The behavior of ordinary cubic splines can be bad in the tail of the curve (before the first and after the last node, also called the boundary nodes). Further details, including a derivation of the estimation of the variance of ˆa(t), can be found in Cai and Sun (2003) and Tian et al.
Simulation Study
- Data Simulation
- Tuning Parameter Selection by Estimation Method
- Model Fitting Procedure
- Methods of Evaluation
For local linear estimation, the variance of β(t) at each time is calculated following Cai and Sun (2003) based on the second derivative of the local linear partial likelihood function. Similarly to the above, the optimal fits for each of the 1000 simulated data sets were then summarized along with the tuning-specific summaries.
Simulation Results
Within Estimation Method
Finally, looking at PanelC, we see that mean squared error increases with increasing order across all curve shapes, especially at the end of the time series from ft=1.5 onwards. The performance of the optimal curve is now aligned with a 5 node fit in both panelsA and C.
Between Estimation Methods
In PanelB, the coverage is low at a small number of nodes, especially for the 3 nodes in curves (c) to (e) around the articulation points. The optimal curve behavior for constrained cubic joints closely matches orders 4 and 5 in terms of curve shape, but its coverage is very low for (c) to (e).
Real-World Application: Metformin in Cancer Patients
Data Source and Methods
These simulations are evaluated at intervals of 0.05, but modifying that interval width would dictate the computation time. Furthermore, penalized splines were not fit to these data due to their poor coverage in the simulation.
Results
In Figure 3.9, comparing all robust methods, we similarly find that the effect of metformin when modeled across all cancers shows a suggestion of a time-varying effect, with a similar upward trend modeled by both Bernstein polynomials and from the constrained cubic lines compared to fit lines B. However, the findings pertaining to each cancer are consistent across all methods considered and all support further investigation of an effect of metformin that varies over time in cancer survival.
Discussion
Of all the methods examined, B-splines were the only ones that maintained nominal coverage regardless of the number of knots in the base. With increasing flexibility, especially in B-splines, we observed large variability in the tails of the estimated β(t).
Introduction
In this work, we begin by outlining an example of a use case of the Cox model with time-varying effects in a two-phase setting, with inspiration from a large cohort study. Next, we detail our method for efficient two-phase estimation of the Cox model with time-varying effects, detailing the B-spline fits and the expectation maximization algorithm for coefficient estimation, and a profile likelihood approach for estimating standard errors.
Motivating Example
Methods
Efficient Two-Phase Estimation
Details about the construction of the B-spline bases{Awl (·)}dl=1n and{Bqj(·)}sj=1n and instructions about the choices of(w,dn)and(q,sn) can be found in Chen et al. In step E of iteration (t+1), we calculate the conditional expectations of I(Xi=xk,Ui=j/sn)andI(Xi=xk)given(Yi,∆i,Zi),x1 ,.
Inverse Probability Weighting
Local Linear Estimation for Nested Case Control Data
The very first component of this vector, ˆa1(t), is of interest for this manuscript, since our goal is estimation of the time-varying coefficient on the expensive covariateX in particular. For further details, including details on the variance of ˆa(t), can be found in Liu et al.
Simulation Study
Data Simulation
For each data set, subjects were chosen for inclusion in Phase II using one of three methods: case-cohort sampling, nested case control sampling, or mixed sampling. From there, we repeated the above sampling approaches for each subpopulation and then recombined the subjects to form the full cohort of 2,000 for each simulation.
Model Fitting
Both the IPW and local linear estimation models were only suitable to complete the case data. Finally, for local linear estimation, the variance of β(t) at each time point t is calculated, following Liu et al. 2010), based on the second derivative of the local linear partial likelihood function.
Methods for Evaluating Simulation Results
Both IPW and local linear estimation fits were allocated 8 GB of system memory, while the two-phase fits were allocated 32 GB to account for the larger total data size during fitting due to the incorporation of all Phase I data.
Simulation Results
Furthermore, we present the relative performance in estimating γ between two-phase B-fits and IPW fits. Biphasic fits lasted 2-3 minutes in the binary case and 1-2 hours in the most complex federal case (curve shape (e)).
Real-World Application: Oxidative Stress and Cancer Risk
Data Source and Methods
To determine which covariates should be included in the PCA, we use the Phase II data to fit a linear regression model with the oxidative stress outcome and include all of the previously listed covariates as predictors. All candidate variables were included in the final model, including those not included in the PCA.
Results
Since there were several low-cost candidate covariates to use in the two-phase B-splines approach, we chose to use principal components (PCA) (Salem and Hussein, 2019) to reduce the dimension before estimating the conditional probability function. The IPW model for comparison was fit to the same set of covariates as the two-stage model, and with the same specifications for the B-spline basis used to approximate β(t).
Discussion
Confidence Interval Type Empirical SD Fitted SE Fit B−splines − nested case control sampling IPW − nested case control sampling Truth. Regression splines in the cox model with application to covariate effects in liver diseases.Journal of the American Statistical Association.
Performance of fits without additional covariates
Performance of fits with additional covariates
Quantile placement of knots
Data simulation specifications
Dose toxo-equivalence relationships without additional covariates
Dose toxo-equivalence relationships with additional covariates
Dose toxo-equivalence relationships under misspecification
Dose toxo-equivalence relationships with varied dose
Summaries of model fit to taxane data
Simulation results for Bernstein polynomials
Simulation results for B-splines
Simulation results for penalized splines
Simulation results for restricted cubic splines
Simulation results for local linear estimation
Simulation results compared by method
Metformin exposure fits for B-splines
Metformin exposure fits compared by method
Simulation results for moderate correlation, case-cohort sampling
Simulation results for moderate correlation, mixed sampling
Simulation results for moderate correlation, nested case control sampling
Simulation results for high correlation, case-cohort sampling
Simulation results for high correlation, mixed sampling
Simulation results for high correlation, nested case control sampling
Simulation results for high correlation, rare disease, case-cohort sampling
Simulation results for high correlation, rare disease, mixed sampling
Simulation results for high correlation, rare disease, nested case control sampling
Simulation results for relative efficiency of B-spline to IPW fits in γ estimation
Simulation results for relative efficiency of B-spline to full cohort fits in β (t) estimation
Summaries of two-phase and IPW models fit to colorectal cancer data