Decreased susceptibility of marginal odds ratios to finite-sample bias

Supplemental Digital Content Table of contents

eMethods. Data generation

eFigure 1. Relative error of the log of the odds ratio, comparison of covariate-conditional and marginal estimates at 2 and 15 events per parameter.

eTable 1. Sensitivity analysis: Results when covariate-conditional odds ratio set to 1.5.

eTable 2. Sensitivity analysis: Results when marginal exposure and outcome prevalence set to 0.15 and number of covariates is 14

eMethods. Data generation

Let 𝑋_𝑖 be a binary random variable representing treatment for individual 𝑖 (𝑋 = 1 if treated, 𝑋 = 0 if untreated), 𝒁_𝒊 a vector of standard normal covariates, 𝑌_𝑖⁰ a binary random variable representing the potential outcome when untreated (𝑋 = 0), 𝑌_𝑖¹ a binary random variable representing the potential outcome when treated (𝑋 = 1), and 𝑌_𝑖 the observed outcome given the observed value of 𝑋_𝑖. We generated 5,000 trials of sample size 𝑁 = 200 for 2 events per parameter (EPV) and 𝑁 = 1,500 for 15 EPV. Individuals (𝑖 = 1 to 𝑁) were independent.

The subscript 𝑖 is suppressed in the remainder of the eMethods.

The following parameters were set as inputs to generate the data. Let 𝑝_𝑥 and 𝑝_𝑦 be the referent prevalences of 𝑋 and 𝑌, respectively (i.e., the prevalence when all other variables in the probability model equal 0), and let 𝑂𝑅_𝑍𝑋, 𝑂𝑅_𝑍𝑌, and 𝑂𝑅_𝑋𝑌 be the conditional odds ratios between each 𝑍 and 𝑋, between each 𝑍 and 𝑌, and between 𝑋 and 𝑌, respectively.

Let $\operatorname{expit}(b) = 1/(1 + e^{-b})$, where $b$ is the log-odds, ln(odds).
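As a small worked check of this definition (illustrative arithmetic only), applying expit to a referent log-odds returns the referent prevalence itself, which is why $p_x$ and $p_y$ equal the Bernoulli probabilities when every other term in the linear predictor is 0:

```latex
\operatorname{expit}\!\left(\ln\frac{p}{1-p}\right)
  = \frac{1}{1 + \frac{1-p}{p}} = p,
\qquad \text{e.g.} \quad
\operatorname{expit}\!\left(\ln\frac{0.3}{0.7}\right)
  = \operatorname{expit}(-0.847) = 0.3 .
```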

Data were generated in the following order:

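1. $\mathbf{Z}'$: 29 independent draws from a standard normal distribution, $N(0,1)$.

2. $X$: drawn from a Bernoulli distribution with $p = \operatorname{expit}\left(\ln\left(\frac{p_x}{1-p_x}\right) + \ln(OR_{ZX}) \sum_{k=1}^{29} Z'_k\right)$.

3. $Y^0$: drawn from a Bernoulli distribution with $p = \operatorname{expit}\left(\ln\left(\frac{p_y}{1-p_y}\right) + \ln(OR_{ZY}) \sum_{k=1}^{29} Z'_k\right)$.

4. $Y^1$: drawn from a Bernoulli distribution with $p = \operatorname{expit}\left(\ln\left(\frac{p_y}{1-p_y}\right) + \ln(OR_{ZY}) \sum_{k=1}^{29} Z'_k + \ln(OR_{XY})\right)$.

5. $Y$: set according to the realized value of $X$ (i.e., if $X = 0$, then $Y = Y^0$; otherwise $Y = Y^1$).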
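A short derivation shows why $OR_{XY}$ is the true covariate-conditional odds ratio under these steps. It assumes, as steps 2 to 4 describe, that $X$, $Y^0$, and $Y^1$ are each drawn independently given $\mathbf{Z}'$, so that $Y^x$ is independent of $X$ given $\mathbf{Z}'$:

```latex
Y = X\,Y^{1} + (1 - X)\,Y^{0}
\;\Rightarrow\;
\Pr(Y = 1 \mid X = x, \mathbf{Z}') = \Pr(Y^{x} = 1 \mid \mathbf{Z}')
  = \operatorname{expit}\!\Big(\ln\tfrac{p_y}{1-p_y}
      + \ln(OR_{ZY}) \textstyle\sum_{k=1}^{29} Z'_k + \ln(OR_{XY})\,x\Big),
\\[4pt]
\frac{\operatorname{odds}(Y = 1 \mid X = 1, \mathbf{Z}')}
     {\operatorname{odds}(Y = 1 \mid X = 0, \mathbf{Z}')}
  = \exp\{\ln(OR_{XY})\} = OR_{XY} .
```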

In scenarios where covariates were independent of exposure, 𝑂𝑅_𝑍𝑋 = 1.0. In scenarios where covariates were confounders, 𝑂𝑅_𝑍𝑋 = 1.2. Throughout, 𝑂𝑅_𝑍𝑌 = 1.2 and 𝑂𝑅_𝑋𝑌 = 3.0. 𝑝_𝑥 was selected to obtain a marginal treatment prevalence of 0.3 (i.e., 𝐸[𝑋] = 0.3), and 𝑝_𝑦 was selected to obtain a marginal outcome prevalence of 0.3 (i.e., 𝐸[𝑌] = 0.3).
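The calibration of the referent prevalences can be written as a condition on the linear predictor. Because the covariates enter with a common coefficient, only their sum matters, and that sum has a known normal distribution (worked numbers use 29 covariates and an odds ratio of 1.2):

```latex
E[X] = E_{\mathbf{Z}'}\!\left[\operatorname{expit}\!\left(\ln\tfrac{p_x}{1-p_x} + \ln(OR_{ZX})\,S\right)\right],
\qquad S = \sum_{k=1}^{29} Z'_k \sim N(0,\,29),
\\[4pt]
\operatorname{Var}\!\big(\ln(1.2)\,S\big) = 29\,[\ln(1.2)]^2 \approx 29 \times 0.0332 \approx 0.964 .
```

When $OR_{ZX} = 1.0$ the covariate term vanishes and $p_x = E[X] = 0.3$ directly. When $OR_{ZX} = 1.2$, expit is nonlinear, so $p_x \neq E[X]$ in general and $p_x$ has to be found numerically; the same holds for $p_y$, whose linear predictor also contains $\ln(OR_{XY})\,X$.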

In sensitivity analyses, (1) 𝑂𝑅_𝑋𝑌 = 1.5, and (2) 𝐸[𝑋] = 𝐸[𝑌] = 0.15 and the covariate vector 𝒁 had length 14.
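One way to check a candidate referent prevalence against its target marginal prevalence (0.3 in the main scenarios, 0.15 in sensitivity analysis 2) is a Monte Carlo average of the conditional probability over simulated covariates. This is a minimal sketch, not the authors' calibration procedure; the candidate value of `p_x`, the seed, and the dataset names `check` and `m` are hypothetical.

```sas
/* Monte Carlo check of the marginal exposure prevalence E[X] for one  */
/* candidate referent prevalence. All input values are HYPOTHETICAL.   */
%let ncov = 29;     /* number of covariates (14 in sensitivity analysis 2) */
%let p_x  = 0.30;   /* HYPOTHETICAL candidate referent prevalence          */
%let covx = 1.2;    /* OR_ZX in the confounder scenario                    */

data check;
  call streaminit(2024);                      /* arbitrary seed */
  do i = 1 to 200000;
    lp = log(&p_x./(1 - &p_x.));              /* referent log-odds */
    do k = 1 to &ncov.;
      lp = lp + log(&covx.)*rand("normal");   /* add ln(OR_ZX) x Z'_k */
    end;
    px = 1/(1 + exp(-lp));                    /* P(X = 1 | Z') = expit(lp) */
    output;
  end;
  keep px;
run;

/* Averaging P(X = 1 | Z') over the simulated Z' estimates E[X]. */
proc means data=check noprint;
  var px;
  output out=m mean=mean_px;
run;

proc print data=m noobs;
  var mean_px;
run;
```

If the printed mean is off target, `p_x` can be adjusted (for example by bisection) and the step rerun; the same idea applies to `p_y`, with the outcome model's extra treatment term included.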

SAS Code for data generation

%macro sim(seed=, scenario=, nsim=, nobs=, ncov=, p_x=, covx=, p_y=, or=, covy=);

  *DATA GENERATION;
  data a;
    call streaminit(&seed.);
    scenario = &scenario.;
    n = &nobs.;
    array cov{&ncov.} cov1-cov&ncov.;   *covariate array;

    do j = 1 to &nsim.;                 *trial indicator;
      do i = 1 to &nobs.;               *observation within trial;

        *Standard normal covariates;
        do k = 1 to &ncov.;
          cov{k} = rand("normal");
        end;

        *Exposure: logit P(X=1) = logit(p_x) + ln(OR_ZX) * sum of covariates;
        x = rand("bernoulli", 1/(1 + exp(-1*(log(&p_x./(1 - &p_x.))
              %do t = 1 %to &ncov.; + log(&covx.)*cov&t. %end; ))));

        *Potential outcomes under no treatment (y0) and treatment (y1);
        y0 = rand("bernoulli", 1/(1 + exp(-1*(log(&p_y./(1 - &p_y.)) + log(&or.)*0
              %do t = 1 %to &ncov.; + log(&covy.)*cov&t. %end; ))));
        y1 = rand("bernoulli", 1/(1 + exp(-1*(log(&p_y./(1 - &p_y.)) + log(&or.)*1
              %do t = 1 %to &ncov.; + log(&covy.)*cov&t. %end; ))));

        *Observed outcome;
        if x = 0 then y = y0;
        else y = y1;

        output;
      end;                              *end observation loop;
    end;                                *end trial loop;
    drop k;
  run;

%mend sim;
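Because the macro draws both potential outcomes from known models, the true marginal odds ratio can be approximated by averaging the conditional risks under treatment and no treatment over simulated covariates. This is a hedged sketch of one such approach, not necessarily how the reported true marginal odds ratios were obtained; `p_y = 0.25` is a hypothetical placeholder rather than the calibrated value, so the result will not reproduce the reported numbers exactly. Dataset names `po`, `risks`, and `marg_or` are illustrative.

```sas
/* Monte Carlo approximation of the marginal OR implied by the outcome   */
/* model. p_y is a HYPOTHETICAL placeholder, not the calibrated value.   */
%let ncov  = 29;    /* number of covariates             */
%let p_y   = 0.25;  /* HYPOTHETICAL referent prevalence */
%let covy  = 1.2;   /* OR_ZY                            */
%let or_xy = 3.0;   /* covariate-conditional OR_XY      */

data po;
  call streaminit(2718);                          /* arbitrary seed */
  do i = 1 to 200000;
    s = 0;
    do k = 1 to &ncov.;
      s = s + rand("normal");                     /* sum of Z'_k */
    end;
    lp0 = log(&p_y./(1 - &p_y.)) + log(&covy.)*s; /* log-odds of Y0 given Z' */
    r0  = 1/(1 + exp(-lp0));                      /* P(Y0 = 1 | Z') */
    r1  = 1/(1 + exp(-(lp0 + log(&or_xy.))));     /* P(Y1 = 1 | Z') */
    output;
  end;
  keep r0 r1;
run;

/* Average the conditional risks over Z' to get E[Y0] and E[Y1]. */
proc means data=po noprint;
  var r0 r1;
  output out=risks mean=risk0 risk1;
run;

/* Marginal OR: odds of E[Y1] divided by odds of E[Y0]. */
data marg_or;
  set risks;
  marginal_or = (risk1/(1 - risk1)) / (risk0/(1 - risk0));
run;

proc print data=marg_or noobs;
  var risk0 risk1 marginal_or;
run;
```

Working with the conditional probabilities rather than Bernoulli draws removes one layer of Monte Carlo noise from the estimate.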

eFigure 1. Relative error of the log of the odds ratio, comparison of covariate-conditional and marginal estimates at 2 and 15 events per parameter.

Abbreviations: OR, odds ratio; IPTW, inverse probability of treatment weighting; AIPW, augmented inverse probability weighting

Panel A shows results from the scenario in which the covariates were predictors of the outcome (OR = 1.2) but were independent of the exposure.

Panel B shows results from the scenario in which the covariates were predictors of both the outcome (OR = 1.2) and the exposure (OR = 1.2), making them confounders. The events per parameter are 2 and 15, on average, for the outcome regression model used for estimation of the covariate-conditional effect and the g-computation and AIPW marginal effects. The events per parameter of the exposure regression models used for estimation of the IPTW and AIPW marginal effects were, on average, 2.1 and 15.5, respectively. Relative error was calculated using the correct covariate-conditional OR (3.0) for covariate-conditional estimates and the correct marginal OR (2.53) for marginal estimates. Filled circles mark the mean relative error (i.e., relative bias). Whiskers represent ±1.5 times the interquartile range. N = 5000.
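For concreteness, the relative error on the log scale (as the figure title indicates) and the two reference values can be written out as follows; the estimate of 3.3 is a made-up example, not a result from the simulations:

```latex
\text{relative error} = \frac{\ln\widehat{OR} - \ln OR_{\text{true}}}{\ln OR_{\text{true}}},
\qquad \ln 3.0 \approx 1.099, \qquad \ln 2.53 \approx 0.928,
\\[4pt]
\widehat{OR} = 3.3 \text{ (conditional)}: \quad
\frac{\ln 3.3 - \ln 3.0}{\ln 3.0} = \frac{\ln 1.1}{\ln 3.0} \approx \frac{0.0953}{1.0986} \approx 0.087 .
```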

eTable 1. Sensitivity analysis: Results when covariate-conditional odds ratio set to 1.5.

                                         Covariates independent of exposure     Covariates were confounders
Events per  Parameter    Estimator       Relative bias  Empirical SE  Root MSE  Relative bias  Empirical SE  Root MSE
parameter
15          Conditional  MLE              0.026         0.135         0.135      0.026         0.139         0.139
            Conditional  Firth            0.004         0.132         0.132      0.002         0.135         0.135
            Marginal     G-computation    0.001         0.111         0.111      0.002         0.117         0.117
            Marginal     IPTW            -0.001         0.112         0.112      0.002         0.129         0.129
            Marginal     AIPW             0.000         0.112         0.112     -0.002         0.127         0.127
2           Conditional  MLE              0.254         0.517         0.527      0.256         0.536         0.546
            Conditional  Firth            0.021         0.412         0.412      0.025         0.425         0.425
            Marginal     G-computation    0.009         0.350         0.350      0.029         0.369         0.369
            Marginal     IPTW            -0.017         0.400         0.400      0.069         0.465         0.465
            Marginal     AIPW            -0.003         0.395         0.395     -0.004         0.452         0.452

Abbreviations: SE, standard error; MSE, mean squared error; MLE, maximum likelihood estimation; IPTW, inverse probability of treatment weighting; AIPW, augmented inverse probability weighting

True covariate-conditional odds ratio = 1.5. True marginal odds ratio = 1.41. The events per parameter are 2 and 15, on average, for the outcome regression model used for estimation of the covariate-conditional effect and the g-computation and AIPW marginal effects. The events per parameter of the exposure regression models used for estimation of the IPTW and AIPW marginal effects were, on average, 2.1 and 16.1, respectively. Relative error was calculated using the correct covariate-conditional OR for covariate-conditional estimates and the correct marginal OR for marginal estimates. N = 5000.
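The three performance columns are linked. Assuming relative bias is measured on the log-OR scale, converting it to an absolute bias and combining it with the empirical SE approximately recovers the root MSE; the conditional MLE row at 2 events per parameter (covariates independent of exposure) illustrates the check:

```latex
\text{bias} = \text{relative bias} \times \ln OR_{\text{true}},
\qquad
\text{root MSE} \approx \sqrt{\text{empirical SE}^2 + \text{bias}^2},
\\[4pt]
0.254 \times \ln 1.5 \approx 0.254 \times 0.405 \approx 0.103,
\qquad
\sqrt{0.517^2 + 0.103^2} \approx \sqrt{0.2673 + 0.0106} \approx 0.527 .
```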

eTable 2. Sensitivity analysis: Results when marginal exposure and outcome prevalence set to 0.15 and number of covariates is 14

                                         Covariates independent of exposure     Covariates were confounders
Events per  Parameter    Estimator       Relative bias  Empirical SE  Root MSE  Relative bias  Empirical SE  Root MSE
parameter
15          Conditional  MLE              0.012         0.182         0.182      0.015         0.179         0.179
            Conditional  Firth           -0.001         0.178         0.178      0.002         0.175         0.175
            Marginal     G-computation   -0.001         0.165         0.165      0.002         0.167         0.167
            Marginal     IPTW            -0.002         0.169         0.169     -0.001         0.183         0.183
            Marginal     AIPW            -0.001         0.168         0.168     -0.003         0.182         0.182
2           Conditional  MLE              0.120         0.642         0.655      0.136         0.625         0.643
            Conditional  Firth           -0.002         0.541         0.541      0.003         0.528         0.528
            Marginal     G-computation   -0.010         0.512         0.513      0.009         0.510         0.510
            Marginal     IPTW            -0.048         0.599         0.601     -0.020         0.628         0.629
            Marginal     AIPW            -0.032         0.593         0.594     -0.029         0.613         0.614

Abbreviations: SE, standard error; MSE, mean squared error; MLE, maximum likelihood estimation; IPTW, inverse probability of treatment weighting; AIPW, augmented inverse probability weighting

True covariate-conditional odds ratio = 3.0. True marginal odds ratio = 2.79. The events per parameter are 2 and 15, on average, for the outcome regression model used for estimation of the covariate-conditional effect and the g-computation and AIPW marginal effects. The events per parameter of the exposure regression models used for estimation of the IPTW and AIPW marginal effects were, on average, 2.0 and 16.7, respectively. Relative error was calculated using the correct covariate-conditional OR for covariate-conditional estimates and the correct marginal OR for marginal estimates.

N = 5000 except for the following:

Covariates independent of exposure, 2 events per parameter, without Firth correction: N = 4998 because of non-converged outcome models.

Covariates were confounders, 2 events per parameter, without Firth correction: N = 4999 because of a non-converged outcome model. For the AIPW estimate, N = 4998 because one additional estimate with an estimated risk < 0 was removed.
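The table columns can be computed from one log-OR estimate per simulated trial. This is a minimal sketch under stated assumptions: the dataset `est`, its variables `j` and `log_or`, and the four toy values are hypothetical, and a non-converged trial is assumed to be stored as a missing estimate so that PROC MEANS drops it from N automatically.

```sas
/* Performance measures on the log-OR scale for one scenario/estimator. */
/* Toy, made-up data: j = trial, log_or = estimate (. = non-converged). */
%let true_or = 3.0;   /* true OR for the estimand being evaluated */

data est;
  input j log_or;
  sq_err = (log_or - log(&true_or.))**2;   /* squared error; missing if log_or is missing */
  datalines;
1 1.20
2 0.95
3 1.10
4 1.30
5 .
;
run;

/* PROC MEANS skips missing values, so the non-converged trial is excluded. */
proc means data=est noprint;
  var log_or sq_err;
  output out=perf n(log_or)=n_valid mean(log_or)=mean_log_or
                  std(log_or)=emp_se mean(sq_err)=mse;
run;

data perf;
  set perf;
  rel_bias = (mean_log_or - log(&true_or.)) / log(&true_or.);  /* relative bias */
  root_mse = sqrt(mse);                                        /* root MSE      */
  keep n_valid rel_bias emp_se root_mse;
run;

proc print data=perf noobs;
run;
```

With these toy values, four trials contribute, the relative bias is about 0.035, and the root MSE is about 0.135.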