Supplement:
Supplement Figure 1: Cohort Identification
Supplement Table 1: Definitions of Chemotherapy from SEER-Medicare data
ICD-9 diagnosis codes: V58.1, V66.2, V67.2
ICD-9 procedure codes: 99.25
HCPCS codes: J9190 (Fluorouracil); J0640 (Leucovorin); J9263 (Oxaliplatin); J8520, J8521 (Capecitabine); Q0084, Q0085; G0355-G0363; C8953, C8954, C8955; S9329, S9330, S9331; 96400-96599
Revenue Center codes: 0331, 0332, 0335
Instrumental variable regression analysis
Because the Cox model is nonlinear, the conventional two-stage least squares (2SLS) procedure is not appropriate. Instead, we implemented the two-stage residual inclusion (2SRI) method, which has been shown to have favorable properties when treatment effects are estimated with nonlinear regression methods (1,2). Like 2SLS, 2SRI proceeds in two stages. The first stage models the likelihood of receiving treatment as a function of the instruments and other exogenous factors. The second stage estimates the effect of treatment on survival while including the residuals from the first stage and the other exogenous factors as covariates.
Specifically, the first stage is:
$$\mathrm{AC} = \operatorname{probit}(W\alpha) + X_u$$
where $\mathrm{AC}$ is the observed treatment choice, $\alpha$ denotes the regression parameters, $W = [X_0 \;\; W^{+}]$, $W^{+} = [\text{Instruments}]$, and $X_0$ represents the other observable confounders. The residuals from this regression are:
$$\hat{X}_u = \mathrm{AC} - \operatorname{probit}(W\hat{\alpha})$$
where $\hat{\alpha}$ is the column of consistently estimated parameters. $\hat{X}_u$ denotes the difference between the actual value of the treatment choice and the predicted probability generated by the first-stage probit model.
The second stage is:
$$h(t) = h_0(t)\exp\left(\mathrm{AC}\,\hat{\beta}_{AC} + X_0\hat{\beta}_0 + \hat{X}_u\hat{\beta}_u\right)$$
where $\hat{\beta}_{AC}$ is the consistent estimate of the effect of AC on the survival outcome. Although this approach can generate asymptotically unbiased estimates of the true coefficient values, the standard errors reported by a statistical package for the second stage cannot be used directly, because they do not account for the estimation of the first-stage residuals. We therefore used a bootstrapping approach (500 replications) to approximate the asymptotically correct standard errors of the coefficients.
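The two stages and the bootstrap described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used for the analysis: the data are simulated, the function names (`fit_probit`, `fit_cox`, `two_sri`, `bootstrap_se`) are hypothetical, an intercept column is added to W, probit(Wα) is read as the predicted probability Φ(Wα), tied event times use the Breslow approximation, and only 50 bootstrap replications are run to keep the example fast (the analysis used 500).

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def norm_cdf(z):
    return 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    return np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)

def fit_probit(W, a, n_iter=25):
    """Probit maximum likelihood by Fisher scoring (W includes an intercept)."""
    alpha = np.zeros(W.shape[1])
    for _ in range(n_iter):
        eta = W @ alpha
        p = np.clip(norm_cdf(eta), 1e-10, 1 - 1e-10)
        phi = norm_pdf(eta)
        score = W.T @ (phi * (a - p) / (p * (1 - p)))
        info = (W * (phi ** 2 / (p * (1 - p)))[:, None]).T @ W
        alpha = alpha + np.linalg.solve(info, score)
    return alpha

def fit_cox(Z, time, event, n_iter=25):
    """Cox partial likelihood by Newton-Raphson (Breslow handling of ties)."""
    order = np.argsort(-time)                  # descending time
    Z, t, d = Z[order], time[order], event[order]
    # Last index of each tied group, so every risk set is {j: t_j >= t_i}.
    idx = np.searchsorted(-t, -t, side="right") - 1
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        r = np.exp(Z @ beta)
        s0 = np.cumsum(r)[idx]
        s1 = np.cumsum(r[:, None] * Z, axis=0)[idx]
        s2 = np.cumsum(r[:, None, None] * Z[:, :, None] * Z[:, None, :], axis=0)[idx]
        zbar = s1 / s0[:, None]
        grad = (d[:, None] * (Z - zbar)).sum(axis=0)
        var = s2 / s0[:, None, None] - zbar[:, :, None] * zbar[:, None, :]
        hess = (d[:, None, None] * var).sum(axis=0)
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def two_sri(a, X0, Wplus, time, event):
    """Return [beta_AC, beta_0..., beta_u] from the two-stage procedure."""
    W = np.column_stack([np.ones(len(a)), X0, Wplus])   # intercept is an assumption
    alpha_hat = fit_probit(W, a)
    xu_hat = a - norm_cdf(W @ alpha_hat)                # first-stage residual
    Z = np.column_stack([a, X0, xu_hat])                # second-stage covariates
    return fit_cox(Z, time, event)

def bootstrap_se(a, X0, Wplus, time, event, n_boot=500, seed=0):
    """Resample patients with replacement and refit BOTH stages each time."""
    rng = np.random.default_rng(seed)
    n = len(a)
    draws = []
    for _ in range(n_boot):
        i = rng.integers(0, n, size=n)
        draws.append(two_sri(a[i], X0[i], Wplus[i], time[i], event[i]))
    return np.std(draws, axis=0, ddof=1)

# Simulated data (hypothetical, for illustration only).
rng = np.random.default_rng(1)
n = 1500
X0 = rng.normal(size=(n, 1))        # observed confounder
Wplus = rng.normal(size=(n, 1))     # instrument
U = rng.normal(size=n)              # unobserved confounder
a = (0.5 * X0[:, 0] + 1.0 * Wplus[:, 0] + U > 0).astype(float)
lin = -0.5 * a + 0.3 * X0[:, 0] + 0.5 * U
t_event = rng.exponential(1.0 / np.exp(lin))
t_cens = rng.exponential(2.0, size=n)
time = np.minimum(t_event, t_cens)
event = (t_event <= t_cens).astype(float)

beta = two_sri(a, X0, Wplus, time, event)
se = bootstrap_se(a, X0, Wplus, time, event, n_boot=50)
print("beta_AC:", beta[0], "bootstrap SE:", se[0])
```

The key design point is that each bootstrap replicate repeats the first stage as well as the second, so the resulting standard errors reflect the uncertainty in the estimated residuals.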
Supplement Table 2: Comparison of samples grouped by median value of instrumental variables
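Columns, in order. Health Service Area instrument: Total (N=16,316), Below Median (N=8,186), Above Median (N=8,130). Provider instrument: Total (N=16,316), Below Median (N=8,257), Above Median (N=8,059).
Each category row lists counts (percentages) for these six columns. Each variable heading row lists two chi-square test P values: Health Service Area first, then Provider.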
Age at diagnosis 0.537 0.602
65-74 4845(29.70%) 2413(29.50%) 2432(29.90%) 4845(29.70%) 2481(30.00%) 2364(29.30%)
75-85 7445(45.60%) 3723(45.50%) 3722(45.80%) 7445(45.60%) 3745(45.40%) 3700(45.90%)
86+ 4026(24.70%) 2050(25.00%) 1976(24.30%) 4026(24.70%) 2031(24.60%) 1995(24.80%)
NCI Comorbidity Index 0.000 0.000
0 7525(46.10%) 3915(47.80%) 3610(44.40%) 7525(46.10%) 3960(48.00%) 3565(44.20%)
1-3 5938(36.40%) 2958(36.10%) 2980(36.70%) 5938(36.40%) 2938(35.60%) 3000(37.20%)
>3 2853(17.50%) 1313(16.00%) 1540(18.90%) 2853(17.50%) 1359(16.50%) 1494(18.50%)
Gender 0.145 0.001
Male 7024(43.00%) 3478(42.50%) 3546(43.60%) 7024(43.00%) 3451(41.80%) 3573(44.30%)
Female 9292(57.00%) 4708(57.50%) 4584(56.40%) 9292(57.00%) 4806(58.20%) 4486(55.70%)
Marital status 0.000 0.002
Married 7754(47.50%) 3906(47.70%) 3848(47.30%) 7754(47.50%) 3893(47.20%) 3861(47.90%)
Single, Separated, Divorced 2406(14.70%) 1187(14.50%) 1219(15.00%) 2406(14.70%) 1248(15.10%) 1158(14.40%)
Widowed 5493(33.70%) 2818(34.40%) 2675(32.90%) 5493(33.70%) 2824(34.20%) 2669(33.10%)
Unknown 661(4.10%) 274(3.30%) 387(4.80%) 661(4.10%) 291(3.50%) 370(4.60%)
Race 0.000 0.000
White 14318(87.80%) 7288(89.00%) 7030(86.50%) 14318(87.80%) 7227(87.50%) 7091(88.00%)
Black 1286(7.90%) 528(6.50%) 758(9.30%) 1286(7.90%) 614(7.40%) 672(8.30%)
Other 689(4.20%) 355(4.30%) 334(4.10%) 689(4.20%) 400(4.80%) 289(3.60%)
Unknown 23(0.10%) 15(0.20%) 8(0.10%) 23(0.10%) 16(0.20%) 7(0.10%)
Poverty Level 0.546 0.000
0%-<5% poverty 4058(24.90%) 2020(24.70%) 2038(25.10%) 4058(24.90%) 1871(22.70%) 2187(27.10%)
5% to <10% poverty 4405(27.00%) 2186(26.70%) 2219(27.30%) 4405(27.00%) 2171(26.30%) 2234(27.70%)
10% to <20% poverty 4663(28.60%) 2362(28.90%) 2301(28.30%) 4663(28.60%) 2440(29.60%) 2223(27.60%)
20% to 100% poverty 2898(17.80%) 1460(17.80%) 1438(17.70%) 2898(17.80%) 1606(19.50%) 1292(16.00%)
Unknown 292(1.80%) 158(1.90%) 134(1.60%) 292(1.80%) 169(2.00%) 123(1.50%)
Urban/Rural 0.000 0.000
Big Metro 8613(52.80%) 3308(40.40%) 5305(65.30%) 8613(52.80%) 3773(45.70%) 4840(60.10%)
Metro 4705(28.80%) 2955(36.10%) 1750(21.50%) 4705(28.80%) 2491(30.20%) 2214(27.50%)
Urban 1038(6.40%) 744(9.10%) 294(3.60%) 1038(6.40%) 685(8.30%) 353(4.40%)
Less Urban 1584(9.70%) 948(11.60%) 636(7.80%) 1584(9.70%) 1059(12.80%) 525(6.50%)
Rural 375(2.30%) 230(2.80%) 145(1.80%) 375(2.30%) 248(3.00%) 127(1.60%)
Region 0.000 0.000
West 6199(38.00%) 3407(41.60%) 2792(34.30%) 6199(38.00%) 3579(43.30%) 2620(32.50%)
South 3986(24.40%) 1810(22.10%) 2176(26.80%) 3986(24.40%) 1889(22.90%) 2097(26.00%)
Midwest 2512(15.40%) 826(10.10%) 1686(20.70%) 2512(15.40%) 1233(14.90%) 1279(15.90%)
Northeast 3619(22.20%) 2143(26.20%) 1476(18.20%) 3619(22.20%) 1556(18.80%) 2063(25.60%)
Tumor Location 0.518 0.522
Left-sided colon cancer 4903(30.10%) 2441(29.80%) 2462(30.30%) 4903(30.10%) 2500(30.30%) 2403(29.80%)
Right-sided colon cancer 11413(69.90%) 5745(70.20%) 5668(69.70%) 11413(69.90%) 5757(69.70%) 5656(70.20%)
Tumor Stage 0.123 0.076
IIA 14340(87.90%) 7153(87.40%) 7187(88.40%) 14340(87.90%) 7216(87.40%) 7124(88.40%)
IIB 1100(6.70%) 580(7.10%) 520(6.40%) 1100(6.70%) 592(7.20%) 508(6.30%)
IIC 876(5.40%) 453(5.50%) 423(5.20%) 876(5.40%) 449(5.40%) 427(5.30%)
Tumor grade 0.000 0.011
well differentiated 1194(7.30%) 604(7.40%) 590(7.30%) 1194(7.30%) 624(7.60%) 570(7.10%)
moderately differentiated 11570(70.90%) 5717(69.80%) 5853(72.00%) 11570(70.90%) 5865(71.00%) 5705(70.80%)
poorly differentiated 2943(18.00%) 1531(18.70%) 1412(17.40%) 2943(18.00%) 1478(17.90%) 1465(18.20%)
undifferentiated 296(1.80%) 182(2.20%) 114(1.40%) 296(1.80%) 122(1.50%) 174(2.20%)
not determined 313(1.90%) 152(1.90%) 161(2.00%) 313(1.90%) 168(2.00%) 145(1.80%)
Lymph nodes examined 0.681 0.000
<12 lymph nodes examined 4929(30.20%) 2485(30.40%) 2444(30.10%) 4929(30.20%) 2784(33.70%) 2145(26.60%)
>=12 lymph nodes examined 11387(69.80%) 5701(69.60%) 5686(69.90%) 11387(69.80%) 5473(66.30%) 5914(73.40%)
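As a check on how to read the two values beside each variable name in Supplement Table 2, the sketch below recomputes the chi-square test of independence for age group by Health Service Area median group, using the counts in the table. The function name `chi_square_independence` is hypothetical, and the closed-form P value `exp(-stat / 2)` is valid only for 2 degrees of freedom, which is the case for a 3 x 2 table.

```python
import math

def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

# Age at diagnosis (rows: 65-74, 75-85, 86+) by Health Service Area group
# (columns: below median, above median), counts from Supplement Table 2.
age_by_hsa = [
    [2413, 2432],
    [3723, 3722],
    [2050, 1976],
]

stat, df = chi_square_independence(age_by_hsa)
p_value = math.exp(-stat / 2.0)   # chi-square survival function, df == 2 only
print(f"chi-square = {stat:.3f}, df = {df}, P = {p_value:.3f}")
```

The P value comes out at about 0.537, which matches the first value reported for age at diagnosis. This suggests the reported values are P values from the chi-square test, not the test statistics themselves.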
Supplement Table 3: E-values
Columns, in order: E-Value (Point Estimate), E-Value (Confidence Interval)
AC 1.49 1.32
Gender (Ref: male)
Female 1.79 1.69
Marital (Ref: Not married)
Married 1.57 1.47
Tumor Stage (Ref: Stage IIB/C)
Stage IIA 2.06 1.93
Tumor site (Ref: Left)
Right 1.25 1.08
Number of lymph nodes examined (Ref: <12)
>=12 1.59 1.49
Tumor Grade (Ref: well/moderately)
poorly/not 1.30 1.15
Poverty level (Ref: 5%-100% poverty)
0%-5% poverty 1.29 1.14
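The supplement does not state how the E-values in Supplement Table 3 were computed. For orientation, the sketch below implements the commonly used formula for a ratio estimate, E = RR + sqrt(RR x (RR - 1)), taking the reciprocal first when the ratio is below 1. Applying it directly to a hazard ratio is an assumption (it treats the hazard ratio as an approximate risk ratio), the function names are hypothetical, and the input values are illustrative and not taken from the study.

```python
import math

def e_value(ratio):
    """E-value for a ratio estimate (risk ratio, or a hazard ratio used as one)."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    rr = ratio if ratio >= 1 else 1.0 / ratio   # work on the side above 1
    return rr + math.sqrt(rr * (rr - 1.0))

def e_value_ci(lower, upper):
    """E-value for the confidence limit closest to the null value of 1."""
    if lower <= 1.0 <= upper:
        return 1.0                               # interval already includes the null
    return e_value(lower) if lower > 1.0 else e_value(upper)

# Hypothetical estimate and 95% confidence interval, for illustration only.
print(e_value(0.80), e_value_ci(0.70, 0.92))
```

An E-value closer to 1 means that a weaker unmeasured confounder could explain away the association.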
References
1. Terza JV, Basu A, Rathouz PJ: Two-stage residual inclusion estimation: addressing endogeneity in health econometric modeling. Journal of health economics 27:531-543, 2008
2. Wan F, Small D, Bekelman JE, et al: Bias in estimating the causal hazard ratio when using two-stage instrumental variable methods. Statistics in medicine 34:2235-2265, 2015