4.2 Methods: The bagged one-to-one matching estimator
4.2.1 The estimator itself
Although the bagged one-to-one matching (BOOM) estimator uses the general bagging methodology introduced by Breiman (1996), our specific approach is inspired by the “complex bootstrap” of Austin and Small (2014). While Austin and Small investigated the complex bootstrap as a way to obtain standard error estimates for a simple matching-based estimator, we modify the process to obtain an estimate of treatment effect.
The BOOM process can take many forms. In broad terms, the process consists of six steps, two of which are optional:
1. (optional) Prune the original dataset to remove regions of extreme nonoverlap between groups.
2. Develop the general form of a distance measure (e.g., propensity score model) on the whole (or pruned) sample.
3. Choose a one-to-one matching algorithm.
4. (optional) Choose the general form of an outcome model or modeling approach (covariate adjustment).
5. Resample the data B times, and obtain a treatment effect estimate in each of the B bootstrap resamples, using the general distance measure from Step 2, the matching algorithm from Step 3, and, if desired, the outcome-modeling approach from Step 4.
6. Take the average of the B treatment effect estimates.
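The six steps can be sketched end to end. The following is a minimal illustration on synthetic data, not the BOOM software itself: it skips the optional Steps 1 and 4, uses a single covariate directly as the distance measure, and uses a simple greedy matching routine. All names and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data: one covariate, true treatment effect tau = 2
n_t, n_c, tau = 50, 60, 2.0
x_t = rng.normal(0.2, 1.0, n_t)            # treated covariate
x_c = rng.normal(0.0, 1.0, n_c)            # control covariate
y_t = tau + x_t + rng.normal(0, 0.1, n_t)  # treated outcomes
y_c = x_c + rng.normal(0, 0.1, n_c)        # control outcomes

def greedy_match(d_t, d_c):
    """Greedy 1:1 matching without replacement on a scalar distance measure."""
    pairs, used = [], set()
    for i in np.argsort(d_t):              # process treated in score order
        free = [j for j in range(len(d_c)) if j not in used]
        if not free:
            break
        j = min(free, key=lambda j: abs(d_t[i] - d_c[j]))
        used.add(j)
        pairs.append((i, j))
    return pairs

B = 200
estimates = []
for _ in range(B):
    # Step 5a: resample within each group, conditioning on group sizes
    it = rng.integers(0, n_t, n_t)
    ic = rng.integers(0, n_c, n_c)
    # Steps 5b-5c: here the covariate itself serves as the distance measure
    pairs = greedy_match(x_t[it], x_c[ic])
    # Step 5d: unadjusted estimate = mean of matched-pair differences
    estimates.append(np.mean([y_t[it][i] - y_c[ic][j] for i, j in pairs]))

boom_estimate = np.mean(estimates)         # Step 6: bagged estimate
print(round(boom_estimate, 2))             # close to the true effect of 2
```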
We will now go into more detail about the six steps:
4.2.1.1 Step 1 (optional): Pruning
We begin with Nt0 treated subjects and Nc0 control subjects. In many cases, the distributions of the covariates in the two groups will have areas of obvious nonoverlap.
As a first step, “pruning” the data (restricting the sample in order to eliminate these regions of obvious nonoverlap) can reduce model dependence (Ho et al., 2007).
Although one-to-one matching can itself be seen as a form of pruning, we distinguish it here from pruning before matching. In particular, if propensity scores are to be used as the distance measure in Step 2, the performance of any proposed propensity score model can be assessed only in regions of the data where common support exists, so pruning is an essential step before the development of a reliable propensity score model (King and Zeng, 2007; Ho et al., 2007). After pruning, we have Nt treated subjects and Nc control subjects, for a total of N subjects, and our inference about treatment effects will apply to this pruned sample.
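One simple pruning rule is to keep only subjects whose estimated propensity score falls inside the overlap of the two groups' score ranges. A minimal sketch, with illustrative score values (other rules, such as trimming extreme percentiles, are equally possible):

```python
import numpy as np

def prune_common_support(ps, treated):
    """Keep only subjects whose propensity score falls inside the overlap
    (common support) of the treated and control score ranges."""
    lo = max(ps[treated].min(), ps[~treated].min())
    hi = min(ps[treated].max(), ps[~treated].max())
    return (ps >= lo) & (ps <= hi)

ps = np.array([0.10, 0.30, 0.55, 0.80, 0.20, 0.40, 0.60, 0.95])
treated = np.array([True, True, True, True, False, False, False, False])
keep = prune_common_support(ps, treated)
# Treated range [0.10, 0.80], control range [0.20, 0.95] -> overlap [0.20, 0.80]
print(keep.tolist())
# [False, True, True, True, True, True, True, False]
```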
4.2.1.2 Step 2: Development of distance measure
The next step is to decide on a distance measure for use in the matching process.
Commonly used distance measures include propensity scores (Rosenbaum and Rubin, 1983), prognostic scores (Hansen, 2008), and reweighted or generalized Mahalanobis distance (Greevy et al., 2012; Diamond and Sekhon, 2013). Although the distances themselves will be calculated within each bootstrap sample in Step 5, the analyst will fix some element(s) of the calculation process here in Step 2. For example, an analyst who wants to match on propensity scores and to estimate those propensity scores using logistic regression will develop the general form of the propensity score model here in Step 2, and then fit that same general model within each bootstrap sample in Step 5.
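The division of labor between Steps 2 and 5b can be made concrete: Step 2 fixes the form of the model (which covariates enter, and how), while fitting happens separately within each resample. A minimal numpy sketch, assuming a logistic propensity model fit by Newton-Raphson (hand-rolled here only to keep the example dependency-free; in practice any logistic regression routine would do):

```python
import numpy as np

def design(X):
    """The 'general form' fixed in Step 2: intercept plus main effects."""
    return np.column_stack([np.ones(len(X)), X])

def fit_propensity(X, treated, n_iter=25):
    """Fit the fixed logistic model to one (re)sample; returns fitted scores."""
    D = design(X)
    beta = np.zeros(D.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-D @ beta))
        W = p * (1.0 - p)
        # Newton step: beta += (D' W D)^{-1} D'(y - p)
        beta += np.linalg.solve(D.T @ (D * W[:, None]), D.T @ (treated - p))
    return 1.0 / (1.0 + np.exp(-D @ beta))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
# Treatment assignment depends on the first covariate
treated = (X[:, 0] + rng.normal(0, 1, 200) > 0).astype(float)
ps = fit_propensity(X, treated)
print(ps[treated == 1].mean() > ps[treated == 0].mean())  # True
```

In Step 5, `fit_propensity` would be called once per bootstrap sample, keeping `design` fixed.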
4.2.1.3 Step 3: Selection of matching algorithm
BOOM uses one-to-one matching without replacement. Several algorithmic options exist for one-to-one matching without replacement, many of which are discussed in Stuart (2010); in this step the analyst must choose the type of algorithm (optimal or greedy) and, if a greedy algorithm is chosen, the order in which to process the subjects. A further decision is whether to use a caliper in the matching process.
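For concreteness, here is a sketch of greedy one-to-one matching without replacement, with an optional caliper; the score values are illustrative, and the treated subjects are processed in the order given:

```python
import numpy as np

def greedy_caliper_match(s_treat, s_ctrl, caliper=None):
    """Greedy 1:1 matching without replacement on a scalar score.
    Each treated subject, in order, takes the nearest unused control;
    an optional caliper discards matches whose distance exceeds it."""
    used, pairs = set(), []
    for i, st in enumerate(s_treat):
        candidates = [(abs(st - sc), j) for j, sc in enumerate(s_ctrl)
                      if j not in used]
        if not candidates:
            break
        d, j = min(candidates)
        if caliper is None or d <= caliper:
            used.add(j)
            pairs.append((i, j))
    return pairs

s_treat = np.array([0.50, 0.90])
s_ctrl = np.array([0.48, 0.10, 0.85])
print(greedy_caliper_match(s_treat, s_ctrl))                # [(0, 0), (1, 2)]
print(greedy_caliper_match(s_treat, s_ctrl, caliper=0.03))  # [(0, 0)]
```

With the caliper, the second treated subject goes unmatched because its nearest unused control is 0.05 away.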
4.2.1.4 Step 4 (optional): Outcome model
As mentioned in Section 4.1.1, treatment effect estimates can be obtained with or without further covariate adjustment after matching. If the analyst does not want to do any further covariate adjustment, there is nothing to do in this step. If the analyst does want to do further covariate adjustment, this step consists of specifying the general form of the outcome model. Although the current version of the BOOM software limits covariate adjustment to linear models with no treatment interactions, other types of outcome models could certainly be integrated.
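Under the restriction described above (a linear model with no treatment interactions), the treatment effect estimate is the coefficient on the treatment indicator. A minimal sketch, with illustrative data:

```python
import numpy as np

def adjusted_effect(y, treated, X):
    """Treatment effect from a linear outcome model with no treatment
    interactions, y ~ 1 + treated + X, fit by least squares to the
    matched sample; returns the coefficient on the treatment indicator."""
    D = np.column_stack([np.ones(len(y)), treated, X])
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    return coef[1]

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 1))
treated = rng.integers(0, 2, 100)
y = 2.0 * treated + 3.0 * X[:, 0] + rng.normal(0, 0.1, 100)
print(round(adjusted_effect(y, treated, X), 1))  # 2.0
```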
4.2.1.5 Step 5: Obtaining bootstrap estimates
Step 5 consists of conducting steps 5a–5d B times, which means that the analyst must first choose B, the number of bootstrap resamples. The larger B is, the less Monte Carlo error there will be in the BOOM estimator and, in particular, in estimates of its standard error (discussed below in Section 4.2.2). To get reliable standard error estimates, Efron (2014) suggests a method for choosing B using calculations based on the jackknife, and Wager et al. (2014) state that for many bagged estimators, B needs to be Θ(n^1.5) unless a bias-corrected version of the standard error estimate is used, in which case B can be Θ(n). It is unclear whether BOOM falls into this class of estimators; in addition, for any given sample size, the ideal B will depend on the variability of the point estimate from resample to resample. In our simulations (presented below), we found adequate, but not perfect, performance of the bias-corrected standard error estimates using B = 10n. We recommend that researchers working with a single dataset rather than a series of simulations begin either by using Efron's jackknife-based calculations or by setting B to several times the sample size, and then increase B as feasible until the differences between the bias-corrected and uncorrected standard error estimates are acceptable in the context of the study.
Step 5a: Resample Nt + Nc = N subjects. In a departure from Austin and Small's complex bootstrap (Austin and Small, 2014), resampling in BOOM conditions on the original sizes of the treatment and control groups, following the advice of Hesterberg (2014). That is, rather than resampling N subjects without regard to treatment group, we resample Nt subjects from the treatment group and Nc subjects from the control group in order to condition on the observed information from the original dataset.
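This within-group resampling is a one-liner per group; the sketch below (illustrative names and sizes) contrasts with pooled resampling only in that each group's draw is restricted to its own indices:

```python
import numpy as np

def resample_within_groups(n_t, n_c, rng):
    """Draw a bootstrap resample conditioning on group sizes: Nt treated
    indices and Nc control indices, each drawn with replacement within
    its own group (never across groups)."""
    return rng.integers(0, n_t, n_t), rng.integers(0, n_c, n_c)

rng = np.random.default_rng(7)
it, ic = resample_within_groups(20, 30, rng)
print(len(it), len(ic))  # 20 30
```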
Step 5b: Generate distance measures on the bootstrap sample. Here we generate the distance measures using the method developed in Step 2. For example, if using propensity scores estimated via logistic regression, we would keep the general model from Step 2, but fit the model to this particular bootstrap sample.
Step 5c: Create a matched sample. In this step we use the matching algorithm selected in Step 3 and the resample-specific distance measures from Step 5b to create a matched sample. Depending on whether a caliper is specified (and exceeded), this step matches all or some of the treated subjects in the bootstrap sample, each to a single control subject. Note that a unique treated subject from the original sample could appear in the bootstrap sample more than once, and then also enter the matched sample more than once, with each appearance matched to a different control subject, or even to the same control subject if that control subject also appears in the bootstrap sample more than once.
Step 5d: Estimate and record the treatment effect in the matched sample. As discussed in Section 4.1.1, there are several ways to estimate the treatment effect in a matched sample. If, in Step 4, the analyst decided not to do further covariate adjustment after matching, the treatment effect estimate is simply the mean of the differences in outcome for the matched pairs, or equivalently the difference in group means between the two matched groups. If, on the other hand, the analyst specified an outcome model in Step 4, in this step the model is fit to the matched sample, and the treatment effect estimate is obtained either from the coefficient estimate for the treatment term (if no treatment interactions are used) or via one of the methods described in Schafer and Kang (2008) and Imbens (2004).
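The unadjusted estimate, and the equivalence between the mean of matched-pair differences and the difference of matched-group means, can be verified directly (illustrative outcome values):

```python
import numpy as np

y_treat = np.array([5.0, 7.0, 6.5])   # outcomes of matched treated subjects
y_ctrl  = np.array([4.0, 6.5, 5.0])   # outcomes of their matched controls

mean_pair_diff = np.mean(y_treat - y_ctrl)           # mean of pair differences
diff_of_means  = y_treat.mean() - y_ctrl.mean()      # difference of group means
print(round(mean_pair_diff, 3), round(diff_of_means, 3))  # 1.0 1.0
```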
4.2.1.6 Step 6: Aggregate to get the bagged estimate
The BOOM estimate is simply the average of the B treatment effect estimates.
In most cases, the analyst will also want to generate estimates of the standard error and confidence intervals in this step; we discuss these in Section 4.2.2.