Directory UMM :Data Elmu:jurnal:J-a:Journal of Econometrics:Vol99.Issue2.Dec2000:

(1)

Simple resampling methods for censored

regression quantiles

Yannis Bilias

!,"

, Songnian Chen

#,

*

, Zhiliang Ying

$

!Departments of Economics and Statistics, 266 Heady Hall, Iowa State University, Ames, IA 50011, USA

"Department of Economics, University of Cyprus, P.O. Box 20537, 1678 Nicosia, Cyprus #Department of Economics, Hong Kong University of Science and Technology, Clear Water Bay,

Kowloon, Hong Kong

$Department of Statistics, Hill Center, Busch Campus, Rutgers University, Piscataway, NJ 08855, USA

Received 1 June 2000; accepted 25 June 2000

Abstract

Powell (Journal of Econometrics 25 (1984) 303}325; Journal of Econometrics 32 (1986) 143}155) considered censored regression quantile estimators. The asymptotic covariance matrices of his estimators depend on the error densities and are therefore di$cult to estimate reliably. The di$culty may be avoided by applying the bootstrap method (Hahn, Econometric Theory 11 (1995) 105}121). Calculation of the estimators, however, requires solving a nonsmooth and nonconvex minimization problem, resulting in high computational costs in implementing the bootstrap. We propose in this paper computa-tionally simple resampling methods by convexfying Powell's approach in the resampling stage. A major advantage of the new methods is that they can be implemented by e$cient linear programming. Simulation studies show that the methods are reliable even with moderate sample sizes. ( 2000 Elsevier Science S.A. All rights reserved.

JEL classixcation: C14; C24

Keywords: Censored regression quantiles; Least absolute deviation; Linear program-ming; Resampling

*Corresponding author. Tel.:#852-2358-7602; fax:#852-2358-2084. E-mail address:[email protected] (S. Chen).

(2)

1. Introduction

Censored quantile regression models proposed by Powell (1984, 1986) have attracted a great deal of interest in the recent literature, particularly due to their robustness to distributional misspeci"cation of the error term and unknown conditional heteroscedaticity. For empirical applications, see Buchinsky (1994), Chamberlain (1994), Chay and HonoreH (1998), Horowitz and Neumann (1987), and Melenberg and van Soest (1996), among others. A survey of the subject may be found in Buchinsky (1998).

In order to carry out valid statistical inferences for censored quantile regres-sion models, it is necessary to provide consistent estimators for the asymptotic variance}covariance matrices. However, the asymptotic variance}covariance matrices are di$cult to estimate reliably since they involve conditional densities of error terms. For the uncensored counterpart, Parzen et al. (1994) (PWY) proposed a resampling method that allows approximation of the distribution of regression quantile estimators to avoid density estimation. Their algorithm requires only repeated executions of the same linear programming algorithm (see, e.g., Koenker and Bassett, 1978a) and is computationally simple and e$cient. Hahn (1995), on the other hand, showed that the usual bootstrap approximation is also valid for both the uncensored and censored regression quantiles, and Buchinsky (1995) provided some simulation evidence. In prin-ciple, the bootstrap approximation can be used to approximate distribution of Powell's regression quantile estimators for censored model. In practice, how-ever, implementation of the bootstrap can be extremely time consuming as each bootstrap replication involves solving a nonlinear and nonconvex minimization problem and accurate approximation with the resampling methods typically requires large number of such replications. This could lead to potentially prohibitive computational costs in empirical applications.

The main objective of this paper is to propose computationally e$cient methods for approximating the distribution of Powell's estimator through certain resampling methods. This is accomplished via convexifying the objective function of Powell (1984) so that the standard linear programming algorithm can be used. Section 2 introduces our modi"ed bootstrap method and a similar modi"cation of the resampling method of Parzen et al. (1994). In Section 3, it is shown that both the resampling methods are asymptotically correct. Some simulation results are presented in Section 4. Section 5 contains some conclud-ing remarks.

2. Resampling methods for censored median regression

Consider the censored regression model

y

(3)

wheree

i are error terms whose conditional medians given byxi are 0,xi are vectors of independent variables andb₀is a conformable vector of true regres-sion parameters. Powell (1984) recently proposed (CLAD) to estimateb₀ by

bK"arg min

b n

+

i/1

Dy

i!maxMx@ib, 0ND. (2)

Under certain regularity conditions, he showed thatbK is consistent and asymp-totically normal in the sense that

Jn(bK!_b

0)P$ N(0,14J~10 <J~10 ), (3) where J

0"E[f(0Dx)xx@I(x@b0'0)] and <"E[xx@I(x@b0'0)] with f()Dx)

being the conditional density ofegivenx.

Note that the objective function in (2) is a nonconvex and nonsmooth function and, consequently, its minimizer is di$cult to calculate. A number of approaches have been developed for dealing with this problem. Womersley (1986) modi"ed the reduced-gradient algorithm for the linear programming and his method is aimed at "nding a local minimizer of the objective function. Buchinsky (1994) proposed an iterative linear programming algorithm, and Fitzenberger (1997) proposed an algorithm called BRECNS by adapting the standard Barrodale}Roberts Algorithm (Barrodale and Roberts, 1973). But, their methods do not guarantee convergence to global minimum. In addition, an interior point algorithm was developed by Koenker and Park (1996), but their method again does not guarantee convergence to the global minimizer. More recently, Fitzenberger and Winker (1999) proposed a new algorithm based on threshold accepting (TA), which requires a large number of iterations in its implementation. Even though their algorithm guarantees asymptotic conver-gence to the global optimum and a high probability of ending in a high-quality local optimum for a"nite number of iterations, it is computationally expensive.

Note that the limiting variance}covariance matrix of bK involves the condi-tional error density and thus can be di$cult to estimate reliably. To avoid direct estimation of the error density, we can in principle apply the bootstrap method. Hahn (1995) provided theoretical justi"cation for the bootstrap procedure. Buchinsky (1995) provided simulation evidence on the performance of the bootstrap in estimating the variance of quantile regression models. However, implementation of the bootstrap method requires the solving of nonconvex and nonsmooth minimization problem repeatedly, which could be prohibitively expensive computationally, especially when global maximization procedures, such as that of Fitzenberger and Winker (1999), are used.

(4)

1For Eqs. (6) and (8), the left-hand sides could be O

1(1) away from 0 due to its discontinunity in

b(see, e.g., Koenker and Bassett, 1978a, b; Powell, 1984).

estimator can be computed through the linear programming in the same way as that for the case of uncensored median regression.

Following the arguments in Powell (1984) and Pollard (1990), it is straightfor-ward to show that Powell's estimator is asymptotically equivalent to the solution of

The minimization problem in (4) is a convex one and can be easily solved by the linear programming. However, this objective function is not realizable because

b₀, or even a consistent estimator of it, is not available at that stage. Although the preceding convex modi"cation of the objective function cannot be used to obtain an approximate solution to Powell's estimator bK, it is suitable for obtaining the bootstrap estimators in the resampling stage. Speci"cally, let

(yH_i, xH_i), i"1,2,n, be a bootstrap sample from M(y_i, x_i),i"1,2,nN, de"ne

modi"ed bootstrap estimator

bKH"arg min

Note that in the resampling stage,bK has already been computed. The minimiz-ation is over a convex function and can be easily executed via the linear programming. Because we need to have a large number of such bKH in the resampling stage, the computational savings could be substantial. The convexifying idea used here is similar to that of Buchinsky and Hahn (1998), Chernozhukov and Hong (1999), and Khan and Powell (1998), which modify Powell's estimator.

Once a large number ofbKHare computed, conditional distribution ofbKH!bK can be approximated accurately. We will show in the next section that, under suitable regularity conditions, the conditional distribution scaled byJn con-verges to the limit of the unconditional distribution ofJn(bK!b₀).

Our second resampling method is based on the observation that the estima-ting equation corresponding to (4) is pivotal, extending the resampling method by Parzen et al. (1994) for the uncensored median regression. More speci"cally, the estimating function corresponding to (4) takes the form1

(5)

which is conditionally pivotal at b"b₀ in the sense that its conditional

iare i.i.d. Bernoulli trials with success probability 1/2. Our approach to extend the PWY method to censored data is to view Powell's estimatorbK as an approximate solution to the estimating equation based on (6) and, in the resampling stage, to substitute the unknownb₀ bybK. Following Parzen et al. (1994), we generate MB

i,i"1,2,nN, a sequence of i.i.d. Bernoulli random

variables with success probability1₂. Denote bybKHHas the solution to

n Parzen et al. (1994), this resampling method works for the median regression with stochastic as well as deterministic regressors.

The main advantage ofbKHHis computational. Following Parzen et al. (1994), we can computebKHHby minimizing a suitable¸

1-objective function that can be solved by linear programming. Speci"cally,bKHHis a minimizer of the following ¸

Parzen et al. (1994) for details.

To approximate the distribution ofJn(bKHH!_b_K), one can simply simulate a large number of ;_H_'_{s, say} ;_H

1,2,;HM, by generating M sequences i.i.d. Bernoulli random variables. For each;_H

j, solve (9) to getbKHj. Then the empirical distribution ofbKHH!_b_K from thebKHH

j should be close to the true distribution of

bK!_b

0, as will be shown in the next section.

3. Large sample properties

In this section, we show that the proposed resampling methods are asymp-totically valid. We make the following assumptions.

Assumption 1. The true value of the parameterb₀is in the interior of a known

(6)

Assumption 2. Conditional medians of the e

i given the xi are uniquely equal to 0.

Assumption 3. P(x@dO0Dx@b₀'0)'0, for dO0, P(x@b₀"0)"0 and

EDDxDD2(R.

Assumption 4. E[f(0Dx)xx@I(x@b₀'0)] is nonsingular.

Assumption 5. The conditional density f()Dx) is continuous and uniformly

bounded.

The assumptions are standard for the desirable properties of Powell's es-timator to hold and are similar to those made in Powell (1984), Newey and McFadden (1994) and Pollard (1990). In particular, Assumption 1 is a standard assumption for an estimator de"ned through an optimization procedure with a nonconvex objective function. Assumption 2 is essential forb₀to be uniquely de"ned; it is implied by the assumption that f(0Dx) is nonzero and f(tDx) is continuous att"0. Assumptions 3 and 4 are crucial to ensure consistency and asymptotic normality of Powell's estimator. Assumption 5 on the conditional density function is imposed to facilitate Taylor's expansions used in the proofs.

Theorem 1. Under Assumptions1}5given above,the distribution for the modixed

bootstrap estimatorbKH, P(Jn(bKH!_b_K ))zDx

i,yi,i"1,2,n),is consistent in the

sense that

P(Jn(bKH!_b_K))zDx

i,yi, i"1,2,n)!P(Jn(bK!b0))z)P0

in probability for every z.

Proof. LetF

ndenote the empirical distribution formed from (x1,y1),2, (xn,yn) and EH_n(PH_n) the corresponding expectation (probability measure) with respect to F

n. So (xHi,yHi) are i.i.d. withFn as their common distribution. De"ne objective function

GH n(b,b)"

1 n

n

+

i/1 [(DyH

i!xHi@bD!DyHi!xHi@bD])I(xHi@b'0).

LetG

n(b,b)"EHnGHn(b,b) andG(b,b)"EGn(b,b). In view of Assumptions 3 and

4, the class of functions,M(Dy!x@bD!Dy!x@bD)I(x@b'0):b,b3BN, is Euclidean with an integrable envelope in the sense of Pakes and Pollard (1989). Thus, by the uniform law of large numbers (Pollard, 1990),

G

(7)

Further, by the bootstrap uniform law of large numbers (GineH and Zinn, 1990, Theorem 2.6),

GH

n(b,b)!Gn(b,b)"oB(1) uniformly in (b,b),

where, for a random sequence v

n constructed from the bootstrap sample, v

n"oB(1) if for everyg'0, PHn(DvnD*g)P0 in probability (Hahn, 1995). Since

bK is consistent andGis continuous, it follows that

GH

Next, we show that the bootstrap distribution is asymptotically correct. Let

*

(8)

So the maximal inequality of Pollard (1990, p. 38) can be applied to get, for some

asnPRandrP0, where the last equality follows from Assumptions 3 and 5. Thus, uniformly inbandb over shrinking neighborhoods ofb₀,

n

uniformly in b and b over shrinking neighborhoods of b₀. But bK, Powell's estimator, satis"es+x

isgn(yi!xibK)I(xibK'0)"o1(Jn). Hence,

G

n(b,bK)"(b!bK)@J0(b!bK)#o1

A

Db!_b_K_D

Jn #Db!bKD2

B

.

Exactly the same arguments as those for (12) lead to

n

Since, conditional on M(y

i, xi),i"1,2,nN, =Hn(bK) is a sum of i.i.d. random

variables with mean 0 and variance}covariance matrix converging to

<"E[xx@I(x@b₀'0)], we know that the conditional distribution of =_H

(9)

converges to N(0,<_{) in probability. Thus, an application of Lemma A.1 below}

withh"b!bK andh

n"bKH!bK givesbKH!bK"OB(1/Jn). A further

applica-tion of Lemma A.2 results in

Jn(bKH!bK)"!1

2J~10 =Hn(bK)#oB(1),

whose conditional distribution clearly converges in probability to N(0, J~1₀ <_J_~1

0 ). Hence Theorem 1 holds. h

Theorem 2. Under Assumptions1}5,

P(Jn(bKHH!bK))zDx

i,yi,i"1,2,n)!P(Jn(bK!b0))z)P0

in probability for everyz3Rp`1.

Proof. The derivation given of Parzen et al. (1994, Appendix) can be very much

carried over to derive our result. So we will only sketch a few key steps in our proof. By Assumptions 1}4 and in view of (8), we can derive an asymptotic linearity result similar to (A1.1) in Parzen et al. (1994), i.e.,

!;_H"₊n

i/1 f(0Dx

i)xix@iI(x@ibK'0)(bKHH!bK)#o(nDbKHH!bKD#Jn)

"J

0n(bKHH!bK)#o(nDbKHH!bKD#Jn). Therefore, we have the following representation:

Jn(bKHH!bK)"!J~1₀ 1

Jn

;_H#o(1),

which clearly converges to N(0,J~1₀ <_J~1

0 /4), the same limiting distribution as that ofJn(bK!b₀). Hence the theorem holds. _h

4. A simulation study

Monte Carlo experiments were carried out to assess the performance of our methods. For the censored regression model, data are generated as follows in all of the designs:

y"maxM1#x

1b1#x2b2#e, 0N, (13) where x

1 is a Bernoulli random variable centered at zero with a success probability 1₂, x

(10)

N(0, 1) (Standard Normal), a heteroscedastic normal (1#x

2)*N(0, 1) (Hetero-scedastic Normal) and a normal mixture 0.75_*N(0, 1)#0.25_*N(0, 4) (Normal Mixture). As a result, the censoring level is approximately 30% for all the designs.

Here we investigate the "nite sample performance of three resampling methods}the direct heteroscedastic bootstrap method (Bootstrap), our

modi-"ed bootstrap (M-Bootstrap) and the resampling method (Resampling) based on Eq. (8), where Powell's CLAD estimator is chosen as the"rst-step estimator

bK. For that purpose, 1000 random samples are drawn with sample size equal to 100 according to Eq. (13). For each random sample M(y

i,xi), i"1,2, 100N,

1000 bootstrap samples are drawn and, accordingly, 1000 bootstrap estimates and 1000 modi"ed bootstrap estimates are obtained. In addition, 1000 replicates of;_H_{, the weighted sum of independent and centered Bernoulli variables in (8),} are drawn and their corresponding resampling estimates are obtained. The standard (S) and percentile (P) methods (Efron and Tibshirani, 1993) are then used to construct con"dence intervals of the regression coe$cient of the con-tinuous regressorx

2using the three methods. The empirical coverage probabil-ities are summarized in Table 1. All the three methods appear to give reasonable coverage probabilities and are comparable.

All the resampling methods are implemented in Gauss code, and each of our methods is about eight times faster than the direct bootstrap. Powell's CLAD was implemented through iterative linear programming (Buchinsky, 1994), which behaves reasonably well for the designs under consideration here. It is worth pointing out that, in general, there is no guarantee that the iterative linear programming or other locally based search method would converge, or con-verge to the global minimum. In cases where global minimization methods such as that of Fitzenberger and Winker (1999) are called for and/or a large number of bootstrap replicates are needed for accurate statistical inference (Efron and Tibshirani, 1993), the computational advantage of our resampling methods will be even more signi"cant.

5. Concluding remarks

(11)

Table 1

Empirical coverage probabilities (ECP) for con"dence intervals

Bootstrap Resample M-Bootstrap

Con"dence level ECP ECP ECP

(a) Standard normal

0.95 S 0.956 0.929 0.912

P 0.974 0.952 0.943

0.90 S 0.909 0.878 0.863

P 0.941 0.900 0.886

0.85 S 0.868 0.830 0.812

P 0.906 0.846 0.833

(b) Normal mixture

0.95 S 0.957 0.936 0.926

P 0.975 0.938 0.935

0.90 S 0.923 0.892 0.875

P 0.941 0.878 0.872

0.85 S 0.879 0.843 0.824

P 0.901 0.829 0.822

(c) Heteroscedastic normal

0.95 S 0.963 0.950 0.946

P 0.966 0.948 0.943

0.90 S 0.922 0.906 0.896

P 0.925 0.898 0.887

0.85 S 0.887 0.859 0.851

P 0.868 0.838 0.832

(Hahn, 1995), which convexi"es the nonconvex objective function so that the e$cient linear programming algorithm can be used to compute bootstrap estimates. The other is a similar modi"cation of a resampling method proposed by Parzen et al. (1994). Again the computation of resampling estimates can be achieved via linear programming.

(12)

Acknowledgements

We would like to thank two anonymous referees and an associate editor, as well as Bo HonoreH, Shakeeb Khan, Roger Koenker, Lungfei Lee and Jim Powell for their helpful comments. All remaining errors are ours.

Appendix

The following lemmas are bootstrap analogues of Theorems 1 and 2 of Sherman (1993). For a weakly consistent bootstrap estimatorh_n"o

B(1), de"ned by a minimization procedure, Lemma A.1 provides conditions under which Jnh

n"OB(1), while Lemma A.2 strengthens this Jn-consistency result to

weak convergence provided there exists a very good quadratic approximation to the objective function within O

B(n~1@2) neighborhoods of 0. The proofs of the theorems are similar to Sherman (1993), thus omitted here.

Lemma A.1. Let h

n be a minimizer of CHn(h). Suppose that hn"oB(1), and that

uniformly over o(1)neighborhoods of 0,

C_H

B(1). Suppose 0 is an interior point of H and that uniformly over

O

randomvector in probability. Then

Jnh

n"!<~1ZHn#oB(1).

References

Barrodale, I., Roberts, F., 1973. An improved algorithm for discretel

(13)

Buchinsky, M., 1994. Changes in the U.S. wage structure 1963}1987: application of quantile regression. Econometrica 62, 405}458.

Buchinsky, M., 1995. Estimating the asymptotic covariance matrix for quantile regression models: a Monte Carlo study. Journal of Econometrics 68, 303}338.

Buchinsky, M., 1998. Recent advances in quantile regression models. The Journal of Human Resources 27, 88}126.

Buchinsky, M., Hahn, J., 1998. An alternative estimator for the censored quantile regression model. Econometrica 66, 653}671.

Chamberlain, G., 1994. Quantile regression, censoring, and the structure of wages. In: Sims, C. (Ed.), Advances in Econometrics, Sixth World Congress, Vol. 1. Cambridge University Press, New York, pp. 171}209.

Chay, K., HonoreH, B., 1998. Estimation of semiparametric censored regression models: an applica-tion to changes in black}white earnings inequality during the 1960s. The Journal of Human Resources 27, 4}38.

Chernozhukov, V., Hong, H., 1999. Censored quantile regression with the extreme quantile estimate of the unknown censoring points, unpublished manuscript.

Efron, B., Tibshirani, R.J., 1993. An Introduction to the Bootstrap. Chapman & Hall, London.

Fitzenberger, B., 1997. A guide to censored quantile regressions. In: Maddala, G.S., Rao, C.R. (Eds.), Handbook of Statistics, Robust Inference, Vol. 15. North-Holland, Amsterdam, pp. 405}437.

Fitzenberger, B., 1998. The moving blocks bootstrap and robust inference for linear least squares and quantile regression. Journal of Econometrics 2, 235}288.

Fitzenberger, B., Winker, P., 1999. Improving the computation of censored quantile regressions, unpublished manuscript.

GineH, E., Zinn, J., 1990. Bootstrapping general empirical measures. The Annals of Probability 18, 851}869.

Hahn, J., 1995. Bootstrapping quantile regression estimators. Econometric Theory 11, 105}121. Horowitz, J.L., Neumann, G.R., 1987. Semiparametric estimation of employment duration models.

Econometric Reviews 6, 5}40.

Khan, S., Powell, J., 1998. 2-step estimation of semiparametric censored regression models, unpub-lished manuscript.

Koenker, R., Bassett, G.S., 1978a. Regression quantiles. Econometrica 46, 33}50.

Koenker, R., Bassett, G.S., 1978b. The asymptotic distribution of the least absolute error estimator. Journal of the American Statistical Association 73, 618}622.

Koenker, R., Bassett, G.S., 1982. Robust tests for heteroscedasticity based on regression quantiles. Econometrica 50, 43}61.

Koenker, R., Park, B.J., 1996. An interior point algorithm for nonlinear quantile regression. Journal of Econometrics 71, 265}283.

Melenberg, B., van Soest, A., 1996. Parametric and semiparametric modeling of vacation expendi-tures. Journal of Applied Econometrics 11, 59}76.

Newey, W.K., McFadden, D., 1994. Large sample estimation and hypothesis testing. In: Engle, R., McFadden, D. (Eds.), Handbook of Econometrics, Vol. 4. North-Holland, Amsterdam, pp. 2111}2245.

Parzen, M.I., Wei, L.J., Ying, Z., 1994. A resampling method based on pivotal estimation functions. Biometrika 81, 341}350.

Pakes, A., Pollard, D., 1989. Simulation and the asymptotics of optimization estimators. Econo-metrica 57, 1027}1057.

(14)

Powell, J., 1984. Least absolute deviations estimation for the censored regression model. Journal of Econometrics 25, 303}325.

Powell, J., 1986. Censored Regression Quantiles. Journal of Econometrics 32, 143}155.

Sherman, R.P., 1993. The Limiting Distribution of the Maximum Rank Correlation Estimator. Econometrica 61, 123}137.

Womersley, R.S., 1986. Censored discrete linearl