

Journal of Business & Economic Statistics

ISSN: 0735-0015 (Print) 1537-2707 (Online) Journal homepage: http://www.tandfonline.com/loi/ubes20

Comment

Han Hong

To cite this article: Han Hong (2005) Comment, Journal of Business & Economic Statistics, 23:2, 158-160, DOI: 10.1198/073500104000000695

To link to this article: http://dx.doi.org/10.1198/073500104000000695

Published online: 01 Jan 2012.


158 Journal of Business & Economic Statistics, April 2005

Comment

Han HONG

Department of Economics, Duke University, Durham, NC 27708 (hanhong@econ.duke.edu)

I am grateful for the opportunity to comment on this very insightful and interesting article by Professors Abowd and Vilhuber. They have addressed a very important problem of measurement errors in administrative data on the U.S. Census Bureau’s quarterly workforce indicators. They have devised and implemented novel statistical matching techniques to correct for errors in the unemployment insurance data that result from coding mistakes in the social security numbers (SSNs) of individual observations. Using these statistical algorithms, the authors are able to detect a significant number of coding mistakes in SSNs and fill in many holes in observed employment duration spells and wage data.

1. THE PROBLEM OF MEASUREMENT ERRORS

This article makes a significant contribution to one of the fundamental problems in empirical economics, the problem of measurement errors. The problem of measurement errors is a classical one in linear and nonlinear models of econometrics and statistics. Reviews of the commonly used techniques have been given by Fuller (1987) and Carroll, Ruppert, and Stefanski (1995). Econometric work on the classical independent additive measurement error model dates back to Frisch (1934), who derived bounds on the slope and the constant term in linear regression with measurement error. The instrumental variables (IV) method is a popular method for obtaining consistent estimators of the parameters of interest in linear models with classical independent additive measurement error. In nonlinear regression models, Hausman, Ichimura, Newey, and Powell (1991) generalized this IV method to polynomial functions in the presence of double measurements on the mismeasured variables. Li (2002) and Schennach (2004) presented methods for nonlinear regression models with classical measurement error and double measurements (see also Newey 2001; Hsiao and Wang 1995). Hong and Tamer (2003) and Taupin (2001) used distributional assumptions on the measurement error to obtain a simple estimator in nonlinear models when no auxiliary data are present. Chesher (1991) presented useful approximation methods to the true distribution and parameters of interest. Most of those authors imposed the classical errors-in-variables assumption. In cases where the misclassified variable is a response variable, Horowitz and Manski (1995) and Abrevaya, Hausman, and Scott-Morton (1998) examined the effect of a mismeasured left-side binary variable.
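To make the classical case concrete, the following is a minimal simulation sketch, not taken from any of the cited papers. It assumes a linear model with slope 2, standard normal regressor and errors, and two independent noisy measurements of the regressor; all names and parameter values are illustrative assumptions. Under these assumptions, ordinary least squares on one noisy measurement is attenuated toward zero, whereas using the second measurement as an instrument for the first recovers the slope.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def attenuation_and_iv(n=50_000, beta=2.0, seed=0):
    """Return (OLS slope, IV slope) under classical additive measurement error.

    Assumed (hypothetical) design: x_true ~ N(0, 1), y = beta * x_true + noise,
    and two independent measurements x1, x2 = x_true + N(0, 1) error.
    """
    rng = random.Random(seed)
    x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [beta * x + rng.gauss(0.0, 1.0) for x in x_true]
    x1 = [x + rng.gauss(0.0, 1.0) for x in x_true]  # first noisy measurement
    x2 = [x + rng.gauss(0.0, 1.0) for x in x_true]  # second noisy measurement

    # OLS of y on x1: converges to beta * var(x) / (var(x) + var(error)).
    ols = cov(x1, y) / cov(x1, x1)
    # IV using x2 as an instrument for x1: converges to beta.
    iv = cov(x2, y) / cov(x2, x1)
    return ols, iv


if __name__ == "__main__":
    ols_slope, iv_slope = attenuation_and_iv()
    print("OLS slope:", ols_slope, "IV slope:", iv_slope)
```

With equal signal and noise variances, the OLS slope should settle near half of the true slope, which is the attenuation that the IV and double-measurement methods are designed to remove.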

There is an important concern, however, in the recent applied economics literature about the validity of the classical measurement error assumption. Without a careful validation study, it may be difficult to tell whether or not the measurement error is correlated with some of the true variables in the dataset. For example, Bound and Krueger (1991) compared reported income in the 1977 wave of the CPS with the social security match file for respondents who provide a valid SSN. They found that measurement error is correlated with the true variable, especially in the tails of the income distribution. Bound, Brown, Duncan, and Rodgers (1994) used a validation study on one large firm from the PSID and found evidence against the classical measurement error model and especially against the independence assumption between the measurement error and the true variable. A good account of the impact of nonclassical measurement error on inference in econometric models was given by Bound, Brown, and Mathiowetz (2001). Recent works by Carroll and Wand (1991), Sepanski and Carroll (1993), Lee and Sepanski (1995), and Chen, Hong, and Tamer (2003) proposed econometric estimators for nonlinear models that are robust against the presence of nonclassical measurement errors. Chen et al. (2003) also developed a semiparametric efficient estimator that optimally combines the information in the primary dataset and in the validation dataset, which is also allowed to come from a stratified sample. Mahajan (2002) studied the problem of parameter identification and estimation in single-index models with a misclassified binary regressor where the measurement error may be correlated with the regressors in the index.
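The role of a validation sample in detecting nonclassical error can be illustrated with a small sketch. It assumes a hypothetical validation file in which both the true and the reported value are observed, and a mean-reverting error whose strength (here -0.3) is an arbitrary illustrative value, not an estimate from any of the studies cited above. Regressing the error on the true value should give a slope near zero under the classical assumption and a clearly nonzero slope otherwise.

```python
import random


def slope(x, y):
    """OLS slope of y on x (with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def error_on_truth_slope(mean_reversion, n=50_000, seed=1):
    """Slope from regressing (reported - true) on the true value.

    mean_reversion = 0 gives classical error; a negative value makes
    high true values underreported and low true values overreported.
    Both the design and the parameter are illustrative assumptions.
    """
    rng = random.Random(seed)
    truth = [rng.gauss(0.0, 1.0) for _ in range(n)]
    reported = [t + mean_reversion * t + rng.gauss(0.0, 0.5) for t in truth]
    errors = [r - t for r, t in zip(reported, truth)]
    return slope(truth, errors)


if __name__ == "__main__":
    print("classical error slope:", error_on_truth_slope(0.0))
    print("mean-reverting error slope:", error_on_truth_slope(-0.3))
```

Without the true values, the error itself is unobserved and this regression cannot be run, which is why validation data are central to the estimators discussed here.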

2. THE SIGNIFICANCE OF CORRECTIONS TO UI DATA

Many of the new estimators require the presence of a validation data sample in which the measurement error problem is corrected. Such datasets, however, are scarce and difficult to find in empirical research. The necessary correction to administrative unemployment insurance data constructed by Abowd and Vilhuber provides another excellent source of validation data, with which they examine the extent of the coding errors and the correlation of these coding errors with the underlying employment duration and wage data. First, they are able to document a large number of coding errors present in the data. For example, they find that when individual flow and stock statistics are aggregated to a higher level, only some errors are averaged out. Other errors are in fact exacerbated through the process of aggregation. Second, their results also lend strong support to the presence of nonclassical measurement errors in economic datasets.

The regression analysis conducted in section 5 is particularly interesting. In this section the authors regress the biases in the flow and stock employment duration and payroll data, which are computed as the difference between the relevant precorrection statistics and postcorrection statistics, on a collection of variables including accession rate, separation rate, and net change in full-quarter employment. They find that for nearly all flow statistics, the bias is negatively related to the size of the flow rate and is significantly different from 0 even when the rate is controlled for. They also find that the flow estimates for workers in the middle age bracket are more biased by the errors than those for young people. In addition, the number of nonemployment periods for new hires, recalls, and separations is typically negatively related to both accession and separation rates of the particular cell. But the number of nonemployment periods for accessions is positively related to a cell’s separation rate. Similarly, they also find significant bias in the payroll information for full-quarter workers and for accessions and separations.

© 2005 American Statistical Association Journal of Business & Economic Statistics April 2005, Vol. 23, No. 2 DOI 10.1198/073500104000000695

These findings provide solid evidence for the presence of nonclassical measurement errors in the administrative UI data that the authors study. They also suggest the need for statisticians and econometricians to invest more into the development of large-scale validation data. The authors mention an SSN validation project by the Bureau of Labor Statistics in 1997 for eight states. Their dataset is on a much larger scale, and they use a more extensive method for correcting the coding mistakes. The authors should be congratulated for their diligent and careful work that makes a significant contribution to improving our understanding of the nature of measurement errors and coding errors in large-scale economic datasets. Much of the econometric literature on measurement errors has been devoted to the development of econometric methodologies that allow one to produce consistent estimators under the classical measurement error assumption. In contrast, with the exception of the literature cited earlier, very little work has been done in econometrics and statistics to examine the nature of measurement errors and to actually reduce the amount of measurement errors in popular economic datasets. Abowd and Vilhuber’s work takes an important step toward filling in this gap and provides an extremely valuable method to the community of empirical researchers in applied microeconomics. We hope that more researchers will follow the authors’ lead and spend more effort on examining and correcting coding errors in economic datasets.

3. IMPLICATIONS FOR ECONOMETRIC RESEARCH

The authors also present new challenges to the econometric research on measurement error models. Although it is very useful to investigate the correlation between the errors induced by miscoding social security numbers and the underlying flow and stock statistics of employment duration and payroll information, it is more interesting to study the impact of these coding errors on parameter estimates of popular econometric models using these data. These models can range from wage regressions to highly nonlinear duration models and structural job search and recall models. As the authors point out, although investigating the effect of their correction in the context of a hazard regression or wage regression is beyond the scope of the article, such an investigation can be the subject of a line of fruitful research in the future.

As mentioned earlier, the corrected dataset that the authors construct can potentially be used as a validation dataset to aid empirical research that makes use of smaller datasets that are susceptible to measurement errors. However, there is also a subtle difference between this corrected dataset and the validation dataset in the sense required in the econometrics and statistics literature of measurement errors. The theoretical literature requires that the true variables be observed in the validation dataset. The corrected UI dataset that the authors construct eliminates a large number of single-quarter interruptions in job histories. However, they also acknowledge that their correction method is a conservative one, which is useful for assuring the consistency of wage record data across many years but can potentially leave other errors undetected. In particular, they note that the most likely error to be corrected using their procedure is a random coding error. Such a random coding error might not be random relative to the variables of interest, such as job accessions and separations. On the other hand, their procedure will not be able to detect persistent errors, such as mistyped SSNs that are repeatedly transmitted to the state agency. Thus the corrected data can be considered a dataset in which some measurement errors are corrected but others are not. I am not aware of any currently available econometric methodologies that can construct consistent estimators by combining a primary dataset with such a corrected dataset in which the amount of measurement error is not eliminated, but is significantly reduced relative to the primary dataset. Developing such a methodology could be an interesting challenge to the theoretical econometric literature on measurement errors.

If the corrected dataset constructed by the authors is to be used as a primary dataset for empirical research, then an interesting question might also arise as to whether the correction procedure itself may invalidate the classical measurement errors assumption. Suppose that the measurement error is originally random, even relative to the variables of interest. For example, suppose that job durations are equally likely to be underreported and overreported. Then if the correction procedure eliminates all the underreported job durations, the measurement errors in the job duration will become upward biased instead of being unbiased. The correction procedure certainly reduces the variance of the measurement errors by a large amount, but might introduce additional bias to the measurement error that was not present before the correction. The impact of this variance–bias trade-off on the performance of econometric estimates of model parameters is an empirical question that is worthy of future investigation.
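The thought experiment above can be sketched numerically. The following is only an illustration under stated assumptions: errors in job duration are drawn as standard normal (symmetric around zero), and the hypothetical correction repairs every underreport (negative error) while leaving every overreport untouched. Neither assumption describes the authors' actual procedure.

```python
import random
import statistics


def correction_tradeoff(n=50_000, seed=2):
    """Compare error mean and variance before and after a one-sided correction.

    Assumed setup (illustrative only): symmetric N(0, 1) errors, and a
    correction that sets every negative error (underreport) to zero.
    """
    rng = random.Random(seed)
    errors = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # One-sided correction: underreports are fixed, overreports remain.
    corrected = [max(e, 0.0) for e in errors]
    return {
        "mean_before": statistics.mean(errors),
        "var_before": statistics.variance(errors),
        "mean_after": statistics.mean(corrected),
        "var_after": statistics.variance(corrected),
    }


if __name__ == "__main__":
    for name, value in correction_tradeoff().items():
        print(name, value)
```

In this setup the error variance falls after the correction, but the mean error moves from roughly zero to a positive value, which is the trade-off described in the text.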

Another issue that is important for empirical researchers who might intend to use this corrected dataset is this dataset’s confidentiality. In section 2 the authors carefully describe how they deliberately used limited information in the correction procedure to respect all of the relevant data stewardship statutes under state and federal laws governing privacy and confidentiality of administrative datasets. Empirical researchers who intend to use their corrected dataset as well as other regulated confidential datasets will also need to go through various procedural requirements to meet the legal confidentiality and privacy standards. Such a process can be very lengthy and difficult. It would tremendously benefit the empirical research community if Professors Abowd and Vilhuber could provide some guidance to empirical researchers as to the procedural requirements for accessing their dataset, either in this article or elsewhere.


ADDITIONAL REFERENCES

Abrevaya, J., Hausman, J., and Scott-Morton, F. (1998), “Identification and Estimation of Polynomial Errors-in-Variables Models,” Journal of Econometrics, 87, 239–269.

Bound, J., Brown, C., Duncan, G., and Rodgers, W. (1994), “Evidence on the Validity of Cross-Sectional and Longitudinal Labor Market Data,” Journal of Labor Economics, 12, 345–368.

Bound, J., Brown, C., and Mathiowetz, N. (2001), “Measurement Error in Survey Data,” in Handbook of Econometrics, Vol. 5, eds. J. J. Heckman and E. Leamer, Amsterdam: North-Holland, Chap. 59.

Bound, J., and Krueger, A. (1991), “The Extent of Measurement Error in Longitudinal Earnings Data: Do Two Wrongs Make a Right,” Journal of Labor Economics, 12, 1–24.

Carroll, R. J., Ruppert, D., and Stefanski, L. A. (1995), Measurement Error in Nonlinear Models, New York: Chapman & Hall.

Carroll, R., and Wand, M. (1991), “Semiparametric Estimation in Logistic Measurement Error Models,” Journal of the Royal Statistical Society, 53, 573–585.

Chen, X., Hong, H., and Tamer, E. (2003), “Measurement Error Models With Auxiliary Data,” working paper; forthcoming in Review of Economic Studies.

Chesher, A. (1991), “The Effect of Measurement Error,” Biometrika, 78, 451–462.

Frisch, R. (1934), Statistical Confluence Study, Oslo: University Institute of Economics.

Fuller, W. (1987), Measurement Error Models, New York: Wiley.

Hausman, J., Ichimura, H., Newey, W., and Powell, J. (1991), “Measurement Errors in Polynomial Regression Models,” Journal of Econometrics, 50, 271–295.

Hong, H., and Tamer, E. (2003), “A Simple Estimator for Nonlinear Errors-in-Variables Models,” Journal of Econometrics, 117, 1–19.

Horowitz, J., and Manski, C. (1995), “Identification and Robustness With Contaminated and Corrupted Data,” Econometrica, 63, 281–302.

Hsiao, C., and Wang, L. (1995), “A Simulation Based Semi-Parametric Estimation of Nonlinear Errors-in-Variables Models,” working paper, University of Southern California.

Lee, L. F., and Sepanski, J. H. (1995), “Estimation of Linear and Nonlinear Errors-in-Variables Models Using Validation Data,” Journal of the American Statistical Association, 90, 130–140.

Li, T. (2002), “Robust and Consistent Estimation of Nonlinear Errors-in-Variables Models,” Journal of Econometrics, 110, 1–26.

Mahajan, A. (2002), “Identification and Estimation of Single Index Models With Misclassified Regressors,” working paper, Stanford University.

Newey, W. (2001), “Flexible Simulated Moment Estimation of Nonlinear Errors-in-Variables Models,” Review of Economics and Statistics, 83, 616–627.

Schennach, S. M. (2004), “Estimation of Nonlinear Models With Measurement Error,” Econometrica, 75, 33–75.

Sepanski, J., and Carroll, R. (1993), “Semiparametric Quasi-Likelihood and Variance Estimation in Measurement Error Models,” Journal of Econometrics, 58, 223–256.

Taupin, M. L. (2001), “Semiparametric Estimation in the Nonlinear Structural Errors-in-Variables Model,” The Annals of Statistics, 29, 66–93.

Comments

William W. COHEN

Center for Automated Learning & Discovery, Carnegie Mellon University, Pittsburgh, PA 15213 (wcohen@cs.cmu.edu)

Stephen E. FIENBERG

Department of Statistics, Center for Automated Learning & Discovery, Carnegie Mellon University, Pittsburgh, PA 15213 (fienberg@stat.cmu.edu)

Pradeep RAVIKUMAR

Center for Automated Learning & Discovery, Carnegie Mellon University, Pittsburgh, PA 15213 (pradeepr@cs.cmu.edu)

We congratulate the authors on an interesting and technically innovative article. The article illustrates the sophistication required to deal with statistical issues of data integration involving diverse government databases, especially in the economics domain; furthermore, it fits within a broader program of work on the creation of longitudinal economic datasets for secondary analysis, for which Abowd in particular has provided leadership (see, e.g., Abowd and Lane 2004; Abowd and Woodcock 2001, 2004). The article has also stimulated us to think about more general issues regarding probabilistic model-based methods for linkage, the primary topic of our joint research.

1. RECONSIDERING THE PROBLEM

Abowd and Vilhuber show that certain “flow” statistics connected with longitudinal studies are surprisingly sensitive to linkage errors. To illustrate their main point, consider the following simplified problem. Suppose that we are presented with a set of $N$ tuples $(a_i, b_{i'}, c_i)$, where $a_i$ and $c_i$ describe the employee $i$ filling some position $x$ in the first and third quarters of 2003, $b_{i'}$ describes the employee $i'$ filling position $x$ in the second quarter, and $i'$ may or may not be identical to $i$. To resolve these ambiguities, we clean the data by applying a probabilistic linkage method (Fellegi and Sunter 1969; Winkler 2002) to the collection of $(i, i')$ pairs, which links together some fraction $p$ of the most-similar pairs.

Now consider using the linked data to count the number of “recalls,” occasions in which an employee left her job and then returned after one quarter. The best estimate for this from the linked data will be $R = N(1 - p)$. Because true recalls are likely to be rare, however, a small number of linkage errors could easily lead to an estimate quite different from the true recall rate (proportionally speaking). A statistic that is even more sensitive to linkage errors (in absolute terms) is the number of “job changes,” which would be estimated as $C = 2Np$. Many other

© 2005 American Statistical Association Journal of Business & Economic Statistics April 2005, Vol. 23, No. 2 DOI 10.1198/073500104000000703
