DEVELOPMENT OF SMALL AREA ESTIMATION RESEARCH IN INDONESIA

Academic year: 2019


Khairil A. Notodiputro (1) and Anang Kurnia (2)

(1, 2) Department of Statistics, Bogor Agricultural University, Jl. Meranti, Wing 22 Level 4, Kampus IPB Darmaga, Bogor 16680, Indonesia

E-mail: (1) khairiln@bima.ipb.ac.id, (2) anangk@ipb.ac.id

Abstract. Demand for small area statistics in Indonesia has grown rapidly as the country's political system has shifted from a centralized to a more decentralized system. Reliable statistics for smaller regions such as sub-districts are needed as a basis for good planning and effective decision-making. The Central Bureau of Statistics (BPS) in Indonesia has made many efforts to meet this demand using direct estimation, but in some instances direct estimation fails to produce estimates with the required precision because the effective sample size is too small. The increasing demand for small area estimates has motivated the development of more reliable methods that produce small area estimates with higher precision than the direct estimates.

This paper discusses the development of research activities on small area statistics in Indonesia. The discussion covers the importance of small area statistics in Indonesia and research activities on small area models based on BPS data. Enhancements of model-based indirect estimation for area level small area models are also briefly described.

Keywords: small area estimation, area level model, EB-EBLUP

1. Introduction

A sub-population is considered small if its domain-specific sample size is not large enough to support direct estimates of adequate precision. A small area may also be referred to as a “small domain”. In the context of survey sampling, an area estimate is usually called a “direct estimate” if it is based only on the sample data from that area.

Alternatives to the direct estimator, known as indirect estimators, have been proposed in the literature. These estimators borrow strength from neighboring areas with similar characteristics and thereby increase the effective sample size and the precision (see Ghosh and Rao (1994), Rao (1999), Rao (2003), Rao (2005) and Jiang and Lahiri (2006) for detailed descriptions). In contrast, Chambers and Chandra (2006) proposed a class of model-based direct estimators for small area quantities. They noted many practical advantages of such model-based direct estimators, arising from the fact that they are computed as weighted linear combinations of the actual sample data from the small area of interest.

Initially, the indirect estimators were essentially synthetic and composite estimators. The synthetic estimator of a small area mean, for example, is the overall sample mean over all small areas, whereas the composite estimator of a small area mean is the weighted average of the sample mean for that small area and the overall mean. See Chapter 4 of Rao (2003) for a detailed description of indirect estimators and their properties.
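The difference between the two estimators can be illustrated with a minimal Python sketch. The data, the function names and the fixed weights below are hypothetical and serve only as an illustration; in practice the weights would be chosen or estimated as discussed in Rao (2003).

```python
import numpy as np

def synthetic_estimate(area_samples):
    """Overall sample mean pooled over all areas (the same value for every area)."""
    pooled = np.concatenate([np.asarray(s, dtype=float) for s in area_samples])
    return pooled.mean()

def composite_estimates(area_samples, weights):
    """Weighted average of each area's own sample mean and the overall mean."""
    overall = synthetic_estimate(area_samples)
    direct = np.array([np.mean(s) for s in area_samples])
    w = np.asarray(weights, dtype=float)  # assumed weights in [0, 1], one per area
    return w * direct + (1.0 - w) * overall

# Hypothetical samples from three small areas and illustrative weights.
samples = [[10.0, 12.0], [20.0, 22.0, 24.0], [30.0]]
weights = [0.5, 0.75, 0.25]
est = composite_estimates(samples, weights)
```

A weight close to 1 keeps the estimate near the area's own sample mean, while a weight close to 0 pulls it towards the overall mean, which is how the composite estimator trades bias against variance.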

As an enhancement of indirect estimators, model-based small area estimation techniques are now generally accepted in the small area literature. Model-based estimators use an explicit linking model. One major advantage of model-based techniques is the ability to validate the explicit model using the sample data. The most commonly used small area models have a sampling error component and an area-specific random error component. Fay and Herriot


small area estimation. To date, reports on the development of SAE can be found in the literature, for example Rao (2003), Rao (2005), Longford (2005), Jiang and Lahiri (2006), and Chambers and Chandra (2006).

Chambers et al. (2006) noted that much SAE research over the last decade has focused on methods for estimating the mean squared error of the estimates. The approximation-based methods produce satisfactory estimates provided the underlying model assumptions are valid. However, there is never any guarantee that these assumptions hold in a particular SAE application. Bootstrap-based methods offer an important avenue for calculating more robust estimates of the mean squared error. At present, these methods rely on assumptions about the distribution of the area effects. The development of fully nonparametric bootstrap methods for the mixed models used in SAE will be an important step forward.

The development of SAE in Indonesia, especially at the Central Bureau of Statistics (BPS), has only recently started. Several factors are responsible for the slow adoption of these methods. One is that the methodology is rather complex from the viewpoint of statistical theory. Another is that small area estimation procedures are predominantly model-based, while BPS generally relies on more traditional design-based methods for producing its official statistics.

On the other hand, it is unavoidable that attention to SAE in Indonesia has grown along with the increasing demand from government and the private sector for accurate information that can be accessed quickly, not only at the national (large domain) level but also at the district, sub-district or village (small area) levels. Moreover, since the political system in Indonesia has shifted from a centralized to a more decentralized system, it is important to develop SAE so that effective public policy decisions can be based on accurate data at both the national and district levels. Decision makers in local government will certainly require statistical data to establish local development plans.

The main purpose of this paper is to discuss current research activities on small area estimation in Indonesia as well as future plans. The paper first discusses the development of small area estimation in Indonesia, followed by a description of research activities aimed at producing small area models for the Indonesian context. We also provide a brief description of some enhancements of model-based indirect estimation for area level small area models.

2. Brief Review of Small Area Estimation

Early work on small area estimation focused on demographic methods for estimating population and local per capita income. Purcell and Kish (1980) used demographic methods to estimate small domain parameters by linking the latest survey data with the latest census.

A variety of methodologies have been developed for small area estimation; see Ghosh and Rao (1994) and Rao (2003). An initial classification divides the existing methods into two categories: traditional and model-based estimators. Traditional methods include direct and indirect estimators and their combinations. Traditional direct estimators use only data from the small area of interest. They are usually unbiased but have high variance. Traditional indirect and model-based estimators are more precise since they also use observations from related or neighboring areas.

When one or more relevant auxiliary variables are available, the synthetic estimator has been proposed in the small area literature (see Rao, 2003). Composite estimation is a compromise between direct estimation and synthetic estimation. Traditional composite estimators are linear combinations of direct and indirect (synthetic) estimators. Model-based estimators can also be interpreted as composite estimators but, unlike the traditional estimators, their weighting factors depend on the covariance structure of the estimators. Further discussion of this topic can be found in Farrell, MacGibbon and Tomberlin (1997), Ghosh and Rao (1994), and Rao (2003).


The best linear unbiased predictor (BLUP), proposed by Henderson in a series of papers (1948-1975), was adopted by Rao (2003) for small area estimation. The BLUP depends on the elements of the variance-covariance matrix of the random variables, which are unknown in practice. If these elements are replaced by estimators, we obtain the empirical BLUP (EBLUP), which is model-unbiased (Kackar and Harville, 1981). Kackar and Harville (1984) gave an approximation to the MSE of the predictor and proposed an estimator of the MSE. The MSE and its estimators were also studied by Prasad and Rao (1990) and Datta and Lahiri (2000).
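As an illustration, consider the basic area level (Fay-Herriot) model, in which the direct estimate $\hat{\theta}_i$ has a known sampling variance $D_i$ and the area effects have an unknown variance $A$. The Python sketch below assumes a simple moment estimator of $A$ based on ordinary least squares residuals; this choice, the function name and the simulated inputs are assumptions made for illustration, not a procedure prescribed in this paper. The EBLUP is then the weighted combination $\gamma_i \hat{\theta}_i + (1-\gamma_i) x_i^T \hat{\beta}$ with $\gamma_i = \hat{A}/(\hat{A}+D_i)$.

```python
import numpy as np

def fay_herriot_eblup(theta_hat, X, D):
    """EBLUP for an area level model with known sampling variances D (all > 0)."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    X = np.asarray(X, dtype=float)
    D = np.asarray(D, dtype=float)
    m, p = X.shape

    # Ordinary least squares fit, used only to estimate the area-effect variance A.
    XtX_inv = np.linalg.inv(X.T @ X)
    resid = theta_hat - X @ (XtX_inv @ X.T @ theta_hat)
    leverage = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # diagonal of the hat matrix

    # Moment estimator of A, truncated at zero (an assumed, simple choice).
    A_hat = max(0.0, (np.sum(resid**2) - np.sum(D * (1.0 - leverage))) / (m - p))

    # Weighted least squares estimate of beta with weights 1 / (A_hat + D_i).
    w = 1.0 / (A_hat + D)
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * theta_hat))

    gamma = A_hat / (A_hat + D)   # shrinkage weight on the direct estimate
    synthetic = X @ beta          # regression-synthetic part
    eblup = gamma * theta_hat + (1.0 - gamma) * synthetic
    return eblup, synthetic, gamma, A_hat

# Simulated inputs (hypothetical): 20 areas, an intercept and one covariate.
rng = np.random.default_rng(1)
m = 20
X = np.column_stack([np.ones(m), rng.normal(size=m)])
D = rng.uniform(0.5, 1.5, size=m)
theta_hat = X @ np.array([1.0, 2.0]) + rng.normal(size=m) + rng.normal(scale=np.sqrt(D))
eblup, synthetic, gamma, A_hat = fay_herriot_eblup(theta_hat, X, D)
```

Areas with a large sampling variance $D_i$ receive a small $\gamma_i$ and are shrunk more strongly towards the regression-synthetic value.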

A large volume of research has applied empirical Bayes (EB) and hierarchical Bayes (HB) methods to small area estimation. MacGibbon and Tomberlin (1989) proposed EB estimation of small area proportions, but neglected the uncertainty in the prior parameters. The empirical Bayes estimator was considered by, among others, Morris (1983), Laird and Louis (1987), Butar and Lahiri (2003), and Chen and Lahiri (2005). Empirical Bayes methods have been shown to be suitable for small area estimation and related problems. Wan (1999) described the advantages of this approach, which include robustness, since no specific distributional assumptions are required. The biggest advantage of these methods is their ability to enhance the precision of individual estimators by ‘borrowing strength’ from other similar estimators (Ghosh, Sinha and Kim, 2006). However, one challenging problem is how to construct a measure of uncertainty for the EB estimator that captures all sources of variation.

Various methods have been proposed to measure the uncertainty of the empirical Bayes estimator. The jackknife method was used by Jiang, Lahiri and Wan (2002) to estimate $\mathrm{MSE}(\hat{\theta}_i^{EB})$. Butar and Lahiri (2003) proposed the bootstrap method for the normal case. Pfeffermann and Glickman (2004) studied both the parametric and the non-parametric bootstrap. Chen and Lahiri (2005) proposed a weighted jackknife MSE estimator. Hall and Maiti (2006) developed a bootstrap method for estimating $\mathrm{MSE}(\hat{\theta}_i^{EB})$ and proposed a confidence interval estimator.
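To make the jackknife idea concrete, the following Python sketch follows the general form of the jackknife MSE estimator for the area level model as presented in Rao (2003): a bias-corrected leading term $g_{1i}(A) = A D_i/(A + D_i)$ plus the average squared change in the EB estimate when one area at a time is deleted. It is a sketch under stated assumptions rather than a reproduction of any cited implementation: the moment estimator of $A$, the function names and the simulated data are all assumed for illustration.

```python
import numpy as np

def fit_fh(theta_hat, X, D):
    """Moment estimate of A and weighted least squares estimate of beta."""
    m, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    resid = theta_hat - X @ (XtX_inv @ X.T @ theta_hat)
    lev = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    A = max(0.0, (np.sum(resid**2) - np.sum(D * (1.0 - lev))) / (m - p))
    w = 1.0 / (A + D)
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * theta_hat))
    return A, beta

def eb_predict(theta_hat, X, D, A, beta):
    """EB (EBLUP) estimates for all areas given parameter values A and beta."""
    gamma = A / (A + D)
    return gamma * theta_hat + (1.0 - gamma) * (X @ beta)

def jackknife_mse(theta_hat, X, D):
    """Delete-one-area jackknife estimate of the MSE of each EB estimate."""
    theta_hat, X, D = (np.asarray(a, dtype=float) for a in (theta_hat, X, D))
    m = len(theta_hat)
    A, beta = fit_fh(theta_hat, X, D)
    g1 = A * D / (A + D)                      # leading term g_1i at the full-data A
    eb = eb_predict(theta_hat, X, D, A, beta)
    g1_shift = np.zeros(m)
    sq_shift = np.zeros(m)
    for u in range(m):
        keep = np.arange(m) != u              # drop area u, refit the parameters
        A_u, beta_u = fit_fh(theta_hat[keep], X[keep], D[keep])
        g1_shift += A_u * D / (A_u + D) - g1
        # Every area keeps its own direct estimate; only the parameters change.
        sq_shift += (eb_predict(theta_hat, X, D, A_u, beta_u) - eb) ** 2
    scale = (m - 1) / m
    return g1 - scale * g1_shift + scale * sq_shift

# Simulated inputs (hypothetical): 15 areas with unequal sampling variances.
rng = np.random.default_rng(0)
m = 15
X = np.column_stack([np.ones(m), rng.normal(size=m)])
D = rng.uniform(0.5, 1.5, size=m)
theta_hat = X @ np.array([1.0, 2.0]) + rng.normal(size=m) + rng.normal(scale=np.sqrt(D))
mse = jackknife_mse(theta_hat, X, D)
```

The second term reflects the extra variability due to estimating the model parameters, which is the source of variation that a naive plug-in MSE would ignore.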

In the hierarchical Bayes approach, the unknown model parameters are treated as random, with values drawn from a specified prior distribution. Posterior distributions for the small area characteristics of interest are obtained by integrating over these priors (Saei and Chambers, 2003). Ghosh and Rao (1994) reviewed the application of these estimation methods in small area estimation. Maiti (1998) used non-informative priors for the hyper-parameters when applying HB methods. You and Rao (2000) used HB methods to estimate small area means under random effect models. Ghosh, Sinha and Kim (2006) studied EB and HB estimation in finite population sampling under structural measurement error models. However, Saei and Chambers (2003) noted that the HB method has two major disadvantages: its lack of robustness to the specification of the prior distribution, and its computational complexity.

3. Small Area Statistics in Indonesia

3.1. Early Development

In the past, small area statistics in Indonesia were produced from census data broken down into smaller domains such as provinces or districts. These data were published in Province Statistic (Propinsi dalam Angka) or District Statistic (Kabupaten dalam Angka).

It is difficult to determine when small area statistics were first applied in Indonesia. However, the literature shows that small area estimation was first introduced by the SMERU Research Institute, in collaboration with BPS, to produce poverty maps for three provinces (DKI Jakarta, East Java and East Kalimantan) as a pilot project. The ELL (Elbers, Lanjouw and Lanjouw) method was applied with a consumption model to produce the poverty maps. In principle, this method combines detailed information collected from a household survey with the complete population coverage of a population census. The pilot project was carried out in 2001–2003.

Suryahadi et al. (2005) describe the efforts to develop small-area poverty maps in Indonesia. In particular, they focus on the effort to develop a poverty map for the whole country using the ELL approach, supported by the Ford Foundation’s Regional Research Initiative on Social Protection in Asia. The final result of this study is a poverty map for the whole country (30 provinces), disaggregated at the provincial, district/city, sub-district, and village levels. The study used several data sources: (1) the Consumption Module of SUSENAS 1999, (2) the Core SUSENAS 1999, (3) the Population Census 2000, and (4) PODES (Village Census) 2000.


errors. In particular, the standard errors at the provincial, district/city, and sub-district levels are reasonably acceptable. At the village level, however, there has been a high variation in the precision of poverty headcount estimates across villages within a particular province. The implication of this is that the poverty map for the village level needs to be used with caution. For villages with high standard errors, additional information is required to verify the estimates.

A discussion at the Indonesian Statistical Society Forum (Forum Masyarakat Statistik), held in Solo in December 2004, identified the need for higher-quality research on small area estimation models for the Indonesian case. The idea has received serious attention from academic institutions, BPS and research institutes. While the development of small area estimation in Indonesia was initiated by the SMERU Research Institute (2003), Kurnia and Notodiputro (2005) also studied the generalized linear mixed model approach to small area estimation in the Indonesian context. Their results indicated that the generalized linear mixed model has the potential to yield small area statistics with higher precision, although the relative standard errors were still inconsistent across the small areas of interest.

3.2. Current Development

Research on small area estimation in Indonesia, especially at IPB, has been supported by DGHE under the project title: Small Area Estimation Models and Its Application for BPS Data. The research runs for three years, from 2006 to 2008, and is divided into three one-year phases. During the first phase (2006), the research focused on evaluating and exploring the various models available in the literature. In this phase we intended to map the state of the art of small area statistics methodology. We also aimed to identify methods with the potential to be developed and adopted for the Indonesian case, especially for use with BPS data.

In the second year, the research will focus on developing small area estimation methods with increased precision. Simulated data designed to mimic BPS sampling will be generated and used. A survey will be conducted to compare the indirect estimates (from a small sample) with the direct estimates obtained from a survey with an optimum sample size.

The research activities during this phase have produced various papers presented at conferences and published in journals. Kurnia and Notodiputro (2006a) showed that modifying the Jiang, Lahiri and Wan (2002) method by introducing covariates and heterogeneity of $D_i$ (sampling error) underestimated the MSE by 13% to 19%. This showed that there is a significant difference between the homogeneous and heterogeneous cases in terms of the measures of uncertainty of $\hat{\theta}_i^{EBLUP}$ or $\hat{\theta}_i^{EB}$, as discussed in Rao (2003). This finding calls for further research, since small area estimation in Indonesia usually suffers from heterogeneity of variance. Kurnia and Notodiputro (2006b) studied the effect of estimating the sampling error in SAE. Continuing the previous research, they indicated that the influence of estimating the sampling error needs to be given more attention, especially in heterogeneous cases.

Regarding the use of time series data in small area estimation, Sadik and Notodiputro (2006) discussed several approaches based on time series data (especially random walk and AR models) and applied them in the small area estimation context. Indahwati and Notodiputro (2006) studied the effect of an inappropriate sampling design on the reliability of small area estimates. Kismiantini et al. (2006) used the empirical Bayes method to estimate the risk of dengue fever based on Gamma models. They showed that, as measured by MSE, the relative risk was estimated more accurately than with the direct and Bayes methods.
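Empirical Bayes estimation of relative risk with a Gamma prior can be sketched for a Poisson-Gamma model. The Python sketch below is not the procedure of Kismiantini et al. (2006); it assumes observed counts $y_i$ that are Poisson with mean $e_i \theta_i$, where $e_i$ is an expected count and $\theta_i$ is a Gamma-distributed relative risk, and it uses a simple moment-matching estimate of the Gamma parameters. The function name and the example counts are hypothetical.

```python
import numpy as np

def eb_relative_risk(y, e):
    """Empirical Bayes relative risks under an assumed Poisson-Gamma model."""
    y = np.asarray(y, dtype=float)
    e = np.asarray(e, dtype=float)
    direct = y / e                      # direct estimates of relative risk
    pooled = y.sum() / e.sum()          # estimate of the prior mean

    # Moment matching: the weighted spread of the direct rates, minus the part
    # explained by Poisson noise, estimates the prior variance.
    s2 = np.sum(e * (direct - pooled) ** 2) / e.sum()
    var_prior = s2 - pooled / e.mean()
    if var_prior <= 0:
        # No evidence of between-area variation: shrink fully to the pooled rate.
        return np.full_like(direct, pooled)

    alpha = pooled**2 / var_prior       # Gamma shape
    nu = pooled / var_prior             # Gamma rate
    return (y + alpha) / (e + nu)       # posterior mean of each relative risk

# Hypothetical observed and expected case counts for five areas.
observed = [2, 15, 4, 30, 1]
expected = [5.0, 10.0, 4.0, 12.0, 6.0]
rr = eb_relative_risk(observed, expected)
```

Each posterior mean is a weighted average of the area's direct rate $y_i/e_i$ and the pooled rate, with more weight on the direct rate when the expected count $e_i$ is large.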

Two main data sources are used in this study: (i) SUSENAS and (ii) PODES (Village Census). Both were collected by BPS. The data on household characteristics were obtained from SUSENAS, while the data on village-level characteristics were obtained from PODES.


PODES, meanwhile, is a complete enumeration of village data throughout Indonesia. The information collected through this village census includes village characteristics such as area, population, infrastructure and local industries. The questionnaires are filled out by the local sub-district officials responsible for collecting statistical data (mantri statistik). The information is obtained from official village documents as well as interviews with village officials. PODES is usually conducted three times every ten years, prior to and in preparation for an agricultural census, an economic census, and a population census.

Currently, BPS and JICA are developing small area statistics for the Indonesian context. At a conference held on March 7, 2007, BPS and JICA presented their current and future work on small area statistics. However, JICA and BPS have focused on producing small area statistics compiled from the 2000 Population Census using direct estimation techniques. Their areas of attention include: (1) statistical maps based on small area data, (2) preparation of grid square statistical data, (3) the usefulness and application of small area statistical analysis, especially for spatial analysis, and (4) population structures. They define an area as a limited geographical surface on the globe and a small area as an area smaller than a province.

On the other hand, some authors affiliated with BPS have published applications of small area estimation. Walujadi and Muchlisoh (2006) used a small area estimation technique to estimate the proportion of undernourished children under five in East Nusa Tenggara. Their study adopted and applied the poverty mapping method developed by Elbers, Lanjouw and Lanjouw (2002, 2003). They reported that the results were quite good, with an average relative standard error of about 2%.

4. Discussion and Future Development

Our experience in developing small area statistics began with looking at empirical data, exploring the techniques available in the literature, and running simulations to evaluate the performance of the methods. In the following phase we arrived at several ideas and a strategy for developing models suitable for BPS data. During the last sixteen months of the project we identified the importance of clustering small areas to improve the accuracy of small area statistics. Moreover, our research showed that the relationship between the auxiliary and response variables is sometimes non-linear, whereas the existing methods assume a linear relationship. Hence, to increase the accuracy of small area statistics, future development of this research will be directed towards modifying the existing methods to incorporate non-linear relationships between the auxiliary and response variables.

In the third year of the project, the research will evaluate the consistency of the estimates produced by the methods developed in the second year. The outputs of this research will be a recommendation of the best estimation methods and a computer program implementing them.

Acknowledgements

This work was supported by a research grant from DGHE, Ministry of National Education: Development of Small Area Estimation and Its Application for BPS’ Data, Batch IV, 2nd year (2007).

REFERENCES

Butar, F. B. and Lahiri, P. 2003. On measures of uncertainty of empirical Bayes small area estimators. Model selection, model diagnostics, empirical Bayes and hierarchical Bayes, Journal of Statistical Planning and Inference, 112, pp: 63–76.

Chambers, R and Chandra, H. 2006. Improved Direct Estimator for Small Area. S3RI Methodology Working Paper M06/07.

Chambers, R., Brakel, J., Hedlin, D., Lehtonen, R. and Zhang, LC. 2006. Future Challenges of Small Area Estimation. Statistics in Transition, Vol. 7, No. 4, pp. 759–769.

Chen, S. and Lahiri, P. 2005. On mean squared prediction error estimation in small area estimation problems. In Proceedings of the Survey Research Methods Section. American Statistical Association.


Farrell, P.J., MacGibbon, B. and Tomberlin, T.J. 1997. Empirical Bayes Small Area Estimation using Logistic Regression Models and Summary Statistics. Journal of Business & Economic Statistics. Vol. 15 (1), pp 101-108.

Fay, R.E. and Herriot, R.A. 1979. “Estimates of income for small places: an application of James-Stein procedures to Census data”. Journal of the American Statistical Association, Vol. 74, p:269-277.

Ghosh, M. and Rao, J.N.K. 1994. “Small Area Estimation: An Appraisal”. Statistical Science, 9, No.1, pp: 55-93.

Ghosh, M., Sinha, K. and Kim, D. 2006. Empirical and Hierarchical Bayesian Estimation in Finite Population Sampling under Structural Measurement Error Models. Scandinavian Journal of Statistics. Vol 33, pp: 591–608.

Hall, P. and Maiti, T. 2006. On parametric bootstrap methods for small area prediction. Journal of the Royal Statistical Society, Series B, 68, Part 2, pp. 221–238.

Indahwati and Notodiputro, K.A. 2006. Effect of Inappropriate Sampling Design on Reliability of Small Area Estimates. Proceeding at the First International Conference on Mathematics and Statistics. MSMSSEA – Bandung.

Jiang, J. and Lahiri, P. 2006. Mixed Model Prediction and Small Area Estimation. Sociedad de Estadística e Investigación Operativa, Test, Vol. 15, No. 1, pp. 111–999.

Jiang, J., Lahiri, P. and Wan, S.M. 2002. A Unified Jackknife Theory, Annals of Statistics, 30.

Kackar, R.N. and Harville, D.A. 1981. Unbiasedness of two-stage estimation and prediction procedures for mixed linear models. Communications in Statistics – Theory and Methods, A 10, pp: 1249-1261.

Kackar, R.N. and Harville, D.A. 1984. Approximations for standard errors of estimators of fixed and random effects in mixed linear models. Journal of the American Statistical Association, 79, pp: 853-862.

Kismiantini, Kurnia, A. and Notodiputro, K.A. 2006. Risk of Dengue Hemorrhagic Fever In Bekasi Municipality With Small Area Approach. Proceeding at the First International Conference on Mathematics and Statistics. MSMSSEA – Bandung.

Kurnia, A. and Notodiputro, K.A. 2005. Generalized Linear Mixed Model on Small Area Estimations. Forum Statistika dan Komputasi, Vol.10 No.2.

Kurnia, A. and Notodiputro, K.A. 2006a. EB-EBLUP MSE Estimator On Small Area Estimation with Application to BPS Data. Proceeding at the First International Conference on Mathematics and Statistics. MSMSSEA – Bandung.

Kurnia, A. and Notodiputro, K.A. 2006b. Pengaruh Pendugaan Ragam Penarikan Contoh pada Small Area Estimation. Proceeding at Seminar Nasional Matematika II, UNJ – Jakarta.

Maiti, T. 1998. Hierarchical Bayes estimation of mortality rates for disease mapping. Journal of Statistical Planning and Inference, 69, pp: 339-348.

Pfeffermann, D. and Barnard, C.H. 1991. Some New Estimators for Small Area Means with Application to the Assessment of Farmland Values. Journal of Business & Economic Statistics. Vol. 9 (1), p. 73-84.

Prasad, N.G.N. and Rao, J.N.K. 1990. “The Estimation of Mean Squared Errors of Small Area Estimators”. Journal of American Statistical Association, 85, p:163-171.

Purcell, N.J. and Kish, L. 1980. Postcensal estimates for local area (domains). International Statistical Review, 48, pp: 3-18.

Rao, J.N.K. 1999. Some Recent Advances in Model-Based Small Area Estimation, Survey Methodology, Vol.25 No.2, p:175-186.

Rao, J.N.K. 2003. Small Area Estimation, New York : John Wiley and Sons.

Rao, J.N.K. 2005. Inferential Issues In Small Area Estimation: Some New Developments. Statistics In Transition, December 2005 Vol. 7, No. 3, pp. 513–526.

Sadik, K. and Notodiputro, K.A. 2006. Small Area Estimation with Time and Area Effects Using Two Stage Estimation. Proceeding at the First International Conference on Mathematics and Statistics. MSMSSEA – Bandung.

Saei, A. and Chambers, R. 2003. Small Area Estimation: A Review of Methods Based on the Application of Mixed Models. S3RI Methodology Working Paper M03/16.


Walujadi, D. and Muchlisoh, S. 2006. Small Area Estimation of Children Under-5 Undernourished in East Nusatenggara. Jurnal Statistika, Th.2 No.1, p: 13-22.

Wan, S.M. 1999. Jackknife Methods in Small Area Estimation and Related Problems. PhD Thesis, Dept. of Mathematics and Statistics, University of Nebraska-Lincoln.
