GENERALIZED LINEAR MIXED MODELS
OF ORDINAL POVERTY RESPONSE
IN NESTED AREA
YEKTI WIDYANINGSIH
SCHOOL OF GRADUATE STUDIES BOGOR AGRICULTURAL UNIVERSITY
THE STATEMENT OF DISSERTATION AND SOURCES OF INFORMATION
I hereby declare that the dissertation entitled "Generalized Linear Mixed Models of Ordinal Poverty Response in Nested Area" is my own work under the direction of the supervisory committee and has not been submitted in any form to any university. Sources of information derived or quoted from published or unpublished works of other authors have been mentioned in the text and listed in the Bibliography (References) at the end of this dissertation.
Bogor, July 2012
STATEMENT CONCERNING THE DISSERTATION AND SOURCES OF INFORMATION
I hereby declare that the dissertation entitled "Generalized Linear Mixed Models of Ordinal Poverty Response in Nested Area" is my own work under the direction of my supervisors and has not been submitted in any form to any university. Sources of information derived or quoted from published or unpublished works of other authors have been mentioned in the text and listed in the Bibliography (References) at the end of this dissertation.
Bogor, July 2012
ABSTRACT
YEKTI WIDYANINGSIH. Generalized Linear Mixed Models of Ordinal Poverty Response in Nested Area. ASEP SAEFUDDIN, KHAIRIL A. NOTODIPUTRO, AJI HAMIM WIGENA.
The Generalized Linear Mixed Models in this study are a development of the Spatial Generalized Linear Mixed Model proposed by Zhang and Lin (2008). As in Zhang and Lin's model, the spatial (regional) data in this study concern hotspot detection. Zhang and Lin used the Circle-Based Scan Statistic (SS) hotspot detection method of Kulldorff (1997), whereas this dissertation uses the Upper Level Set Scan Statistic (ULS) hotspot detection method of Patil and Taillie (2004). The two methods are first compared through simulation on 14 performance criteria, and the results show that the ULS method performs better. The ULS method is then applied to detect bad-nutrition hotspots in several districts, and the results are used as a covariate in the modeling. This study focuses on developing models for nested regional data, taking into account the proximity of the observations. According to Cressie (1993), adjacent observations tend to be more strongly correlated than distant observations. Statistically, this means that the variation among individuals within a group differs from the variation between groups, and the modeling must take this into account. The generalized estimating equation (GEE) is a parameter estimation method that accounts for the correlation between observations, and the working correlation matrix (WCM) is an important part of this estimation process. Three WCM structures are studied and implemented to determine which is most appropriate for the data. The parameter estimates of the Nested GLM and Nested GLMM obtained from combinations of several WCMs and estimation techniques are compared. 
The response variable in this model is ordinal, which adds complexity to the modeling and is also a focus of this research, whereas the response variable in Zhang and Lin's model is a Poisson-distributed count. The ordinal response is obtained by grouping the results of the ORDIT (Ordering Dually in Triangles) ranking method of Myers and Patil (2010). The model developed in this study for nested spatial data gives better results, especially when a diagonal working correlation matrix is used.
YEKTI WIDYANINGSIH. Generalized Linear Mixed Models of Ordinal Poverty Response in Nested Area (Indonesian title: Model Campuran Linier Terampat untuk Respon Kemiskinan Ordinal dalam Area Tersarang). ASEP SAEFUDDIN, KHAIRIL A. NOTODIPUTRO, AJI HAMIM WIGENA.
The Generalized Linear Mixed Model in this study is a development of the Spatial Generalized Linear Mixed Model (Spatial GLMM) developed by Zhang and Lin (2008). As in Zhang and Lin's model, the spatial data used in this study relate to the results of hotspot detection. Zhang and Lin (2008) used the Circle Based Scan Statistic (SS) hotspot detection method of Kulldorff (1997), whereas the research in this dissertation uses the Upper Level Set Scan Statistic (ULS) hotspot detection method of Patil and Taillie (2004). The application of hotspot detection begins by comparing the two methods, SS and ULS, through simulation to obtain 14 performance criteria. The simulation results lead to the conclusion that the ULS hotspot detection method is better. Next, bad-nutrition hotspots are detected in several districts, and the results are used as a covariate in the modeling. This study focuses on developing models for nested spatial data, viewed in terms of the proximity of the observations. According to Cressie (1993), nearby observations tend to be more strongly correlated than distant observations. Statistically, it can also be said that the variation among individuals within one group differs from that among individuals from different groups. This condition must be taken into account in the modeling.
The Generalized Estimating Equation (GEE) is a parameter estimation method that takes this condition into account. Working correlation matrices (WCM), an important part of parameter estimation with the GEE method, are discussed and applied for several correlation matrix structures to determine which WCM structure best suits the data. The parameter estimates of the Nested GLM and Nested GLMM obtained with combinations of several WCMs and parameter estimation techniques are compared. The response variable used in the model is on an ordinal scale, a fairly complex part of the theory that is also a focus of the research, whereas the response variable used in Zhang and Lin's model is a Poisson-distributed count variable. The ordinal response is obtained by grouping the results of the ORDIT (Ordering Dually in Triangles) ranking method of Myers and Patil (2010). Through the model developed in this study, modeling that includes location (spatial) data as a random factor gives better results, especially when a diagonal working correlation matrix (independent WCM) is used.
SUMMARY
YEKTI WIDYANINGSIH. Generalized Linear Mixed Models of Ordinal Poverty Response in Nested Area. ASEP SAEFUDDIN, KHAIRIL A. NOTODIPUTRO, AJI HAMIM WIGENA.
Ranking, hotspot detection, and modeling are important techniques in almost all fields of study. These three techniques play important roles for decision makers in business, education, ecology, and socio-economics, and especially in government, where they increase the transparency of decision making. Every country has policies to manage its affairs, and because resources are limited, making the right decision is important and urgent. These techniques are needed to support the right decision in every area. It is hoped that this dissertation can contribute ideas to the government and ministries in decision making related to poverty reduction.
This dissertation focuses on modeling with the Nested Generalized Linear Model (NGLM) and the Nested Generalized Linear Mixed Model (NGLMM) as an extension of Zhang and Lin's model (2008), which detects hotspots through parameter estimates of spatial association in a non-nested study area using a count response variable. The models in this study are a GLM and a GLMM with the hotspot detection result as an explanatory variable, applied in a nested area with a multinomial ordinal response variable. Before the modeling, two preliminary studies are carried out: one on the ranking method and one on hotspot detection methods.
The ORDIT (Ordering Dually in Triangles) ranking method is studied and applied to poverty data. The method was developed to rank many individuals based on many indicators, which is not straightforward. This study explains how to do so through mathematical concepts such as order theory, duality, and partially ordered sets (posets). Because of data limitations, the method is applied to order sub-districts by poverty level using only two indicators: surkin (surat miskin), or poverty letters (PL), and askeskin (asuransi kesehatan untuk orang miskin), or health insurance for the poor (HIP). The observation unit is the sub-district (kecamatan). In this study, 1679 sub-districts in Java Island are ordered by poverty using these two indicators, HIP and PL. As a result, 6 of the 10 most severe sub-districts are in Jember district, and 5 and 3 of the 10 least severe sub-districts are in Probolinggo city and Surabaya, respectively. Based on the ranking results, it can be concluded in general that the order of the three provinces from least severe to most severe is West Java, Central Java, and East Java.
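The basic comparison behind product-order ranking with two indicators can be sketched as follows. This is a minimal illustration, not the ORDIT procedure or the `ProdOrdr` R function used in the dissertation: for each sub-district it only counts how many others it dominates on every indicator, how many dominate it, and how many are incomparable. The function names and the assumption that larger HIP and PL values mean more severe poverty are hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Product order: a is at least b on every indicator and strictly greater on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def product_order_counts(entities: List[Sequence[float]]) -> List[Tuple[int, int, int]]:
    """For each entity return (n_subordinate, n_superior, n_incomparable).

    Entities with identical indicator vectors are tied and are counted
    in none of the three categories.
    """
    counts = []
    for i, a in enumerate(entities):
        below = above = incomparable = 0
        for j, b in enumerate(entities):
            if i == j:
                continue
            if dominates(a, b):
                below += 1  # a is at least as severe as b on every indicator
            elif dominates(b, a):
                above += 1  # b is at least as severe as a on every indicator
            elif list(a) != list(b):
                incomparable += 1  # each is worse on a different indicator
        counts.append((below, above, incomparable))
    return counts


# Hypothetical (HIP, PL) values for four sub-districts.
data = [(3, 3), (2, 2), (2, 0), (0, 1)]
print(product_order_counts(data))  # [(3, 0, 0), (2, 1, 0), (0, 2, 1), (0, 2, 1)]
```

The incomparable pairs are what make multi-indicator ranking non-trivial: they cannot be ordered by the product order alone, and the ORDIT subordination schematic is the source method's way of placing such entities.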
The ranking result was then grouped into three parts based on the ranking order. The three poverty levels of sub-districts are worst, moderate, and mild. Each sub-district is assigned grade 1, 2, or 3: 1 for worst, 2 for moderate, and 3 for mild. The grouped ranking is stored and used as the response variable for modeling.
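The grouping step can be illustrated with a short sketch. It assumes that rank 1 is the most severe sub-district and that the 1679 ranked sub-districts are split into three parts of (nearly) equal size. The actual cut points used in the dissertation are not stated in this section, and `poverty_level` is a hypothetical name.

```python
def poverty_level(rank: int, n: int) -> int:
    """Map a severity rank (1 = most severe) to grade 1 (worst), 2 (moderate) or 3 (mild).

    Assumes the n ranked sub-districts are split into three parts of
    (nearly) equal size in ranking order.
    """
    if not 1 <= rank <= n:
        raise ValueError("rank must lie between 1 and n")
    return 1 + (rank - 1) * 3 // n


levels = [poverty_level(r, 1679) for r in range(1, 1680)]
print(levels.count(1), levels.count(2), levels.count(3))  # 560 560 559
```

Integer division keeps the three groups within one unit of each other in size. A grouping based on ORDIT scores rather than rank positions would require different cut points.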
assumption, 8 data sets were built and each was simulated 10,000 times to obtain the output, namely the performance of the methods on 14 criteria. The mean and standard deviation of each criterion for each simulation and each data set were computed and compared. From these outputs, the 14 criteria were summarized, analyzed, and compared. Based on this comparison, ULS hotspot detection is judged to be better than Circle-based Scan Statistics (SS).
The research continues with the detection of bad-nutrition hotspots in 8 randomly chosen districts. The result is a hotspot status for every sub-district in these 8 districts: 0 means the sub-district is not in a hotspot area and 1 means it is. This result is included in the modeling as a dichotomous explanatory variable to answer the question: does the bad-nutrition hotspot significantly explain poverty level in the Nested GLM and GLMM?
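The ULS idea of Patil and Taillie (2004) can be sketched under stated assumptions. Candidate zones are the connected components of the sub-map containing every sub-district whose rate is at or above a threshold. The threshold is lowered step by step, and each candidate is scored. This minimal Python sketch assumes cases and population counts per sub-district and uses a Kulldorff-type Poisson log-likelihood ratio as the score. It omits the Monte Carlo significance test that a real analysis needs before declaring a hotspot. All names and data are hypothetical, and this is not the implementation used in the dissertation.

```python
import math
from typing import Dict, List, Set, Tuple


def poisson_llr(c: float, e: float, total: float) -> float:
    """Kulldorff-type Poisson log-likelihood ratio for a zone with c observed and e expected cases."""
    if c <= e:
        return 0.0  # only elevated zones count as hotspot candidates
    inside = c * math.log(c / e)
    outside = 0.0 if c == total else (total - c) * math.log((total - c) / (total - e))
    return inside + outside


def components(nodes: Set[int], adj: Dict[int, Set[int]]) -> List[Set[int]]:
    """Connected components of the sub-map induced by `nodes` (depth-first search)."""
    seen: Set[int] = set()
    comps: List[Set[int]] = []
    for start in sorted(nodes):
        if start in seen:
            continue
        comp: Set[int] = set()
        stack = [start]
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(w for w in adj[v] if w in nodes and w not in comp)
        seen |= comp
        comps.append(comp)
    return comps


def uls_hotspot(cases: List[int], pop: List[int],
                adj: Dict[int, Set[int]]) -> Tuple[Set[int], float]:
    """Return the upper-level-set component with the largest log-likelihood ratio."""
    total_c, total_p = sum(cases), sum(pop)
    rate = [c / p for c, p in zip(cases, pop)]
    best_zone: Set[int] = set()
    best_llr = 0.0
    for g in sorted(set(rate), reverse=True):  # lower the threshold step by step
        upper = {i for i, r in enumerate(rate) if r >= g}
        for zone in components(upper, adj):
            c = sum(cases[i] for i in zone)
            e = total_c * sum(pop[i] for i in zone) / total_p
            llr = poisson_llr(c, e, total_c)
            if llr > best_llr:
                best_zone, best_llr = zone, llr
    return best_zone, best_llr


# Hypothetical map: five sub-districts in a row, 0-1-2-3-4.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
zone, llr = uls_hotspot([20, 18, 2, 1, 3], [100] * 5, adj)
status = [1 if i in zone else 0 for i in range(5)]  # 0/1 hotspot covariate
print(sorted(zone), status)  # [0, 1] [1, 1, 0, 0, 0]
```

The final line shows how a detected zone becomes the 0/1 hotspot status described above. Each sub-district inside the zone gets 1 and every other sub-district gets 0.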
Modeling starts with data preparation. Three districts from West Java, 2 districts from Central Java, and 3 districts from East Java are chosen randomly for model implementation: Kuningan, Karawang, Majalengka, Cilacap, Boyolali, Ngawi, Blitar, and Jember. The three poverty levels obtained from the ranking study are used as the ordinal response, while the bad-nutrition hotspot status obtained from the ULS hotspot detection study is used as an explanatory variable. The other explanatory variables are the numbers of farmer families, schools, and health personnel; these variables were chosen based on the Bappenas Report 2011. To simplify interpretation, the values of the explanatory variables are divided into three categories, i.e. low, moderate, and mild, following several sources.
Zhang and Lin's model is modified in two ways: (1) the model is extended to nested data (districts nested in provinces), assuming that the correlation between sub-districts within a district is higher than the correlation between sub-districts in different districts; and (2) an ordinal response variable is used. Modeling was undertaken for both the GLM and the GLMM. In the Nested GLM, the Generalized Estimating Equation (GEE) method is used for parameter estimation to handle clustered, correlated data, while in the Nested GLMM, pseudo-likelihood is used for parameter estimation. In the Nested GLMM, district is a random effect.
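A common specification for an ordinal response with grades 1 to 3 is the cumulative logit (proportional odds) model. In a nested GLMM, a district random effect enters the linear predictor. The sketch below computes category probabilities under the assumed form logit P(Y ≤ k) = θ_k − (x'β + u). Here θ_1 < θ_2 are thresholds and u is a district effect. The logit link, the sign convention, and all names are assumptions. The dissertation's exact parameterization and its pseudo-likelihood estimation are not reproduced here.

```python
import math
from typing import List, Sequence


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def category_probs(thresholds: Sequence[float], x: Sequence[float],
                   beta: Sequence[float], u: float = 0.0) -> List[float]:
    """P(Y = k), k = 1..K, under logit P(Y <= k) = theta_k - (x'beta + u).

    `thresholds` holds the K-1 strictly increasing cut points theta_k;
    `u` is a random district effect (u = 0 gives the fixed-effects prediction).
    """
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    eta = sum(b * xi for b, xi in zip(beta, x)) + u
    cum = [logistic(t - eta) for t in thresholds] + [1.0]  # P(Y <= k)
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]


# Hypothetical values: hotspot status = 1, one other covariate = 0.
p_fixed = category_probs([-1.0, 1.0], x=[1, 0], beta=[0.5, -0.2])
p_district = category_probs([-1.0, 1.0], x=[1, 0], beta=[0.5, -0.2], u=1.0)
print([round(p, 3) for p in p_fixed], [round(p, 3) for p in p_district])
```

Under this sign convention, a positive district effect moves probability toward higher grades. With the coding above, where 3 means mild, that corresponds to less severe poverty.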
Several working correlation matrices can be used with the GEE method. Three types of working correlation matrix (WCM), i.e. exchangeable, unstructured, and independent, are studied, and one objective of the modeling is to determine which WCM gives the best results. The correlation between sub-districts within a district in the poverty data was assumed to have an unstructured pattern. However, the independent WCM gave the minimum ratio of robust to model-based standard errors, which suggests that the data have an independent correlation structure.
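The three WCM structures compared above can be written out explicitly for a district with n sub-districts. The plain-Python sketch below uses hypothetical function names. It builds each matrix and gives a simple moment estimator of the exchangeable correlation from Pearson residuals, of the kind commonly used inside GEE. It omits the degrees-of-freedom correction and the iterative updating of β that a full GEE fit performs.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def independent_wcm(n: int) -> Matrix:
    """R = I: no within-district correlation."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def exchangeable_wcm(n: int, alpha: float) -> Matrix:
    """R has 1 on the diagonal and one common correlation alpha elsewhere."""
    return [[1.0 if i == j else alpha for j in range(n)] for i in range(n)]


def unstructured_wcm(n: int, rho: Dict[Tuple[int, int], float]) -> Matrix:
    """R has a separate correlation rho[(i, j)], i < j, for every pair."""
    R = independent_wcm(n)
    for (i, j), r in rho.items():
        R[i][j] = R[j][i] = r
    return R


def estimate_alpha(residuals: List[List[float]]) -> float:
    """Moment estimate of the exchangeable alpha from Pearson residuals,
    given as one list per district (no degrees-of-freedom correction)."""
    n_obs = sum(len(r) for r in residuals)
    phi = sum(x * x for r in residuals for x in r) / n_obs  # scale estimate
    cross, pairs = 0.0, 0
    for r in residuals:
        for j in range(len(r)):
            for k in range(j + 1, len(r)):
                cross += r[j] * r[k]
                pairs += 1
    return cross / (pairs * phi)


print(exchangeable_wcm(3, 0.2))
print(estimate_alpha([[0.5, 0.7, -0.1], [-0.4, -0.6]]))
```

The unstructured form needs a separate parameter for every pair of positions within a district. That makes it the most flexible of the three structures and also the most demanding to estimate, while the independent form needs no correlation parameters at all.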
© Copyright owned by IPB, 2012
All rights reserved
It is prohibited to quote part or all of this work without citing the source. Quotation is permitted only for the purposes of education, research, writing papers, preparing reports, writing criticism, or reviewing an issue, and must not harm the reasonable interests of IPB.
GENERALIZED LINEAR MIXED MODELS
OF ORDINAL POVERTY RESPONSE
IN NESTED AREA
YEKTI WIDYANINGSIH
Dissertation
Submitted to the School of Graduate Studies of Bogor Agricultural University
in partial fulfillment of the requirements for the Doctoral degree in Statistics
SCHOOL OF GRADUATE STUDIES BOGOR AGRICULTURAL UNIVERSITY
Closed Examination
(July 7, 2012)
Examiners:
1. Dr. Ir. I Gusti Putu Purnaba, DEA. Department of Mathematics
Faculty of Mathematics and Natural Sciences Bogor Agricultural University
2. Dr. Ir. Hari Wijayanto, M.Si. Department of Statistics
Faculty of Mathematics and Natural Sciences Bogor Agricultural University
Open Examination
(July 30, 2012)
Examiners:
1. Prof. Dr. Ir. Dadang Sukandar, M.Sc. Department of Community Nutrition
Faculty of Human Ecology, Bogor Agricultural University
2. Dr. Slamet Sutomo, SE., MS.
Student Name: Yekti Widyaningsih
NRP: G161070011
Approved as to style and content by:
Dr. Ir. Asep Saefuddin, MSc., Chair of Committee
Prof. Dr. Ir. Khairil A. Notodiputro, MS., Member
Dr. Ir. Aji Hamim Wigena, MSc., Member
Acknowledged,
Dr. Ir. Aji Hamim Wigena, MSc., Head of Study Program
Dr. Ir. Dahrul Syah, MSc. Agr., Dean of School of Graduate Studies
Date of defense:
Date of graduation:
ACKNOWLEDGEMENTS
Praise and thanks to God Almighty for His grace, through which this scientific work has been completed. The research has been conducted since mid-2009 under the title Generalized Linear Mixed Models of Ordinal Poverty Response in Nested Area.
I would never have been able to finish my dissertation without the guidance of my committee members, help from friends, and support from my family, especially my mother and my late father.
There are many people who, through their generosity and knowledge, have made important contributions to this dissertation. It would be impossible to list everyone who contributed or to adequately describe the contributions of those who are mentioned.
First and foremost, I am extremely grateful to my advisor, Dr. Ir. Asep Saefuddin, MSc., for his guidance and support throughout my graduate study. I especially thank him for giving me the opportunity to participate in several of his research projects, which deal with many challenging statistical issues. I wish to thank my committee members, Prof. Dr. Ir. Khairil Anwar Notodiputro, MS. and Dr. Ir. Aji Hamim Wigena, MSc., who gave me experience with data simulation research and with practical issues beyond the textbooks, and who patiently corrected my writing.
I would like to express my deepest gratitude to my advisor at Pennsylvania State University, Prof. Ganapati P. Patil, for his excellent guidance, caring, and patience, and for providing me with an excellent atmosphere for doing research. I would like to thank Prof. Wayne L. Myers, who gave me experience with research on ranking methods and also patiently corrected my writing. Many thanks to Prof. Sharad W. Joshi, who as a good friend was always willing to help and give his best suggestions; my work room would have been lonely without our phone conversations. My research would not have been possible without their help.
Many thanks to the Directorate of Mendepdiknas for financial assistance,
statistics, faculty and employees of the school of graduate studies who have provided services to both teaching and administration.
Many thanks to the lecturers of the Department of Statistics, who always took the time to discuss and to provide advice and encouragement, and to the staff of the Department of Statistics for their help.
Special thanks go to Dr. Ir. I Gusti Putu Purnaba, DEA., Dr. Ir. Hari Wijayanto, M.Si., Prof. Dr. Ir. Dadang Sukandar, M.Sc. and Dr. Slamet Sutomo, SE., MS., who were willing to serve on my final defense committee at the last moment. I would also like to thank Drs. Tjiong Giok Pin, M.Si for permission to use the map files.
I also wish to thank my friends in the S2 (master's) and S3 (doctoral) programs, with whom I shared joy, complaints, and laughter over the past years.
Finally, I would like to thank my parents, two elder sisters, and elder brother. They were always supporting me and encouraging me with their best wishes.
Even though I have benefited from the help and advice of many people, there are bound to be things I have not grasped, so any remaining mistakes and omissions are my responsibility. I would be grateful for messages pointing out errors in this dissertation.
Bogor, July 2012
Yekti Widyaningsih was born on September 15, 1967 in Bandung, West Java. Her father's name is Prayuto and her mother's name is Ambarwati. Yekti is the youngest of four siblings: one brother and three sisters. She received her Dra. degree in Mathematics from the University of Indonesia in 1992 with Dra. Linggawati, M.S. as her advisor, and her master's degree in statistics from Bogor Agricultural University in 2002 with Dr. Ir. Amril Aman, M.Sc. and Dr. Ir. Hadi Sumarno as her advisors. To strengthen her knowledge of statistics, she completed the Ph.D. program at the same university in 2012 with a scholarship from BPPS. The advisors of her Ph.D. dissertation were Dr. Ir. Asep Saefuddin, M.Sc., Prof. Dr. Ir. Khairil Anwar Notodiputro, MS. and Dr. Ir. Aji Hamim Wigena, MSc. Part of this dissertation was written at Pennsylvania State University with Prof. Ganapati P. Patil as her advisor and Prof. Wayne L. Myers and Prof. Sharad Joshi as her co-advisors. The research at PSU was supported by the Directorate General of Higher Education Indonesia (DIKTI) through the Doctoral Sandwich Program 2010. She was also involved in the Geoinformatic Research of Penelitian Hibah Pascasarjana DIKTI. During her Ph.D. study, she wrote several papers, which were published at national and international seminars. The papers are:
1. Yekti Widyaningsih, Asep Saefuddin, Khairil Anwar Notodiputro, Aji Hamim Wigena. (2012). Nested Generalized Linear Mixed Model for Correlated Nested Data with Ordinal Response. Jurnal IPTEK ITS Volume 23/No.2/May 2012.
2. Yekti Widyaningsih, Wayne L. Myers, Asep Saefuddin. (2012). Sub Districts Poverty Level Determination using Ordering Dually in Triangle (ORDIT) Ranking Method, Jurnal Math Info Volume 5/No.2/July 2012.
3. Yekti Widyaningsih, Asep Saefuddin, Khairil Anwar Notodiputro, Aji Hamim Wigena. (2012). Nested Generalized Linear Mixed Model with Ordinal Response: Simulation and application on Poverty Data in Java Island. AIP Conference Proceedings of The 5th International Conference on Research and Education in Mathematics, Institut Teknologi Bandung, October 2011.
4. Yekti Widyaningsih, Asep Saefuddin, Khairil Anwar Notodiputro, Aji Hamim Wigena. (2011). Ordering Dually in Triangles (Ordit) and Hotspot Detection in Generalized Linear Model for Poverty and Infant Health in East Java. Paper in The 6th SEAMS-GMU 2011 International Conference on Mathematics and Its Applications, Universitas Gadjah Mada, Yogyakarta, July 2011.
Paper in ICCS-X, Cairo, Egypt, December 2009.
7. Yekti Widyaningsih and Siti Nurrohmah. (2009). The Application of Spatial Scan Statistics on The Tuberculosis Hotspot Detection in Indonesia. Proceedings of IndoMS International Conference on Mathematics and Its Applications (IICMA), Yogyakarta.
8. Yekti Widyaningsih and Asep Saefuddin. (2008). Health Profile 2005 and Geoinformatic of Diseases in Indonesia. Paper in the International Workshop on Digital Governance and Hotspot GeoInformatics, Jalgaon, India, March 11-24, 2008.
9. Yekti Widyaningsih and Tjiong Giok Pin. (2008). A Space-Time Scan Statistics to Detect Cluster Alarms of Dengue Mortality in Indonesia 2005. Article in the journal "Makara seri Sains" Volume 12/No.1/April 2008.
10. Yekti Widyaningsih and Asep Saefuddin. (2007). Disease Outbreak in Indonesia: The Application of Scan Statistics. Paper in the 1st International Conference on Theory and Practice of Electronic Governance. Macau Polytechnic Institute, 10-13 December 2007.
11. Yekti Widyaningsih. (2007). Model, Calculations, and Application of Spatial Scan Statistics. Paper in the International Conference on Mathematics and Its Applications, Universitas Gadjah Mada, Yogyakarta, August 2007.
12. Yekti Widyaningsih. (2007). A Space-Time Permutation Scan Statistics for Disease Outbreak Detection. Poster at 1st Joint Seminar UI-UKM 2007. Universitas Indonesia.
TABLE OF CONTENTS
Page
ABSTRACT ... v
SUMMARY ... vii
ACKNOWLEDGEMENTS ... xv
TABLE OF CONTENTS ... xxi
LIST OF TABLES ... xxiv
LIST OF FIGURES ... xxv
LIST OF APPENDIXES ... xxvii
LIST OF ABBREVIATIONS ... xxviii
GLOSSARY ... xxix
1 INTRODUCTION ... 1
1.1 Background ... 1
1.2 The Purpose of Research ... 10
1.3 Research Framework ... 11
1.4 The Outline of the Dissertation ... 13
1.5 Novelty ... 14
2 SUB DISTRICTS POVERTY LEVEL DETERMINATION USING ORDERING DUALLY IN TRIANGLE (ORDIT) RANKING METHOD ... 15
2.1 Introduction ... 15
2.2 Theoretical Background ... 17
2.2.1 Rating Relations/Rules for Ascribing Advantage ... 17
2.2.2 Subordination Schematic and Ordering Dually in Triangles (ORDIT) ... 18
2.2.3 Product-order Rating Regime ... 19
2.2.4 The concepts of Askeskin or HIP and Surkin or PL ... 21
2.3 Methodology ... 22
2.4 Results and Discussion ... 25
BASED ON SIMULATION STUDY
3.1 Introduction ... 33
3.2 Theoretical Study ... 34
3.2.1 The Concept of Hotspot Detection ... 34
3.2.2 Hypothesis testing for comparison between SS and ULS ... 37
3.2.3 Circle-based Scan Statistics (SS) Hotspot Detection ... 38
3.2.4 Upper Level Set (ULS) Scan statistics ... 39
3.3 The Methods ... 42
3.3.1 The Steps of Simulations ... 43
3.3.2 The Fourteen Criteria ... 44
3.4 The Results and Analysis ... 46
3.5 ULS Hotspot Detection for Bad Nutrition Case in Java Island ... 49
3.6 The Results of Bad-nutrition Hotspot Detection ... 49
3.7 Conclusion ... 51
4 NESTED GENERALIZED LINEAR MIXED MODEL FOR CORRELATED DATA ... 53
4.1 Introduction ... 53
4.4 Results and Discussion ... 100
4.4.1 Standard error of parameter estimates ... 100
4.4.2 Significance (p-values) ... 105
4.5 Conclusion ... 109
5 GENERAL DISCUSSION ... 111
6 CONCLUSION AND RECOMMENDATION ... 117
6.1 Conclusion ... 117
6.2 Recommendation ... 118
LIST OF TABLES
Page
1 Entities with 3 indicators ... 18
2 Six leading lines of the poverty dataset ... 24
3 Description of indicators HIP and PL ... 24
4 The first 6 lines of the data: identity number (id), province, district, sub district, indicators values, and sub district's ranking based on indicator ... 27
5 The first 6 lines of the result obtained by applying the ProdOrdr function to place poverty measurement rank of sub district ... 27
6 The ten most severe sub districts according to ORDIT ranking ... 29
7 The ten least severe sub districts according to ORDIT ranking ... 29
8 Poverty level of sub districts in the West, Central and East Java ... 30
9 Performance criteria comparison of ULS and SS for 5% significance ... 47
10 Performance criteria comparison of ULS and SS for 1% significance ... 48
11 General structure of data layout ... 66
12 General structure of nested data layout ... 67
13 Link function name, form, inverse of link function, and range of the predicted mean ... 81
14 The first and second derivatives of link function ... 81
15 Provinces, districts and number of sub districts ... 97
16 Data description ... 97
LIST OF FIGURES
Page
1 Un-nested hotspot (dark color areas are the hotspots) ... 4
2 Nested hotspots in three provinces ... 4
3 Research Diagram ... 8
4 The systematic of the research activity ... 12
5 Research diagram: relation among chapters ... 13
6 X-shaped Hasse diagram of five entities labeled as A, B, C, D and E ... 18
7 Subordination schematic with plotted instance dividing a right triangle into two parts, a 'trapezoidal triplet' (of AA, SS and II) below, and a 'topping triangle' (of CCC, SS and II) above ... 19
8 The Map of Java Island with districts identity ... 23
9 Scatterplot of the indicators: HIP vs PL ... 26
10 Boxplot of the indicators, HIP and PL ... 26
11 Precedence plot (based on place ranks) of subdistricts from R commands ... 28
12 A study area with zone and non zone areas ... 35
13 A part of circle based hotspot detection process ... 39
14 A map and its adjacent matrix ... 41
15 ULS hotspot detection process (dark color is the hotspot) ... 42
16 ULS Hotspot of bad nutrition in Kuningan, Karawang, Majalengka, Temanggung, Boyolali, and Cilacap ... 50
17 ULS Hotspot of bad nutrition in Blitar, Ngawi, and Jember ... 51
18 The scheme of modeling ... 54
19 An ordered response and its latent variable ... 63
20 Changes in the value of x that cause changes in the magnitude of probability; a1, a2, a3 is the threshold ... 64
21 The effect of a covariate on the transformed cumulative probabilities (pdf of Y for some values of x) ... 64
22 Developing of Zhang's and Lin's GLMM ... 92
23 Study Area with 3 provinces {s = 1, 2, 3}, 3 districts are randomly chosen from West and East Java {i = 1, 2, 3}, and 2 districts are randomly chosen from Central Java {i = 1, 2}. There are n_si sub districts in district i of province s ... 93
LIST OF APPENDIXES
Page
1 Fourteen criteria of poverty ... 127
2 Concept of health insurance for the poor (hip) or askeskin and certificate of cannot afford (PL) or surkin ... 128
3 Sum of Poisson Random Variables ... 129
4 Multinomial distribution ... 130
5 Maximum Likelihood Estimation ... 131
6 Conditional simulation with hotspot z assumed known ... 132
7 Output of true hotspot1 simulation Central Java ... 133
8 Output of true hotspot2 simulation Central Java ... 135
9 Output of true hotspot1 simulation Java Island ... 137
10 Output of true hotspot2 simulation Java Island ... 139
11 Output of true hotspot1 simulation Map X ... 141
12 Output of true hotspot2 simulation Map X ... 143
13 Output of true hotspot1 simulation Map Y ... 145
14 Output of true hotspot2 simulation Map Y ... 147
15 Fourteen criteria of hotspot method for p-value = 0.05 ... 149
16 Fourteen criteria of hotspot method for p-value = 0.01 ... 150
17 ULS Hotspot of bad-nutrition in Kuningan and Karawang ... 151
18 ULS Hotspot of bad nutrition in Majalengka and Temanggung ... 151
19 ULS Hotspot of bad nutrition in Boyolali and Cilacap ... 152
20 ULS Hotspot of bad nutrition in Blitar and Ngawi ... 152
21 ULS Hotspot of bad nutrition in Jember ... 153
22 Parameter Estimates and Standard Errors of Nested GLM ... 154
23 Significance of Nested GLM parameter estimates ... 155
24 Classification result of Nested GLM ... 156
25 Parameter Estimates and Standard Errors of Nested GLMM ... 157
26 Significance of Nested GLMM parameter estimates ... 158
27 Matrix Equation for Nested GLMM (an example) ... 159
28 Theorem of Pearson residual Moran's IPR and IaPR ... 160
LIST OF ABBREVIATIONS
CSHD Circle based hotspot detection
GEE Generalized estimating equation
GLM Generalized linear model
GLMM Generalized linear mixed model
ORDIT Ordering dually in triangles
SS Scan statistic
ULS Upper level set
WCM Working correlation matrix
GLOSSARY
Askeskin: Health insurance for the poor (asuransi kesehatan untuk orang miskin).
Cluster: A grouping containing 'lower level' elements; for example, in a survey sample, a district (cluster) consisting of sub districts.
Explanatory variable: An independent variable, usually denoted by X in the fixed part of the model and by Z in the random part.
Fixed part: The part of a model represented by Xβ, that is, the average relationship. The parameters β are referred to as 'fixed parameters'.
Hotspot: Unusual phenomenon, anomalies, aberrations, outbreaks, elevated clusters, or critical areas.
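Kronecker product: An operation on two matrices of arbitrary size resulting in a block matrix. For an m × n matrix A = (a_ij) and any matrix B,
$$A \otimes B = \begin{pmatrix} a_{11}B & \cdots & a_{1n}B \\ \vdots & \ddots & \vdots \\ a_{m1}B & \cdots & a_{mn}B \end{pmatrix}$$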
Level: A component of hierarchical data.
Nested: The clustering of units into a hierarchy (levels).
Random part: The part of a model represented by u or Zu.
Regional: Area (daerah).
Response part: The part of a model represented by Y, also known as the 'dependent' variable.
Spatial: Happening or existing in space.
Study area: An area of examination.
Surkin (surat miskin): Certificate of the Poor and Disadvantaged; poverty letters; SKTM (surat keterangan tidak mampu).
Chapter 1
INTRODUCTION
1.1 Background
Poverty is a frequently discussed issue. Although statistics show that the
number of poor people in Indonesia decreased from 30.02 million (12.49%) in
March 2011 to 29.89 million (12.36%) in September 2011, Indonesia still faces
the problem of poverty (BPS 2011).
Poverty alleviation programs require many policy decisions, especially when the
government must decide which areas should receive priority for treatment.
Ranking and hotspot detection techniques can support such decisions.
Furthermore, modeling is important to identify which factors are related to
poverty.
Ranking, hotspot detection, and modeling are three important statistical methods
used to evaluate and examine data in everyday life and in many fields of study.
The data could be the number of disease cases, people in poverty, particular
animals or plants related to biodiversity, environment, and ecology, and many
others. As part of the effort to alleviate poverty, SMERU¹ also ranks areas in
several regions of Indonesia by poverty level. Ranking areas by poverty can
support objective decision making and increase the transparency of government
decisions. Moreover, a well-defined poverty level can lend credibility to
government decision making (Widyanti 2003). In addition, cases of bad nutrition
currently occur in nearly all parts of Indonesia; about 4 million children in
Indonesia are exposed to the risk of bad nutrition (Yurnaldi 2008). Identifying
bad-nutrition hotspot areas is therefore needed to support objective decisions
in poverty reduction programs.
Furthermore, poverty data should be modeled in a way that takes into account
how conditions differ from one region to another, as well as resource
constraints. Statistical models suited to these conditions should be able to
handle nested data and random effects.
Based on these considerations, the study in this dissertation addresses ranking
and hotspot detection, and incorporates the results of these two methods in the
development of a Nested Generalized Linear Mixed Model.
Ranking, hotspot detection, and modeling techniques are under active
development. The ranking method based on several indicators, using ecological
and environmental data, was developed by Myers and Patil (2010). Hotspot
detection methods were developed by Kulldorff (1997), Patil and Taillie (2004),
and Duczmal, Tavares, Patil, and Cancado (2010), whereas modeling with fixed
factors and hotspots as covariates was developed by Zhang and Lin (2008). This
dissertation combines these three approaches, focusing on extending Zhang and
Lin's model and applying it to poverty data.
The ranking method applied in this study is ORDIT (ORdering Dually In
Triangles), which ranks individuals (observation units) based on several
indicators (Myers and Patil 2010). Furthermore, two hotspot detection methods,
namely the circle-based scan statistic of Kulldorff (1997) and the upper level
set scan statistic of Patil and Taillie (2004), are compared. The model is
developed and implemented as a nested GLM (Generalized Linear Model) and a
nested GLMM (Generalized Linear Mixed Model), with parameters estimated by the
GEE (Generalized Estimating Equation) method and pseudo-likelihood,
respectively.
A GLMM is a statistical model that accommodates both fixed effects and random effects, while a GLM uses only fixed effects. The response distribution is not restricted to the normal distribution but may be any distribution in the exponential family. Some GLMM principles used in building spatial models are discussed by Lawson and Clark (2002) in their treatment of possibly discontinuous risk surfaces, and Loh and Zhu (2007) accounted for spatial correlation in the scan statistic by means of a spatial GLMM in order to obtain more accurate results. Furthermore, some researchers have begun to explore geographic and ecological factors as potential explanatory variables
two breast cancer cases, Roche et al. (2002) compared these two geographic areas, cluster and non-cluster, and found that the two tend to be isolated due to a
language factor. This research suggests that the identified risk factors may contribute to the observed patterns; however, because cluster detection does not separate controlled from uncontrolled factors, its result cannot be used as a statistical conclusion.
Furthermore, Zhang and Lin (2008) aimed to improve the predictive ability of the model by incorporating explanatory variables into the process of spatial cluster detection within a frequentist approach. In other words, Zhang and Lin combine hotspot detection and modeling, so that explanatory variables and hotspots are assessed simultaneously.
Zhang and Lin (2008) apply a spatial GLMM with the clusters (hotspot areas) of Kulldorff (1997) as explanatory variables. Through this modeling, Zhang and Lin assessed the significance of the hotspots that appear in the spatial data. The hotspots detected in their model are common single-level (un-nested) hotspots, as presented in Figure 1. Because geographical and ecological factors also contribute to identifying hotspots, the model also takes geographic and ecological components into account. This research further analyzes what happens when the model is applied to data with a spatially nested form. In other words, we extend the model introduced by Zhang and Lin (2008) to nested hotspots, as shown in Figure 2. In the nested spatial GLMM, hotspots can be treated as fixed effects, and the response variable is measured on an ordinal scale. It is also important to note that the estimation takes into account conditions in which variables act both as independent variables and as random effects.
One thing that is often ignored in statistical modeling is the variance of the data. The variance can be viewed as global or local. Global variance is calculated from the variability of the overall data, while local variance is calculated within a group of data. Research using mixed-effects regression modeling with heterogeneous variances to analyze Ecological Momentary Assessment (EMA) data was carried out by Hedeker and
Figure 1 Un-nested hotspot (colored areas are the hotspots)
Data with different variances under different conditions, or in different groups, are common. For such data, the existence of local variance must be taken into account when estimating the model parameters. An appropriate parameter estimation method is GEE, the generalized estimating equation (Hardin and Hilbe 2003).
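For reference, the GEE in its standard marginal form is written below. The notation (m districts, response vector y_i of length n_i, mean vector μ_i(β), diagonal matrix A_i of variance functions, dispersion φ, and working correlation matrix R_i(α)) is introduced here only for illustration and may differ from the notation used in Chapter 4.

```latex
U(\beta) = \sum_{i=1}^{m} D_i^{\top} V_i^{-1} \bigl( y_i - \mu_i(\beta) \bigr) = 0,
\qquad
D_i = \frac{\partial \mu_i(\beta)}{\partial \beta^{\top}},
\qquad
V_i = \phi \, A_i^{1/2} \, R_i(\alpha) \, A_i^{1/2}
```

When R_i(α) is the n_i × n_i identity matrix, V_i reduces to φA_i and U(β) becomes the ordinary GLM (quasi-)score equation; the within-district correlation enters the estimation only through R_i(α).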
The GEE method uses a working correlation matrix to estimate the model parameters. The elements of this matrix are the correlations between observations within a cluster. If subject (district) i has n_i sub-districts, the working correlation matrix has dimension n_i × n_i, and the correlations are estimated from the pairs of observations within each district, pooled over all districts i. A higher within-cluster correlation corresponds to a smaller variance within the cluster (subject). The GEE method is therefore used to estimate the model parameters for clustered data.
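The sketch below illustrates what a working correlation matrix for one district looks like and how a common correlation can be estimated from within-district pairs. The three structures (independence, exchangeable, AR(1)) are common illustrative choices, the function names are hypothetical, and the estimator is a simple moment estimator without a degrees-of-freedom correction; the structures actually compared in this dissertation are specified in Chapter 4.

```python
import numpy as np


def working_correlation(n_i: int, structure: str, alpha: float = 0.0) -> np.ndarray:
    """Return an n_i x n_i working correlation matrix R_i(alpha) for one district.

    structure: "independence", "exchangeable" or "ar1" (illustrative choices).
    alpha: common correlation parameter (ignored for "independence").
    """
    if structure == "independence":
        # Sub-districts in the same district are treated as uncorrelated.
        return np.eye(n_i)
    if structure == "exchangeable":
        # Every pair of sub-districts in the district shares the correlation alpha.
        r = np.full((n_i, n_i), alpha, dtype=float)
        np.fill_diagonal(r, 1.0)
        return r
    if structure == "ar1":
        # Correlation decays as alpha**|j - k|; this requires an ordering of sub-districts.
        idx = np.arange(n_i)
        return alpha ** np.abs(idx[:, None] - idx[None, :]).astype(float)
    raise ValueError(f"unknown structure: {structure}")


def exchangeable_alpha(residuals_by_district) -> float:
    """Moment estimate of a common exchangeable correlation alpha.

    residuals_by_district: one sequence of Pearson residuals per district.
    Products r_j * r_k for j < k are pooled over all districts (simplified estimator).
    """
    all_r = np.concatenate([np.asarray(r, dtype=float) for r in residuals_by_district])
    phi = np.mean(all_r ** 2)  # dispersion estimate
    cross, pairs = 0.0, 0
    for r in residuals_by_district:
        r = np.asarray(r, dtype=float)
        # Sum of r_j * r_k over j < k, computed as ((sum r)^2 - sum r^2) / 2.
        cross += (r.sum() ** 2 - (r ** 2).sum()) / 2.0
        pairs += r.size * (r.size - 1) // 2
    return cross / (pairs * phi)


# Example: a district with 4 sub-districts.
print(working_correlation(4, "exchangeable", alpha=0.3))
print(working_correlation(4, "ar1", alpha=0.5))
```

The exchangeable structure treats all sub-districts of a district symmetrically, which needs no spatial ordering; AR(1) only makes sense if the sub-districts can be ordered meaningfully.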
As mentioned before, issues relating to quality levels are often found in everyday life, and ranking is important for problems related to priorities and efficiency. This study also discusses and implements a ranking method on poverty data and then groups the results into a few levels to be used as an ordinal response variable in modeling. This ordinal response represents the quality level of the subject. Similarly, outbreaks (hotspots) in a particular area are also important and interesting to study. In the model development, the hotspot status of a sub-district is also included as an explanatory variable to determine its contribution to the poverty level. This part follows the model development of Zhang and Lin.
Zhang and Lin (2008) used a spatial GLMM with clusters (hotspot areas) on data from a vast (continental) land area, without assuming nested areas. This research, however, analyzes spatial data in a nested spatial form and takes local variances into account. Nesting may be needed because of local variance in the data generated by factors such as environmental conditions, history, or the cultural characteristics of the people in the area, as described in the following four paragraphs, which differentiate West, Central, and East Java.
According to the study “Civilization Java” by Rahardjo (2011) and the “East Java Survey of Poor Families” by Garner and Amaliah (1999), West, Central, and East Java have dissimilar characteristics.
The geographical characteristics make Central Java more isolated than East Java. Almost all the main mountains in Central Java are located in the center of the province, and in ancient times the coastline acted like a thick wall that limited access to the outside world. In contrast, the center of civilization in East Java is much more open. Although there are several mountains in East Java, they do not form an impenetrable barrier to or from the coast, and two of the largest and longest rivers in Java (Brantas and Solo) are navigable into the interior. Industry and trading activities occurred much earlier in Central Java and East Java. These conditions have shaped two peoples with rather different cultures and characteristics (Rahardjo 2006).
On the other hand, according to the household research report by Hondai (2006), consumption expenditure varies considerably from one province to another. Central Java consists mainly of rural areas, apart from a few medium-sized urban areas, and is rather homogeneous compared to West Java. In that analysis, Hondai investigates changes in inequality in West Java, as a representative of a rapidly growing industrial region, and in Central Java, as a representative of a rather homogeneous rural region of the country.
Furthermore, the Survey of Poor Families reported very high rates of malnutrition in East Java, considerably higher than the national rates and those of other provinces in Java (Gardner 1999).
The purpose of this study is to extend the GLMM of Zhang and Lin to data with an ordinal response variable while taking the local variance (nested conditions) into account. Following Hardin and Hilbe, the term nested is synonymous with panel. Hardin and Hilbe (2003) state, in their book Generalized Estimating Equations, that observations in panel data are correlated, and that using the ordinary likelihood for parameter estimation regardless of this condition is not correct and can lead to a wrong interpretation, because the variance matrix is assumed to have
consider the spatial correlation between areas in the nested condition. Many components affect the variance, for example natural resources, climate, environment, language, culture, customs, demographics, lifestyle, and others.
Regarding spatial modeling, Figure 3 shows the studies of spatial GLM and GLMM developed by other researchers and the position of the current study, a nested GLMM for spatial data. In general, spatial models are divided into two parts, namely spatial GLM and spatial GLMM.
Spatial GLM was developed by Schabenberger and Gotway (2005) under the name Fixed Effects and the Marginal Specification. Parameter estimation in these models can be handled on the basis of a specification of the first two moments of the outcome, using Generalized Estimating Equations or the quasi-likelihood of Wedderburn (1974). Schabenberger and Gotway (2005) also developed a spatial GLMM, which they named the Mixed Models and the Conditional Specification: a GLM with a conditional approach that incorporates the unobserved spatial process as random effects within the mean function. The conditional mean and variance of the outcome are modeled as functions of both fixed covariate effects and random effects deriving from the unobserved spatial process. For the spatial GLMM, Schabenberger and Gotway used the penalized quasi-likelihood estimation of Breslow and Clayton (1993) and the pseudo-likelihood estimation of Wolfinger and O’Connell (1993).
Hao Zhang (2002) developed a spatial GLMM using MMSE prediction with the Metropolis-Hastings algorithm, while a GLMM for point-referenced spatial data was developed by Gamperli and Vounatsou (2004) using the Laplace approximation. Furthermore, approximate Bayesian inference in spatial GLMMs was developed by Jo Eidsvik, Sara Martino and Håvard Rue (2007). Other spatial GLMMs include approximate Bayesian inference with skew-normal latent variables by Hosseini (2011) and the estimation of spatial patterns using GLMM by Kwak et al. (2012). As mentioned before, the spatial GLMM with clusters (hotspot areas) of Zhang and Lin (2008) is one of the spatial GLMMs in this diagram, shown by the light blue rectangle in Figure 3. Nested GLM and GLMM for spatial data will be developed, with the GEE method used for Nested GLM parameter estimation and pseudo likelihood used for Nested GLMM parameter estimation. Parameter estimates based on several working correlation matrices will be compared, and the model is applied to assess the effect of several covariates on the poverty level.
Figure 3 Research diagram
Other models in the diagram are the multilevel models for ordinal data of Kenett (2011), and multilevel models with ordinal outcomes applied to psychology data using maximum likelihood and penalized quasi-likelihood as the parameter estimation methods (Bauer and Sterba 2011). All these models use an ordinal response.
Based on the explanation above and the literature on spatial GLM and GLMM, most research over the last few years concerns parameter estimation techniques, while the research in this dissertation concerns the correlation within clusters. The focus and novelty of this research is the modification of Zhang and Lin's model for an ordinal response variable that takes clustered nested data into account. Attention to the correlation matrix is needed to deal with the clustered condition of the data.
This dissertation focuses on developing GLM and GLMM for an ordinal response in spatially nested data, including a study of several working correlation matrices, shown in the blue rectangles in Figure 3. Some interesting studies related to this research are developed and discussed in Chapters 2 and 3. As mentioned before, poverty issues have often been discussed, and poverty is still a major problem in Indonesia. To contribute ideas and thoughts, this dissertation takes poverty as the main application problem, with ranking, hotspot detection and modeling as the methods of analysis.
Based on the statements, ideas, thoughts and explanations above, the research questions of this study are:
1. How to determine the poverty severity level of sub-districts in Java, and which areas are the most and the least severe in poverty?
2. How to know the best hotspot method and which areas are a hotspot of a
3. How to build a model for clustered nested data with a multinomial ordinal response, and how to estimate the model parameters?
4. How does the structure of the working correlation matrix influence the estimation of model parameters for nested correlated data?
5. How do the model parameter estimates differ between Nested GLM and Nested GLMM?
1.2 The Purpose of Research
Based on the initiatives and issues described in Section 1.1, the purposes of this dissertation are:
1. To determine the poverty severity level of sub-districts in Java.
2. To determine the better of the two hotspot detection methods studied, and to apply it to obtain a factor for modeling.
3. To build a model for nested data with a multinomial ordinal response and to estimate the model parameters, including the parameter estimates of the explanatory variables for every province used in modeling.
4. To study the influence of the working correlation matrix structure on the estimation of model parameters for data with a clustered nested condition.
5. To study the differences between the model parameter estimates of Nested GLM and Nested GLMM.
The purposes of this study are achieved and described in Chapters 2, 3, and 4. The first purpose is achieved in Chapter 2, through the study of the ORDIT ranking method and its implementation on poverty data. The second purpose is achieved in Chapter 3, through the study and comparison of two hotspot detection methods; the better method is then used to detect bad nutrition hotspot areas. Finally, the third, fourth and fifth purposes are achieved in Chapter 4, which covers the modeling and its implementation on poverty data, using the result of Chapter 2 as the dependent variable and the result of Chapter 3 as an independent variable.
1.3 The Research Framework
The overall structure of the research is described in Figure 4. The left side of the diagram in Figure 4 is the ORDIT ranking method, which is studied and implemented on poverty data in Java. This method ranks individuals based on several indicators, but due to data limitations this study uses only two poverty indicators, i.e. health insurance for the poor (HIP, or askeskin) and the statement of inability to pay, or poverty letter (PL, or surkin). The ranking result is grouped into three levels and used as the ordinal response in the modeling.
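To show how a three-level ordinal response can enter a GLM or GLMM, the sketch below computes category probabilities under a cumulative-logit (proportional-odds) model. This link function, the threshold values and the function names are assumptions made for illustration; the exact ordinal model of Chapter 4 is not stated in this section. The optional term u plays the role of a district random effect; setting u = 0 gives the fixed-effects model.

```python
import math
from typing import List


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def ordinal_probs(eta: float, thresholds: List[float], u: float = 0.0) -> List[float]:
    """Category probabilities under a cumulative-logit model (illustrative).

    P(Y <= j) = logistic(theta_j - (eta + u)) for j = 1..J-1, with increasing thresholds.
    eta: fixed-effect linear predictor x'beta; u: district random effect.
    """
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cumulative = [logistic(t - (eta + u)) for t in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(Y = j) = P(Y <= j) - P(Y <= j - 1)
        previous = c
    return probs


# Three ordered levels coded 1, 2, 3; all numbers here are made up.
print(ordinal_probs(eta=0.0, thresholds=[-1.0, 1.0]))
print(ordinal_probs(eta=0.0, thresholds=[-1.0, 1.0], u=1.5))
```

A positive random effect u shifts probability toward the higher-coded level for every sub-district in that district, which is one way the shared district conditions can induce correlation among its sub-districts.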
The right side of the diagram is a study and comparison of two hotspot detection methods, i.e. Circle-based Scan Statistics (SS) hotspot detection and Upper Level Set Scan Statistics (ULS) hotspot detection. The better method is ULS, which is used to detect hotspots of bad nutrition in some districts. The detection result is used as an explanatory variable in the modeling.
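The key idea of the ULS method is that candidate hotspot zones are the connected components of the upper level sets {areas with rate ≥ g} of the response surface, taken over all thresholds g. The sketch below covers only this candidate-zone step; the likelihood-ratio scoring of each zone and its Monte Carlo significance test are omitted. The area names, rates and adjacency are hypothetical.

```python
from collections import deque
from typing import Dict, FrozenSet, Iterable, Set


def uls_candidate_zones(rates: Dict[str, float],
                        adjacency: Dict[str, Iterable[str]]) -> Set[FrozenSet[str]]:
    """Connected components of every upper level set {a : rates[a] >= g}."""
    zones: Set[FrozenSet[str]] = set()
    for g in sorted(set(rates.values()), reverse=True):
        members = {a for a, r in rates.items() if r >= g}
        seen: Set[str] = set()
        for start in members:
            if start in seen:
                continue
            # Breadth-first search restricted to areas inside the level set.
            component, queue = set(), deque([start])
            seen.add(start)
            while queue:
                a = queue.popleft()
                component.add(a)
                for b in adjacency.get(a, ()):
                    if b in members and b not in seen:
                        seen.add(b)
                        queue.append(b)
            zones.add(frozenset(component))
    return zones


# Four hypothetical areas along a line: P1 - P2 - P3 - P4.
rates = {"P1": 0.9, "P2": 0.2, "P3": 0.8, "P4": 0.7}
adjacency = {"P1": ["P2"], "P2": ["P1", "P3"], "P3": ["P2", "P4"], "P4": ["P3"]}
for zone in sorted(uls_candidate_zones(rates, adjacency), key=len):
    print(sorted(zone))
```

Unlike circular windows, these zones follow the adjacency of the areas, so an irregularly shaped hotspot (such as P3 and P4 here) can appear as a single candidate.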
The middle of the diagram shows the development of GLM and GLMM modeling. The spatial GLMM of Zhang and Lin (2008), spatial correlation in modeling from Cressie (1993), the problem of global variance (variance among subjects) and local variances (variance among sub-subjects) in spatial data (Goldstein 1995), and Managing Clustered Data Using Hierarchical Linear Modeling (Warne et al. 2012) provide the supporting theory for determining the working correlation matrices (WCM) of nested GLM and nested GLMM.
The modeling work proceeds as follows. Determining a working correlation matrix is the first step in estimating the model parameters. In this study, two types of working correlation matrices are discussed, and the most appropriate type for the data in this research is determined. The GLM parameters are estimated by the GEE method (Hardin and Hilbe 2003), and the GLMM parameters by the pseudo-likelihood approach (Wolfinger and O’Connell 1993). Furthermore, the multinomial ordinal response variable adds complexity to model building and parameter estimation. The applications presented in this dissertation are analyses of poverty data. The source of the data is
Figure 4 The systematics of research activities
1.4 The Outline of Dissertation
This dissertation is structured into six chapters, as described in Figure 5. Chapter 1 is an introduction containing the background, the purpose of the research, the research framework, and the outline of the dissertation.
Chapter 2 discusses a ranking method, namely Ordering Dually in Triangles (ORDIT), and its implementation on severity or poverty data from BPS. The method is applied to data on the level of poverty in several sub-districts in Java, based on the number of statements of inability to pay and the number of health insurances for the poor. Furthermore, the poverty ranking result is grouped into 3 levels and used as the ordinal response in the modeling in Chapter 4.
Chapter 3 compares two hotspot detection methods, namely the circle-based hotspot detection of Kulldorff (1997) and the Upper Level Set Scan Statistics of Patil and Taillie (2004), using 14 criteria. The better method is used to detect bad nutrition cases in some districts, and the result is used as an explanatory variable in the modeling.
Chapter 4 discusses the nested GLM and nested GLMM and their implementation on the poverty data from BPS. Chapter 5 is a general discussion of the modeling in Chapter 4. Chapter 6 contains conclusions and recommendations, as well as a summary of the research in this dissertation.
1.5 Novelty
As described in Section 1.1, the model of Zhang and Lin (2008) was based on un-nested spatial areas. It is important to develop a model for nested spatial areas, where the form of the correlation or variance-covariance matrix is based on the condition of the data considered. As an archipelago, Indonesia is an example of a nested spatial condition, in which lines, distance, and water act as borders that separate one area from another. When hotspots emerge in such areas, we need to know how significant they are in general. This question has been answered by the development of a nested spatial model.
The novelty of this dissertation is a model that uses hotspots as an explanatory variable and a spatial factor as the random effect in a nested area. This model is called the Nested Generalized Linear Mixed Model (NGLMM), a development of the model of Zhang and Lin (2008). However, the spatial factor in this model is not treated in the same way as spatial factors in other spatial models: the spatially unstructured random effect is assumed to be independent and identically normally distributed. The GEE approach is used for inference and is adapted to the nested spatial condition for a GLM with a multinomial ordinal response variable, which adds complexity to the parameter estimation.
Chapter 2
SUB DISTRICTS POVERTY LEVEL DETERMINATION
USING ORDERING DUALLY IN TRIANGLE (ORDIT)
RANKING METHOD
2.1 Introduction
Poverty is one of the major problems faced by Indonesia and other developing countries year after year (BPS 2011). As mentioned in Chapter 1, although the number of poor people in Indonesia has decreased, Indonesia still faces the problem of poverty, which is becoming increasingly difficult to resolve. Poverty is both a cause and a consequence of poor health (Alisjahbana 2011): poverty increases the chances of poor health, and poor health in turn traps communities in poverty. Infectious and neglected tropical diseases kill and weaken millions of the poorest and most vulnerable people each year (Schirnding and Mulholland 2001).
There are several criteria that can be used as a basis for determining the poverty threshold, and the choice of indicator depends on government policy. The most commonly used definition of global poverty is the absolute poverty line set by the World Bank: poverty is set at an income of $2 a day or less, and extreme poverty at $1 a day or less. This line was first created in 1990, when the World Bank published its World Development Report and found that most developing countries set their poverty lines at $1 a day. The $2 mark was created for developing nations whose income levels are slightly better than the $1-a-day level. In Indonesia, the poor population is the population below the poverty line, which serves as the boundary for determining whether or not a person is poor. The poor are people whose average monthly per capita expenditure is below the poverty line of Rp 211,726 (about US$20) (BPS 2011).
The poverty line is a criterion for classifying a person as poor or not poor. However, because the standard of living differs between regions, it is not easy to determine the number of people in poverty using this criterion. Poverty criteria used in
(PL or surkin or SKTM). As indicators of poverty, HIP and PL are counted in aggregate for every sub-district. The concepts of askeskin and surkin are explained in Section 2.2.4.
In connection with the poverty problem, poverty alleviation programs are a concern of the Government as a way to reduce poverty. Policies should yield the right decisions on fund allocation, so that poverty alleviation funds are received by the right sub-districts or districts. For this reason, the Government should know the priority areas, which can be obtained from an accurate ranking. ORdering Dually In Triangles (ORDIT), a ranking method developed by Myers and Patil, is used to rank sub-districts by poverty using two indicators.
ORDIT was developed to provide convenient computational capability and visualizations for preliminary, partial or progressive prioritization based largely on concepts of partial order theory, and it is implemented in R software. It was illustrated in the context of conservation and sustainable stewardship across landscapes, with ecosystem services as a complex multidimensional domain that must be placed in public and private perspective in pursuit of multi-resource management (Myers and Patil 2010). In other words, ORDIT was developed for ranking with convenient computation and visualization. The method is also powerful for ranking with multiple indicators, as shown in the illustration by Myers and Patil. Due to limited data resources, this study uses only two indicators, i.e. the number of health insurances for the poor and the number of poverty letters.
The objective of this chapter is to rank the sub-districts of Java Island by poverty or severity level using ORDIT. From the ranking result, the 7 most severe and 7 least severe sub-districts are identified. The ranking result is grouped into 3 levels, i.e. worst, moderate, and mild, to simplify interpretation and to distinguish sub-districts in terms of poverty level. Furthermore, this
2.2 Theoretical Background
The ORDIT ranking method uses multiple indicators in the ranking process; its theoretical concepts are discussed in the following subsections, based on Myers and Patil (2010).
2.2.1 Rating Relations/Rules for Ascribing Advantage
This section describes the protocols for comparing cases, or collectives of cases, via ratings, rules and relations that ascribe advantage to some cases over others, or fail to do so for particular pairs. There are 3 possibilities when comparing a pair of cases, one denoted by Γ and the other by Δ. “Γ aa Δ” means that Γ has ascribed advantage over Δ. “Γ ss Δ” means that Γ has subordinate status to Δ, which implies “Δ aa Γ”. “Γ ii Δ” means that the two are indefinite instances, without ascribed advantage and without subordinate status, which implies “Δ ii Γ”. Thus the protocol either designates one member of a pair as having ascribed advantage and the other as having subordinate status, or designates the pairing as indefinite. ff(aa) is the number of occurrences of ascribed advantage, ff(ss) is the number of occurrences of subordinate status, and ff(ii) is the number of occurrences of indefiniteness. Each of the N cases can be compared on this basis with all others in the deleted domain DD = N – 1 of competing cases, and the percent occurrences of these relations are tabulated as
AA = 100 × ff(aa)/DD, SS = 100 × ff(ss)/DD, II = 100 × ff(ii)/DD.
Clearly, AA + SS + II = 100%; for later use, let CCC = 100 – AA be the complement of case condition relative to ascribed advantage (AA) (Myers and Patil 2010).
Figure 6 is a Hasse diagram of the entities in Table 1. This simple X-shaped Hasse diagram illustrates a case in which entity A has ascribed advantage over B, D, and E while being indefinite with regard to C. Entity B has subordinate status to A and C, with ascribed advantage over D and E. The deleted domain
Table 1 Entities with 3 indicators
Entity   Rank on Indicator 1   Rank on Indicator 2   Rank on Indicator 3
A        1.5                   2                     3
B        3                     5                     4
C        1.5                   3                     2
D        8                     6                     7
E        7                     8                     6
Figure 6 X-shaped Hasse diagram of five entities labeled as A, B, C, D and E
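The percentages AA, SS, II and CCC can be computed directly from pairwise comparisons. The sketch below applies the product-order rule of Section 2.2.3 to the ranks in Table 1, where a smaller rank is better; it reproduces the X-shaped pattern of Figure 6 (A and C on top, B in the middle, D and E at the bottom). It is an illustration, not the ProdOrdr function of Myers and Patil; the function names and the `lower_is_better` flag are hypothetical. For the poverty indicators of this chapter, where a larger count means greater severity (ascribed advantage), one would pass `lower_is_better=False`.

```python
from typing import Dict, Sequence


def has_advantage(x: Sequence[float], y: Sequence[float], lower_is_better: bool = True) -> bool:
    """Product order: x has ascribed advantage over y if x is at least as good
    on every criterion and strictly better on at least one."""
    if lower_is_better:
        return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))
    return all(a >= b for a, b in zip(x, y)) and any(a > b for a, b in zip(x, y))


def rating_percentages(data: Dict[str, Sequence[float]],
                       lower_is_better: bool = True) -> Dict[str, Dict[str, float]]:
    dd = len(data) - 1  # deleted domain DD = N - 1
    result = {}
    for i, xi in data.items():
        aa = ss = ii = 0
        for j, xj in data.items():
            if i == j:
                continue
            if has_advantage(xi, xj, lower_is_better):
                aa += 1
            elif has_advantage(xj, xi, lower_is_better):
                ss += 1
            else:
                ii += 1  # incomparable (or identical) pair counts as indefinite
        aa_pct = 100.0 * aa / dd
        result[i] = {"AA": aa_pct, "SS": 100.0 * ss / dd, "II": 100.0 * ii / dd, "CCC": 100.0 - aa_pct}
    return result


# Table 1: ranks of each entity on three indicators (a smaller rank is better).
table1 = {"A": (1.5, 2, 3), "B": (3, 5, 4), "C": (1.5, 3, 2), "D": (8, 6, 7), "E": (7, 8, 6)}
for entity, row in rating_percentages(table1).items():
    print(entity, row)
```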
2.2.2 Subordination Schematic and ORDIT
Subordination can be symbolized diagrammatically (Myers and Patil 2010) as a triangle, depicted in Figure 7, where the point with coordinates (SS, AA) representing a sub-district divides the triangle into two parts, a ‘trapezoidal triplet’ (of AA, SS, and II) and a ‘topping triangle’ (of CCC and II). Together, these two parts form a right triangle with the ‘tip’ at AA = 100% in the upper left and the ‘toe’ at SS = 100% in the lower right. The hypotenuse is a right-hand ‘limiting line’ of plotting positions because AA + SS + II = 100%. The topping triangle provides the basis for an ‘ORDIT ordering’ of the districts or instances. According to Figure 7, an idealized district has AA = 100% of the deleted domain (DD) of other districts, that is, its frequency of ascribed advantage equals the number of competing districts, so if the ideal actually occurs, then the
Figure 7 Subordination schematic with a plotted instance dividing a right triangle into two parts, a ‘trapezoidal triplet’ (of AA, SS and II) below and a ‘topping triangle’ (of CCC, SS and II) above.
The ORDIT numbers can be coupled as a decimal value ccc.bbb. The ccc component is obtained by rounding CCC to two decimal places and then multiplying by 100. The bbb component is obtained by dividing SS by CCC, with 0.999 imposed as an upper limit. The two values are then added to give ccc.bbb. This ordering is assigned the acronym ORDIT and preserves all aspects of AA, SS and II except for the actual number of districts. A simple rank ordering of the ORDIT values gives the salient scaling of the districts (Myers and Patil 2010).
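The coupling rule can be traced on the five entities of Table 1, whose product-order percentages are AA = 75, SS = 0 for A and C; AA = 50, SS = 50 for B; and AA = 0, SS = 75 for D and E. The sketch below is not the authors' R code. It assumes CCC is expressed as a proportion before rounding (so that ccc lies between 0 and 100), sets bbb to 0 for an ideal case with CCC = 0, and gives tied ORDIT values their average rank; these choices and the function names are hypothetical.

```python
from typing import Dict


def ordit_value(aa_pct: float, ss_pct: float) -> float:
    """Couple CCC and SS into the decimal ORDIT value ccc.bbb."""
    ccc_pct = 100.0 - aa_pct
    # Assumption: CCC rounded as a proportion to two decimals, then multiplied by 100.
    ccc = round(round(ccc_pct / 100.0, 2) * 100)
    # bbb = SS / CCC capped at 0.999 (taken as 0 for an ideal case with CCC = 0).
    bbb = min(ss_pct / ccc_pct, 0.999) if ccc_pct > 0 else 0.0
    return ccc + round(bbb, 3)


def salient_scaling(ordits: Dict[str, float]) -> Dict[str, float]:
    """Rank ORDIT values in ascending order (1 = greatest ascribed advantage);
    tied values share their average rank."""
    ordered = sorted(ordits, key=ordits.get)
    ranks: Dict[str, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and ordits[ordered[j + 1]] == ordits[ordered[i]]:
            j += 1
        for name in ordered[i:j + 1]:
            ranks[name] = (i + j) / 2 + 1
        i = j + 1
    return ranks


# (AA, SS) percentages of the Table 1 entities under product order.
aa_ss = {"A": (75.0, 0.0), "B": (50.0, 50.0), "C": (75.0, 0.0), "D": (0.0, 75.0), "E": (0.0, 75.0)}
ordits = {k: ordit_value(aa, ss) for k, (aa, ss) in aa_ss.items()}
print(ordits)
print(salient_scaling(ordits))
```

B reaches the 0.999 cap because its SS equals its CCC; the cap keeps bbb below 1 so that it never spills into the ccc component.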
2.2.3 Product-order Rating Regime
A general relational rule for ascribing advantage is product-order whereby
advantage is gained by having all criteria at least as good and at least one better.
Conversely, subordinate status lies with having all criteria at least as poor and at
least one poorer. This relational rule is applicable to all kinds of criteria as long as
they have the same polarity (sense of better and worse). Scheme 2 (Myers and
Patil 2010) in Function Facilities gives an R function called ProdOrdr that
determines ORDITs and salient scaling according to product-order. This function
takes as its inputs a vector of IDs for instances, a data frame of same-sense criteria.
All indicators are either positive sense or negative sense. Indicators in this research
more severe the district. The output is a data frame of ORDITs and salient scaling
values
According to Figure 7 and its computation, ORDIT ordering ranks the instances based on their indicators. ORDITs and salient scaling according to product order are determined by the R function of Scheme 2 (Myers and Patil 2010) in Function Facilities. ORDIT corresponds to the topping triangle in Figure 7, and salient scaling is the ranking of ORDIT.
Precedence Plots
Based on the computation of AA and SS for a particular district, the structure in the lower part of the subordination schematic can be used to prepare a ‘precedence plot’ for visualization. The precedence plot has AA (ascribed advantage) on the Y-axis and SS (subordinate status) on the X-axis. Prominence declines from the top (upper left) to the toe (lower right). Primary prominence varies vertically: a larger percentage of ascribed advantage (greater severity) comes with increasing height. Horizontal variation at a given level shows the clarity of comparison. A position farther to the right indicates greater clarity, i.e. more definite advantage (less severe), with a larger percentage of subordinate status relative to indefinite instances among the couplets where ascribed advantage is lacking. In other words, more indefinite instances constitute an increased lack of clarity (incomparability in the usual parlance of partial ordering). Scheme 3 (Myers and Patil 2010) in Function Facilities gives an R function named PrecPlot, which accepts the output of the ProdOrdr function and produces a precedence plot.
Representative Ranks
Representative ranks are descriptive statistics of the indicator rankings of each district; in other words, they summarize each district according to its rankings on the indicators. Representative ranks are used to see how crucial a district is. The rank numbers received by a given district across all criteria can be placed in a single array and sorted in ascending order.
component for the case. This computation is effective when there are several indicators. Because there are only two indicators, this study does not perform this calculation.
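The sorting step can be illustrated on the ranks of Table 1. Which summary statistics Myers and Patil report is not recoverable from this passage, so the minimum ("best"), median and maximum ("worst") used below are assumptions, as is the function name; rank 1 is taken to be the best rank.

```python
from statistics import median
from typing import Dict, List, Sequence


def representative_ranks(ranks: Dict[str, Sequence[float]]) -> Dict[str, Dict[str, object]]:
    """For each district, sort its ranks across criteria and summarize them."""
    summary: Dict[str, Dict[str, object]] = {}
    for district, r in ranks.items():
        ordered: List[float] = sorted(r)  # ascending: best rank first
        summary[district] = {
            "sorted": ordered,
            "best": ordered[0],
            "median": median(ordered),
            "worst": ordered[-1],
        }
    return summary


# Ranks of the Table 1 entities on three indicators.
table1 = {"A": (1.5, 2, 3), "B": (3, 5, 4), "C": (1.5, 3, 2), "D": (8, 6, 7), "E": (7, 8, 6)}
for entity, stats in representative_ranks(table1).items():
    print(entity, stats)
```

With only two indicators, as in this study, the sorted array has just two entries and its median is simply their mean, which adds little beyond the ORDIT ranking itself.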
2.2.4 The Concepts of Askeskin or HIP and Surkin or PL
Health is a right and an investment for all citizens, and all citizens, including the poor, are entitled to health. A system is required to regulate the efforts to fulfill the right of citizens to remain healthy, with emphasis on health care for the poor.
Basic right to receive health care is a policy launched by Government
included in handling th