Generalized linear mixed models of ordinal poverty response in nested area


GENERALIZED LINEAR MIXED MODELS

OF ORDINAL POVERTY RESPONSE

IN NESTED AREA

YEKTI WIDYANINGSIH

SCHOOL OF GRADUATE STUDIES BOGOR AGRICULTURAL UNIVERSITY


THE STATEMENT OF DISSERTATION AND SOURCES OF INFORMATION

I hereby declare that the dissertation entitled "Generalized Linear Mixed Models of Ordinal Poverty Response in Nested Area" is my own work, carried out under the direction of the supervisory committee, and has not been submitted in any form to any university. Sources of information derived or quoted from the published or unpublished work of other authors are mentioned in the text and listed in the Bibliography (References) at the end of this dissertation.

Bogor, July 2012


STATEMENT CONCERNING THE DISSERTATION AND SOURCES OF INFORMATION

I hereby declare that the dissertation entitled "Generalized Linear Mixed Models of Ordinal Poverty Response in Nested Area" is my own work, carried out under the direction of my supervisors, and has not been submitted in any form to any university. Sources of information derived or quoted from the published or unpublished work of other authors have been cited in the text and listed in the Bibliography (References) at the end of this dissertation.

Bogor, July 2012


ABSTRACT

YEKTI WIDYANINGSIH. Generalized Linear Mixed Models of Ordinal Poverty Response in Nested Area. ASEP SAEFUDDIN, KHAIRIL A. NOTODIPUTRO, AJI HAMIM WIGENA.

The Generalized Linear Mixed Model in this study is a development of the Spatial Generalized Linear Mixed Model proposed by Zhang and Lin (2008). As in Zhang and Lin's model, the spatial (regional) data in this study are tied to hotspot detection. Zhang and Lin used the circle-based scan statistic (SS) of Kulldorff (1997), whereas this dissertation uses the Upper Level Set (ULS) scan statistic of Patil and Taillie (2004). The two methods are first compared through simulation on 14 performance criteria, and the comparison shows that ULS performs better than SS. The ULS method is then used to detect hotspots of bad nutrition in several districts, and the results enter the model as a covariate. This study focuses on developing models for regional data viewed in terms of the proximity of nested observations. According to Cressie (1993), adjacent observations tend to be more strongly correlated than distant ones. In statistical terms, the variation among individuals within a group differs from the variation among individuals in different groups. This condition must be considered in the modeling. The generalized estimating equation (GEE) is a parameter estimation method that accounts for the correlation between observations. The working correlation matrix (WCM) is an important part of the estimation process. Three correlation structures are studied and implemented to determine which is most appropriate for the data. The parameter estimates of the Nested GLM and Nested GLMM obtained under combinations of WCMs and estimation techniques are compared.

The response variable in the model is ordinal, which adds complexity to the modeling and is also a focus of this research, whereas the response in Zhang and Lin's model is a count variable with a Poisson distribution. The ordinal response is obtained by grouping the ranks produced by the ORDIT (Ordering Dually in Triangles) ranking method of Myers and Patil (2010). The model developed in this study for nested spatial data gives better results, especially with a diagonal working correlation matrix.


YEKTI WIDYANINGSIH. Generalized Linear Mixed Models of Ordinal Poverty Response in Nested Area. ASEP SAEFUDDIN, KHAIRIL A. NOTODIPUTRO, AJI HAMIM WIGENA.

The Generalized Linear Mixed Model in this study is a development of the Spatial Generalized Linear Mixed Model (Spatial GLMM) of Zhang and Lin (2008). As in Zhang and Lin's model, the spatial data used in this study are related to hotspot detection results. Zhang and Lin (2008) used the Circle Based Scan Statistic (SS) hotspot detection method of Kulldorff (1997), whereas this dissertation uses the Upper Level Set Scan Statistic (ULS) hotspot detection method of Patil and Taillie (2004). The application of hotspot detection begins with a comparison of the two methods, SS and ULS, through simulation on 14 performance criteria. The simulation results lead to the conclusion that the ULS hotspot detection method is better. Hotspots of bad nutrition are then detected in several districts, and the results are used as a covariate in the modeling. This study focuses on developing models for nested spatial data viewed in terms of the proximity of observations. According to Cressie (1993), adjacent observations tend to be more strongly correlated than distant observations. Statistically, it can also be said that the variation among individuals within a group differs from that among individuals from different groups. This condition must be taken into account in the modeling.

The Generalized Estimating Equation (GEE) is a parameter estimation method that takes this condition into account. Working correlation matrices (WCM), an important part of parameter estimation with GEE, are discussed and applied for several correlation structures to determine which WCM structure best suits the data. The parameter estimates of the Nested GLM and Nested GLMM under combinations of several WCMs and estimation techniques are compared. The response variable used in the model is ordinal, a theoretically rather complex topic that is also a focus of this research, whereas the response variable in Zhang and Lin's model is a count variable with a Poisson distribution. The ordinal response is obtained by grouping the results of the ORDIT (Ordering Dually in Triangles) ranking method of Myers and Patil (2010). With the model developed in this study, modeling that includes location (spatial) data as a random factor gives better results, especially when a diagonal working correlation matrix (independent WCM) is used.


SUMMARY

YEKTI WIDYANINGSIH. Generalized Linear Mixed Models of Ordinal Poverty Response in Nested Area. ASEP SAEFUDDIN, KHAIRIL A. NOTODIPUTRO, AJI HAMIM WIGENA.

Ranking, hotspot detection, and modeling are important techniques in almost all fields of study. They play important roles for decision makers in business, education, ecology, and socio-economics, and especially in government, where they can increase the transparency of decision making. Every country has policies to arrange across many affairs. Because resources are limited, making the right and apt decision is both important and urgent, and these techniques are needed to support such decisions in every area. It is hoped that this dissertation can contribute ideas to the government and ministries in decision making related to poverty reduction.

This dissertation focuses on modeling with the Nested Generalized Linear Model (NGLM) and the Nested Generalized Linear Mixed Model (NGLMM) as an extension of the model of Zhang and Lin (2008), which detects hotspots through parameter estimates of spatial association in a non-nested study area with a count response variable. The models in this study are a GLM and a GLMM with the hotspot detection result as an explanatory variable, applied to a nested area with an ordinal multinomial response variable. Before the modeling, two preliminary topics are studied: the ranking method and the hotspot detection methods.

The ORDIT (Ordering Dually in Triangles) ranking method is studied and applied to poverty data. The method was developed to rank many individuals on the basis of many indicators, which is not an easy task. This study explains how such a ranking is obtained through mathematical concepts such as order theory, duality, and the partially ordered set (poset). Because of data limitations, the method is used here to order sub districts by poverty level using only two indicators: surkin (surat miskin), or poverty letters (PL), and askeskin (asuransi kesehatan untuk orang miskin), or health insurance for the poor (HIP). The observation unit is the sub district (kecamatan). In this study, 1679 sub districts in Java Island are ordered by poverty using HIP and PL. As a result, 6 of the 10 most severe sub districts are in Jember district, while 5 and 3 of the 10 least severe sub districts are in Probolinggo city and Surabaya, respectively. Overall, the ranking indicates that the order of the three provinces from least to most severe is West Java, Central Java, and East Java.
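The partial-order idea behind ranking on two indicators can be illustrated with a small Python sketch. This is not the ORDIT procedure itself: it only counts, for each unit, how many other units it dominates, is dominated by, or is incomparable with under the product order. The function names, the toy (HIP, PL) values, and the assumption that higher indicator values mean more severe poverty are all hypothetical.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as high as b on every indicator
    and strictly higher on at least one (product order)."""
    at_least = all(x >= y for x, y in zip(a, b))
    strictly = any(x > y for x, y in zip(a, b))
    return at_least and strictly


def product_order_counts(units: List[Sequence[float]]) -> List[Tuple[int, int, int]]:
    """For each unit, return (dominates, dominated_by, neither) counts
    over all other units. Exact ties fall into the 'neither' count."""
    result = []
    for i, a in enumerate(units):
        above = below = neither = 0
        for j, b in enumerate(units):
            if i == j:
                continue
            if dominates(a, b):
                above += 1
            elif dominates(b, a):
                below += 1
            else:
                neither += 1
        result.append((above, below, neither))
    return result


# Hypothetical (HIP, PL) values for four sub districts; not data from the study.
toy = [(30, 40), (10, 5), (20, 50), (10, 5)]
counts = product_order_counts(toy)
for unit, c in zip(toy, counts):
    print(unit, c)
```

In this toy set the first and third units are incomparable, because each is higher on one indicator and lower on the other. Resolving such cases into a single order is what a ranking method like ORDIT has to do.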

The ranking result was then grouped into three parts according to rank order, giving three poverty levels of sub districts: worst, moderate, and mild. Each sub district receives a grade of 1, 2, or 3, where 1 is worst, 2 is moderate, and 3 is mild. The result of this grouping is kept and used as the response variable in the modeling.
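A minimal sketch of this grouping step is given below. It assumes that rank 1 denotes the most severe sub district and that the three groups are equal-sized thirds of the ranking; the summary does not state the cut points, so both are assumptions, and the function name `poverty_level` is hypothetical.

```python
def poverty_level(rank: int, n_units: int) -> int:
    """Map a severity rank (1 = most severe, an assumption) to an
    ordinal level: 1 = worst, 2 = moderate, 3 = mild."""
    if not 1 <= rank <= n_units:
        raise ValueError("rank must lie between 1 and n_units")
    # Equal-sized thirds of the ranking are an assumption, not the study's rule.
    return (3 * (rank - 1)) // n_units + 1


# With the 1679 sub districts mentioned in the summary:
n = 1679
levels = [poverty_level(r, n) for r in range(1, n + 1)]
print(levels.count(1), levels.count(2), levels.count(3))
```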


assumption, 8 data sets were built and run 10,000 times to obtain the output, namely the performance of the methods on 14 criteria. The mean and standard deviation of each criterion were computed for each simulation and each data set and then compared. From these outputs, the 14 criteria were summarized, analyzed, and compared. The comparison indicates that ULS hotspot detection is better than the circle-based scan statistic (SS).

The research continues with the detection of bad nutrition hotspots in 8 randomly chosen districts. The result is a hotspot status for every sub district in these 8 districts: 0 means the sub district is not in a hotspot area and 1 means it is. This status enters the modeling as a dichotomous explanatory variable to answer the question: does the bad nutrition hotspot significantly explain the poverty level in the Nested GLM and GLMM?

Modeling starts with data preparation. Three districts from West Java, 2 from Central Java, and 3 from East Java are chosen randomly for model implementation: Kuningan, Karawang, Majalengka, Cilacap, Boyolali, Ngawi, Blitar, and Jember. The three poverty levels obtained from the ranking study are used as the ordinal response, and the bad nutrition hotspot status obtained with the ULS hotspot detection method is used as an explanatory variable. The other explanatory variables are the numbers of farmer families, schools, and health personnel, chosen on the basis of the Bappenas Report 2011. To simplify interpretation, the values of the explanatory variables are divided into three parts, i.e. low, moderate, and mild, which are appropriate to some resources.

Zhang and Lin's model is modified in two ways: (1) the model is extended to nested data (districts nested in provinces), on the assumption that sub districts within a district are more highly correlated than sub districts in different districts, and (2) the response variable is ordinal. Modeling was undertaken for both the GLM and the GLMM. In the Nested GLM, the Generalized Estimating Equation (GEE) method is used for parameter estimation to handle clustered and correlated data, while in the Nested GLMM, pseudo likelihood is used. In the Nested GLMM, district is a random effect.
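One common way to write an ordinal mixed model of this kind, given here only as a sketch, is a cumulative logit model with a district-level random intercept. The logit link, the sign convention, and the normal distribution of the random effect are assumptions; the summary does not state them. The indices follow the nesting described above: province s, district i within province s, and sub district j within district i.

```latex
\operatorname{logit} P\left(Y_{sij} \le k \mid u_{si}\right)
  = \alpha_k + \mathbf{x}_{sij}^{\top}\boldsymbol{\beta} + u_{si},
  \qquad k = 1, 2,
  \qquad u_{si} \sim N\left(0, \sigma_u^{2}\right)
```

Here Y_{sij} is the poverty level (1, 2, or 3) of sub district j, \alpha_1 < \alpha_2 are thresholds, \mathbf{x}_{sij} collects the explanatory variables (including the hotspot status), and u_{si} is the random effect of district i in province s. Under the same assumptions, the Nested GLM drops u_{si} and handles within-district correlation through the working correlation matrix in GEE.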

Several working correlation matrices can be used with the GEE method. Three types of working correlation matrix (WCM), i.e. exchangeable, unstructured, and independent, are studied. One objective of the modeling is to determine which WCM gives the best results. It was assumed that the poverty data have an unstructured correlation pattern between sub districts within a district. In the results, however, the independent WCM gave the minimum ratio of robust to model-based standard errors. It is therefore believed that the data have an independent correlation structure.


It is prohibited to quote part or all of this work without citing the source. Quotation is permitted only for education, research, scientific writing, report preparation, criticism, or review of an issue, and must not harm the legitimate interests of IPB.


GENERALIZED LINEAR MIXED MODELS

OF ORDINAL POVERTY RESPONSE

IN NESTED AREA

YEKTI WIDYANINGSIH

Dissertation

Submitted to the School of Graduate Studies of Bogor Agricultural University

in partial fulfillment of the requirements for the Doctorate degree in Statistics

SCHOOL OF GRADUATE STUDIES BOGOR AGRICULTURAL UNIVERSITY


Closed Examination

(July 7, 2012) Examiner:

1. Dr. Ir. I Gusti Putu Purnaba, DEA. Department of Mathematics

Faculty of Mathematics and Natural Sciences Bogor Agricultural University

2. Dr. Ir. Hari Wijayanto, M.Si. Department of Statistics

Faculty of Mathematics and Natural Sciences Bogor Agricultural University

Open Examination

(July 30, 2012) Examiner:

1. Prof. Dr. Ir. Dadang Sukandar, M.Sc. Department of Community Nutrition

Faculty of Human Ecology, Bogor Agricultural University

2. Dr. Slamet Sutomo, SE., MS.


Student Name: Yekti Widyaningsih

NRP: G161070011

Approved as to style and content by:

Dr. Ir. Asep Saefuddin, MSc.
Chair of Committee

Prof. Dr. Ir. Khairil A. Notodiputro, MS.
Member

Dr. Ir. Aji Hamim Wigena, MSc.
Member

Acknowledged,

Dr. Ir. Aji Hamim Wigena, MSc.
Head of Study Program

Dr. Ir. Dahrul Syah, MSc. Agr.
Dean of School of Graduate Studies

Date of defense:

Date of graduation:


ACKNOWLEDGEMENTS

Praise and thanks to God Almighty for all His grace, through which this scientific work was successfully completed. The research has been conducted since mid-2009 under the title Generalized Linear Mixed Models of Ordinal Poverty Response in Nested Area.

I would never have been able to finish my dissertation without the guidance of my committee members, help from friends, and support from my family, especially my mother and my late father.

There are many people who through their generosity and knowledge have made important contributions to this dissertation. It would be impossible to list everyone who contributed or to adequately list the extent of the contributions for those who are mentioned.

First and foremost, I am extremely grateful to my advisor, Dr. Ir. Asep Saefuddin, MSc. for his guidance and support throughout my graduate study. I especially thank him for giving me the opportunity to participate in several of his research projects which deal with many challenging statistical issues. I wish to thank my committee members Prof. Dr. Ir. Khairil Anwar Notodiputro, MS. and Dr. Ir. Aji Hamim Wigena, MSc. who let me experience the research of data simulation in the field and practical issues beyond the textbooks, and patiently corrected my writing.

I would like to express my deepest gratitude to my advisor at Pennsylvania State University, Prof. Ganapati P. Patil, for his excellent guidance, care, and patience, and for providing me with an excellent atmosphere for doing research. I would like to thank Prof. Wayne L. Myers, who let me experience research on the ranking method and also patiently corrected my writing. Many thanks to Prof. Sharad W. Joshi, who as a good friend was always willing to help and give his best suggestions. It would have been a lonely work room without communicating with him by phone. My research would not have been possible without their help.

Many thanks to the Directorate of Mendepdiknas for financial assistance,


statistics, faculty and employees of the school of graduate studies who have provided services to both teaching and administration.

Many thanks to the lecturers of the Department of Statistics, who always took the time to discuss and to provide advice and encouragement, and to the employees of the department for their help.

Special thanks go to Dr. Ir. I Gusti Putu Purnaba, DEA., Dr. Ir. Hari Wijayanto, M.Si., Prof. Dr. Ir. Dadang Sukandar, M.Sc. and Dr. Slamet Sutomo, SE., MS., who were willing to participate in my final defense committee at the last moment. I would also like to thank Drs. Tjiong Giok Pin, M.Si for the opportunity to use the map files.

I also wish to acknowledge my friends, S2 and S3, with whom I shared my joy, complaints, and laughter through these past years.

Finally, I would like to thank my parents, two elder sisters, and elder brother. They were always supporting me and encouraging me with their best wishes.

Even though I have benefited from the help and advice of many people, there are bound to be things I have not grasped, so any remaining mistakes and omissions are my responsibility. I would be grateful for messages pointing out errors in this dissertation.

Bogor, July 2012


Yekti Widyaningsih was born on September 15, 1967 in Bandung, West Java. Her father's name is Prayuto and her mother's name is Ambarwati. Yekti is the youngest of one brother and three sisters. She graduated with a Dra in Mathematics from the University of Indonesia in 1992 with Dra. Linggawati, M.S. as her advisor, and received a master's degree in statistics in 2002 from Bogor Agricultural University with advisors Dr. Ir. Amril Aman, M.Sc. and Dr. Ir. Hadi Sumarno. To strengthen her knowledge of statistics, she completed the Ph.D. program in 2012 at the same institution with a scholarship from BPPS. The advisors of her Ph.D. dissertation were Dr. Ir. Asep Saefuddin, M.Sc., Prof. Dr. Ir. Khairil Anwar Notodiputro, MS. and Dr. Ir. Aji Hamim Wigena, MSc. A part of this dissertation was written at Pennsylvania State University with Prof. Ganapati P. Patil as her advisor and Prof. Wayne L. Myers and Prof. Sharad Joshi as her co-advisors. Research at PSU was supported by the Directorate General of Higher Education Indonesia (DIKTI) through the Doctoral Sandwich Program 2010. She was also involved in the Geoinformatic Research of Penelitian Hibah Pascasarjana DIKTI. During her Ph.D. studies, she wrote several papers, which were published in journals or presented at national and international seminars. The papers are:

1. Yekti Widyaningsih, Asep Saefuddin, Khairil Anwar Notodiputro, Aji Hamim Wigena. (2012). Nested Generalized Linear Mixed Model for Correlated Nested Data with Ordinal Response. Jurnal IPTEK ITS Volume 23/No.2/May 2012.

2. Yekti Widyaningsih, Wayne L. Myers, Asep Saefuddin. (2012). Sub Districts Poverty Level Determination using Ordering Dually in Triangle (ORDIT) Ranking Method. Jurnal Math Info Volume 5/No.2/July 2012.

3. Yekti Widyaningsih, Asep Saefuddin, Khairil Anwar Notodiputro, Aji Hamim Wigena. (2012). Nested Generalized Linear Mixed Model with Ordinal Response: Simulation and Application on Poverty Data in Java Island. AIP Conference Proceedings of The 5-th International Conference on Research and Education in Mathematics, Institut Teknologi Bandung, October 2011.

4. Yekti Widyaningsih, Asep Saefuddin, Khairil Anwar Notodiputro, Aji Hamim Wigena. (2011). Ordering Dually in Triangles (Ordit) and Hotspot Detection in Generalized Linear Model for Poverty and Infant Health in East Java. Paper in The 6-th SEAMS-GMU 2011 International Conference on Mathematics and Its Applications, Universitas Gadjah Mada, Yogyakarta, July 2011.


Paper in ICCS-X, Cairo, Egypt, December 2009.

7. Yekti Widyaningsih and Siti Nurrohmah. (2009). The Application of Spatial Scan Statistics on The Tuberculosis Hotspot Detection in Indonesia. Proceeding of IndoMs International Conference on Mathematics and Its Application (IICMA), Yogyakarta.

8. Yekti Widyaningsih and Asep Saefuddin. (2008). Health Profile 2005 and Geoinformatic of Diseases in Indonesia. Paper in the International Workshop on Digital Governance and Hotspot GeoInformatics, Jalgaon, India, March 11-24, 2008.

9. Yekti Widyaningsih and Tjiong Giok Pin. (2008). A Space-Time Scan Statistics to Detect Cluster Alarms of Dengue Mortality in Indonesia 2005. An article in the journal "Makara seri Sains" Volume 12 No.1/April 2008.

10. Yekti Widyaningsih and Asep Saefuddin. (2007). Disease Outbreak in Indonesia: The Application of Scan Statistics. Paper in the 1st International Conference on Theory and Practice of Electronic Governance, Macau Polytechnic Institute, 10-13 December 2007.

11. Yekti Widyaningsih. (2007). Model, Calculations, and Application of Spatial Scan Statistics. Paper in the International Conference on Mathematics and Its Applications, Universitas Gadjah Mada, Yogyakarta, August 2007.

12. Yekti Widyaningsih. (2007). A Space-Time Permutation Scan Statistics for Disease Outbreak Detection. Poster at 1st Joint Seminar UI-UKM 2007, Universitas Indonesia.


TABLE OF CONTENTS

ABSTRACT
SUMMARY
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF APPENDIXES
LIST OF ABBREVIATIONS
GLOSSARY

1 INTRODUCTION
1.1 Background
1.2 The Purpose of Research
1.3 Research Framework
1.4 The Outline of Dissertation
1.5 Novelty

2 SUB DISTRICTS POVERTY LEVEL DETERMINATION USING ORDERING DUALLY IN TRIANGLE (ORDIT) RANKING METHOD
2.1 Introduction
2.2 Theoretical Background
2.2.1 Rating Relations/Rules for Ascribing Advantage
2.2.2 Subordination Schematic and Ordering Dually in Triangles (ORDIT)
2.2.3 Product-order Rating Regime
2.2.4 The concepts of Askeskin or HIP and Surkin or PL
2.3 Methodology
2.4 Results and Discussion

BASED ON SIMULATION STUDY
3.1 Introduction
3.2 Theoretical Study
3.2.1 The Concept of Hotspot Detection
3.2.2 Hypothesis testing for comparison between SS and ULS
3.2.3 Circle-based Scan Statistics (SS) Hotspot Detection
3.2.4 Upper Level Set (ULS) Scan statistics
3.3 The Methods
3.3.1 The Steps of Simulations
3.3.2 The Fourteen Criteria
3.4 The Results and Analysis
3.5 ULS Hotspot Detection for Bad Nutrition Case in Java Island
3.6 The Results of Bad-nutrition Hotspot Detection
3.7 Conclusion

4 NESTED GENERALIZED LINEAR MIXED MODEL FOR CORRELATED DATA
4.1 Introduction
4.4 Results and Discussion
4.4.1 Standard error of parameter estimates
4.4.2 Significance (p-values)
4.5 Conclusion

5 GENERAL DISCUSSION

6 CONCLUSION AND RECOMMENDATION
6.1 Conclusion
6.2 Recommendation

LIST OF TABLES

1 Entities with 3 indicators
2 Six leading lines of the poverty dataset
3 Description of indicators HIP and PL
4 The first 6 lines of the data: identity number (id), province, district, sub district, indicators values, and sub district’s ranking based on indicator
5 The first 6 lines of the result obtained by applying the ProdOrdr function to place poverty measurement rank of sub district
6 The ten most severe sub districts according to ORDIT ranking
7 The ten least severe sub districts according to ORDIT ranking
8 Poverty level of sub districts in the West, Central and East Java
9 Performance criteria comparison of ULS and SS for 5% significance
10 Performance criteria comparison of ULS and SS for 1% significance
11 General structure of data layout
12 General structure of nested data layout
13 Link function name, form, inverse of link function, and range of the predicted mean
14 The first and second derivatives of link function
15 Provinces, districts and number of sub districts
16 Data description

LIST OF FIGURES

1 Un-nested hotspot (dark color areas are the hotspots)
2 Nested hotspots in three provinces
3 Research Diagram
4 The systematic of the research activity
5 Research diagram: relation among chapters
6 X-shaped Hasse diagram of five entities labeled as A, B, C, D and E
7 Subordination schematic with plotted instance dividing a right triangle into two parts, a ‘trapezoidal triplet’ (of AA, SS and II) below, and a ‘topping triangle’ (of CCC, SS and II) above
8 The Map of Java Island with districts identity
9 Scatterplot of the indicators: HIP vs PL
10 Boxplot of the indicators, HIP and PL
11 Precedence plot (based on place ranks) of subdistricts from R commands
12 A study area with zone and non zone areas
13 A part of circle based hotspot detection process
14 A map and its adjacent matrix
15 ULS hotspot detection process (dark color is the hotspot)
16 ULS Hotspot of bad nutrition in Kuningan, Karawang, Majalengka, Temanggung, Boyolali, and Cilacap
17 ULS Hotspot of bad nutrition in Blitar, Ngawi, and Jember
18 The scheme of modeling
19 An ordered response and its latent variable
20 Changes in the value of x that cause changes in the magnitude of probability; a1, a2, a3 is the threshold
21 The effect of a covariate on the transformed cumulative probabilities (pdf of Y for some values of x)
22 Developing of Zhang’s and Lin’s GLMM
23 Study Area with 3 provinces {s = 1, 2, 3}, 3 districts are randomly chosen from West and East Java {i = 1, 2, 3}, and 2 districts are randomly chosen from Central Java {i = 1, 2}. There are nsi sub districts in district i of province s


LIST OF APPENDIXES

1 Fourteen criteria of poverty
2 Concept of health insurance for the poor (HIP) or askeskin and certificate of cannot afford (PL) or surkin
3 Sum of Poisson Random Variables
4 Multinomial distribution
5 Maximum Likelihood Estimation
6 Conditional simulation with hotspot z assumed known
7 Output of true hotspot1 simulation Central Java
8 Output of true hotspot2 simulation Central Java
9 Output of true hotspot1 simulation Java Island
10 Output of true hotspot2 simulation Java Island
11 Output of true hotspot1 simulation Map X
12 Output of true hotspot2 simulation Map X
13 Output of true hotspot1 simulation Map Y
14 Output of true hotspot2 simulation Map Y
15 Fourteen criteria of hotspot method for p-value = 0.05
16 Fourteen criteria of hotspot method for p-value = 0.01
17 ULS Hotspot of bad-nutrition in Kuningan and Karawang
18 ULS Hotspot of bad nutrition in Majalengka and Temanggung
19 ULS Hotspot of bad nutrition in Boyolali and Cilacap
20 ULS Hotspot of bad nutrition in Blitar and Ngawi
21 ULS Hotspot of bad nutrition in Jember
22 Parameter Estimates and Standard Errors of Nested GLM
23 Significance of Nested GLM parameter estimates
24 Classification result of Nested GLM
25 Parameter Estimates and Standard Errors of Nested GLMM
26 Significance of Nested GLMM parameter estimates
27 Matrix Equation for Nested GLMM (an example)
28 Theorem of Pearson residual Moran’s IPR and IaPR

LIST OF ABBREVIATIONS

CSHD: Circle based hotspot detection
GEE: Generalized estimating equation
GLM: Generalized linear model
GLMM: Generalized linear mixed model
ORDIT: Ordering dually in triangles
SS: Scan statistic
ULS: Upper level set
WCM: Working correlation matrix

GLOSSARY

Askeskin: Health insurance for the poor (asuransi kesehatan untuk orang miskin).

Cluster: A grouping containing ‘lower level’ elements. For example, in a survey sample, a district (cluster) consisting of sub districts.

Explanatory variable: An independent variable; in the fixed part of the model usually denoted by X and in the random part by Z.

Fixed part: The part of a model represented by Xβ, that is, the average relationship. The parameters β are referred to as ‘fixed parameters’.

Hotspot: Unusual phenomenon, anomalies, aberrations, outbreaks, elevated clusters, or critical areas.

Kronecker product: An operation on two matrices of arbitrary size resulting in a block matrix: A ⊗ B = [a_ij B], whose (i, j) block is a_ij B.

Level: A component of hierarchical data.

Nested: The clustering of units into a hierarchy (level).

Random part: The part of a model represented by u or Zu.

Regional: Area (daerah).

Response part: The part of a model represented by Y. Also known as a ‘dependent’ variable.

Spatial: Happening or existing in space.

Study area: An area of examination.

Surkin (surat miskin): Certificate of the Poor and Disadvantaged; poverty letters; SKTM (surat keterangan tidak mampu).


Chapter 1

INTRODUCTION

1.1 Background

Nowadays, the issues of poverty are often discussed. Although statistics show that the number of poor people in Indonesia decreased from 30.02 million (12.49%) in March 2011 to 29.89 million (12.36%) in September 2011, Indonesia is still facing the problem of poverty (BPS 2011).

Poverty alleviation programs require many policy decisions, especially when the government must decide which areas should receive priority treatment. Ranking and hotspot detection techniques can support these decisions. Modeling is also important for identifying which factors are related to poverty.

Ranking, hotspot detection, and modeling are three important statistical methods used to evaluate and examine data in everyday life and in many fields of study. The data could be the number of disease cases, people in poverty, particular animals or plants related to biodiversity or environment and ecology, and many others. As part of efforts to alleviate poverty, SMERU also ranks areas in several regions of Indonesia by poverty level. Ranking as a representation of poverty can support objective decision making and increase the transparency of government decisions. Moreover, a well-defined poverty level can lend credibility to government decision making (Widyanti 2003). In addition, cases of bad nutrition currently occur in nearly all parts of Indonesia. About 4 million children in Indonesia are exposed to the risk of bad nutrition (Yurnaldi 2008). For this problem, hotspot areas need to be identified to support objective decisions in poverty reduction programs. Furthermore, poverty data need to be modeled in a way that takes into account the differences between regions and the resource constraints.


Statistical models that correspond to these conditions should be able to handle the nested and random conditions.

Based on those thoughts and facts, this dissertation studies ranking and hotspot detection and incorporates the results of these two methods in the development of the Nested Generalized Linear Mixed Model.

Currently, ranking, hotspot detection, and modeling techniques are being

developed by experts. The ranking method that is based on several indicators using

ecological and environmental data was developed by Myers and Patil (2010).

Hotspot detection method was developed by Kulldorf (1997), Patil and Taillie

(2004), and Duczmal, Tavares, Patil, and Cancado (2010), whereas modeling with

fixed factors and hotspot as covariates was developed by Zhang and Lin (2008).

This dissertation combines these three approaches, with a focus on developing

Zhang and Lin’s model and applying it to poverty data.

The ranking method applied in this study is ORDIT (ORdering Dually In

Triangles), which is used to rank individuals (unit observations) based on several indicators

(Myers and Patil 2010). Furthermore, a comparison of the two hotspot detection

methods, namely Circle-based scan statistics by Kulldorff (1997) and upper level

set scan statistics by Patil and Taillie (2004) has been studied. The development

and implementation of the model are based on the nested GLM (Generalized Linear

Model) and the nested GLMM (Generalized Linear Mixed Model), using the GEE

(Generalized Estimating Equation) method and pseudo-likelihood, respectively, for

parameter estimation.

GLMM is a statistical model accommodating fixed effects and random

effects, while GLM only uses fixed effects. The distribution of the response

variable is not restricted to the normal distribution but may be any distribution within the

exponential family. Some GLMM principles used in the formation of spatial

models are mentioned by Lawson and Clark (2002) in their discussion of the

possibility of a discontinuous risk surface, and Loh and Zhu (2007) calculated the

spatial correlation of the scan statistic with a spatial GLMM in an effort to

obtain more accurate analysis results. Furthermore, some researchers have begun

to explore geographic and ecologic potentials to be used as explanatory variables

two breast cancer cases, Roche et al. (2002) compared these two geographic areas, cluster and non-cluster, and found that the two tend to be isolated due to a

language factor. This research suggests that identified risk factors may contribute

to the observed patterns, but since the cluster detection separates between control

and non-control factors, it is impossible to use it as a statistical conclusion.

Furthermore, the study by Zhang and Lin (2008) aims to improve the predictability of the

model through the incorporation of explanatory variables and the process of spatial

cluster detection in a frequentist approach. In other words, Zhang and Lin combine

hotspots and modeling, where explanatory variables and hotspots are observed

simultaneously.

Zhang and Lin (2008) apply a spatial GLMM with cluster (hotspot area) of

Kulldorff (1997) as explanatory variables. Through this modeling Zhang and Lin

have detected the significance of hotspots that appear in the spatial data. The

hotspots detected in this model are common hotspots with a single level (not

nested), as presented in Figure 1. As geographical and ecological factors also

contribute to identifying the hotspot, the model also pays attention to the

geographic and ecological components. This research aims to analyze further

what happens if the model is applied to data with a spatially nested form. In other words,

we develop the model introduced by Zhang and Lin (2008) with

respect to nested hotspots as shown in Figure 2. In the nested spatial GLMM,

hotspots can be assumed to be fixed effects, and the response variable is

measured on an ordinal scale. It is also important to note that the

estimation takes into account conditions in which a variable acts both

as an independent variable and as a random effect.

One thing that is often ignored in statistical modeling is the variance of the

data. The variance of the data can be viewed as global or local variance. A global

variance is calculated and observed based on overall data variability while local

variance is calculated and observed based on a group of data. A study using

mixed-effects regression modeling with heterogeneous variances for analyzing

Ecological Momentary Assessment (EMA) data was conducted by Hedeker and

Figure 1 Un-nested hotspot (colored areas are the hotspots)

It is always possible to find data with different variances in different

conditions or groups of data. For this kind of data, the existence of local

variance must be addressed in the estimation of the model parameters. An appropriate

method of parameter estimation is GEE, the generalized estimating equation

(Hardin and Hilbe 2003).
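
The difference between global and local variance can be illustrated with a minimal Python sketch. The district names and score values below are hypothetical and serve only to show the two calculations; they are not data from this study.

```python
from statistics import pvariance

# Hypothetical scores for sub districts, grouped by district (illustration only).
scores = {
    "district_1": [2.0, 2.1, 1.9, 2.0],
    "district_2": [5.0, 5.2, 4.8, 5.0],
    "district_3": [8.0, 9.5, 6.5, 8.0],
}

# Global variance: pool every observation and ignore the grouping.
pooled = [y for group in scores.values() for y in group]
global_variance = pvariance(pooled)

# Local variances: one variance per group of observations.
local_variances = {name: pvariance(group) for name, group in scores.items()}

print("global:", round(global_variance, 4))
for name, value in local_variances.items():
    print(name, round(value, 4))
```

In this made-up example the three groups have clearly different local variances, which is the situation in which a single global variance would hide the structure of the data.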

The GEE parameter estimation method uses a working correlation matrix to

estimate model parameters. The elements of this matrix are correlation values

between observations in a cluster. If the subject or district i has n_i subdistricts, the

dimension of the working correlation matrix is n_i × n_i, and correlations are

computed from pairs of the n_i observations for all i. Higher correlations

correspond to a smaller variance within a cluster (subject). Therefore, the GEE

method is used to estimate the model parameters for clustered data.
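
As a minimal sketch of what an n_i × n_i working correlation matrix looks like, the Python functions below build three structures that are commonly used with GEE: independence, exchangeable, and first-order autoregressive. The choice of these three structures, the function names, and the parameter name alpha are assumptions made for illustration; this section does not state which structures are compared in the dissertation.

```python
def independence(n):
    """Working correlation with no within-cluster correlation (identity matrix)."""
    return [[1.0 if j == k else 0.0 for k in range(n)] for j in range(n)]


def exchangeable(n, alpha):
    """Every pair of observations in the cluster shares one correlation alpha."""
    return [[1.0 if j == k else alpha for k in range(n)] for j in range(n)]


def ar1(n, alpha):
    """Correlation decays as alpha**|j - k| with the separation of the observations."""
    return [[alpha ** abs(j - k) for k in range(n)] for j in range(n)]


# A hypothetical district with n_i = 4 sub districts gives 4 x 4 matrices.
for row in exchangeable(4, 0.4):
    print(row)
for row in ar1(4, 0.5):
    print(row)
```

Each matrix is symmetric with ones on the diagonal; the structures differ only in how the off-diagonal correlations between sub districts of the same district are constrained.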

As mentioned before, issues relating to quality levels are often found in

everyday life. Ranking is important for problems related to priorities and

efficiency. This study also discusses the ranking method, applies it to

poverty data, and then categorizes the results into a few levels to be used as an

ordinal response variable in modeling. This ordinal response represents the level of

subject quality. Similarly, the outbreak (hotspot) in a particular area is also

important and interesting to study. In the model development, the

hotspot status of a sub district is also included as an explanatory variable in

the model to assess its contribution to the poverty level. This part follows Zhang

and Lin’s model development.

Zhang and Lin (2008) used a spatial GLMM with cluster (hotspot area) on

data based on a vast (continental) land area, without a nested area assumption.

This research, however, analyzes spatial data in nested spatial form by

taking into account the local variances. The need for nesting can arise from

local variance in the data generated by factors such as environmental

conditions, or by other factors such as the history and cultural characteristics of individuals in

the area, as described in the following four paragraphs, which differentiate among West,

Central, and East Java.

According to research of “Civilization Java” by Rahardjo (2011),

“East Java Survey of Poor Families” by Garner and Amaliah (1999), there are dissimilar characteristics among West, Central, and East Java.

The geographical characteristics make Central Java more closed than East

Java. Almost all the main mountains in Central Java are located in the center of

the province, and the coastline is like a thick wall that limited access to the outside

world in ancient times. In contrast, the center of civilization in East Java is

much more open. Although there are several mountains in East Java, they do not

form an impenetrable barrier wall from or to the coast. Two of the largest and

longest rivers in Java (Brantas and Solo) can be navigated through to the interior.

Industry and trading activities have occurred much earlier in Central Java and East

Java. These conditions have formed the culture and characteristics of two peoples

who are rather different (Rahardjo 2006).

On the other hand, according to the report of household research conducted

by Hondai (2006), consumption expenditure varies considerably (significantly)

from one province to another. Central Java consists mainly of rural areas except for a few

medium-sized urban areas, and is rather homogeneous compared

to West Java. In that analysis, the author investigates changes in

inequality of West Java as a representative of a rapidly growing industrial region

and Central Java as a representative of a rather homogeneous rural region of the

country.

Furthermore, a recent Survey of Poor Families reported that very high rates

of malnutrition were found in East Java, which were considerably higher than

national rates and those of other provinces in Java (Gardner 1999).

The purpose of this study is to develop the GLMM of Zhang and Lin for

data with an ordinal response variable with respect to the local variance (nested

conditions). Following Hardin and Hilbe, the term

nested is synonymous with panel. Hardin and Hilbe (2003) stated, in their book

Generalized Estimating Equations, that there is a correlation between

observations in panel data. If the ordinary likelihood is used for parameter

estimation regardless of this condition of the data, the estimation is not correct and can result

in a misleading interpretation, because the variance matrix is assumed to have

consider the spatial correlation between areas in the nested condition. Many

components affect the variance; for example: natural resources, climate,

environment, language, culture, customs, demographics, life style, and others.

Related to the spatial modeling, Figure 3 represents the studies of spatial

GLM and GLMM developed by researchers, and the position of the current study,

which is nested GLMM using spatial data in this dissertation. Figure 3 provides an

explanation about the Spatial Model. In general, the spatial model is divided into

two parts, that is, the spatial GLM and spatial GLMM.

Spatial GLM has been developed by Schabenberger and Gotway (2005) as

Fixed Effects and the Marginal Specification. Parameter estimation in these models can be handled broadly based on a specification of the first two moments of the

outcome, using the Generalized Estimating Equation or Quasi-likelihood from

Wedderburn (1974). Schabenberger and Gotway (2005) also developed spatial

GLMM that they named the Mixed Models and the Conditional Specification,

which is a GLM with the conditional approach incorporating the unobserved

spatial process as random effects within the mean function. The conditional mean

and variance of outcome are modeled as a function of both fixed covariate effects

and random effects deriving from the unobserved spatial process. In spatial

GLMM, Schabenberger and Gotway used Penalized Quasi likelihood estimation

from Breslow and Clayton (1993) and Pseudo-likelihood estimation from Wolfinger and O’Connell (1993).
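
The contrast between the marginal and the conditional specification can be summarized in generic notation. The symbols used here (link function g, covariate vector x_i, coefficient vector β, variance function v, dispersion φ, and unobserved spatial process S evaluated at location s_i) are assumptions chosen for illustration and are not notation taken from this dissertation.

```latex
% Marginal specification (spatial GLM): only the first two moments are specified.
g\bigl(\mathrm{E}[Y_i]\bigr) = \mathbf{x}_i^{\top}\boldsymbol{\beta},
\qquad
\mathrm{Var}[Y_i] = \phi\, v\bigl(\mathrm{E}[Y_i]\bigr)

% Conditional specification (spatial GLMM): the unobserved spatial process
% S enters the mean function as a random effect.
g\bigl(\mathrm{E}[Y_i \mid S]\bigr) = \mathbf{x}_i^{\top}\boldsymbol{\beta} + S(s_i),
\qquad
\mathrm{Var}[Y_i \mid S] = \phi\, v\bigl(\mathrm{E}[Y_i \mid S]\bigr)
```

In the marginal form the spatial dependence has to be carried by the covariance (or working correlation) of the outcomes, whereas in the conditional form it is induced by the shared random effect.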

Hao Zhang (2002) developed spatial GLMM using MMSE prediction and the

Metropolis-Hastings algorithm, while GLMM for point referenced spatial data was

developed by Gamperli and Vounatsou (2004) using Laplace Approximation.

Furthermore, approximate Bayesian inference in spatial GLMM was developed by

Jo Eidsvik, Sara Martino and Håvard Rue (2007). Other spatial GLMM

developments are approximate Bayesian inference with skew normal latent variables

by Hosseini (2011) and estimation of spatial patterns using GLMM by Kwak

et al. (2012). As mentioned before, the spatial GLMM with clusters (hotspot areas) of Zhang and Lin (2008) is one of the spatial GLMMs in this diagram, and is

shown by the light blue rectangle in Figure 3. Nested GLM and

GLMM with spatial data will be developed; the GEE method is used for

Nested GLM parameter estimation and pseudo-likelihood is used for Nested

GLMM parameter estimation. Parameter estimates based on

several working correlation matrices will be compared, and the model is

applied to assess the effect of some covariates on poverty level.

Figure 3 Research Diagram

Other models in the diagram are Multilevel Models for ordinal data by

Kenett (2011) and Multilevel models with ordinal outcomes and their application to

psychology data, using Maximum Likelihood and Penalized Quasi Likelihood as

the parameter estimation methods (Bauer and Sterba 2011). Ordinal response was

used in all these models.

Based on the explanation above and literature reviews related to spatial GLM

and GLMM, it appears that the majority of research over the last few years

concerns parameter estimation techniques, while the research in this dissertation

concerns correlation within clusters. The focus and novelty of this research is the

modification of Zhang and Lin’s model for an ordinal response variable, taking into account the clustered nested data. The attention to the correlation matrix is meant to deal with the clustered condition of the data.

Research in this dissertation focuses on developing GLM and GLMM for an

ordinal response in nested spatial data, with a study of some working correlation

matrices, shown in blue rectangles in Figure 3. Some interesting studies related to

this research are developed and discussed in Chapters 2 and 3 of this dissertation.

As mentioned before, poverty has often been discussed and is still a major

problem in Indonesia. In an effort to contribute ideas and thoughts, this

dissertation takes poverty as the main application problem, with ranking, hotspot

detection, and modeling as the methods used for analysis.

Based on the statements, ideas, thoughts and explanations above, the

questions in this study are:

1. How to determine the level of severity or poverty of sub districts in Java

and which areas are the most and the least severe in poverty.

2. How to know the best hotspot method and which areas are a hotspot of a

3. How to build a model for clustered nested data with multinomial ordinal

response, and how to estimate the model parameters.

4. How does the working correlation matrix structure influence the

estimation of model parameters for nested correlated data?

5. How do the model parameter estimates differ between Nested

GLM and Nested GLMM?

1.2 The Purpose of Research

Based on several initiatives and issues described in Section 1.1, the purposes

of this dissertation are:

1. To determine the level of severity (poverty) of sub districts in Java.

2. To obtain the best hotspot detection method between the two methods

studied, and to apply this best method to obtain a factor for modeling.

3. To build a model for nested data with multinomial ordinal response, and to

estimate the model parameters. Furthermore, to obtain the parameter estimates

of the explanatory variables for every province used in modeling.

4. To study the influence of the working correlation matrix structure on the

estimation of model parameters for data with clustered nested condition.

5. To study the differences of the model parameter estimation between Nested

GLM and Nested GLMM.

The purposes of this study have been achieved and are described in Chapters 2, 3,

and 4. The first purpose is achieved in Chapter 2, as the study of the ORDIT ranking

method and its implementation on the poverty data. The second purpose is

achieved in Chapter 3 as the study and comparison of two hotspot detection

methods. The best method has been used to detect bad nutrition hotspot area.

Finally, the third, fourth and fifth purposes are achieved in Chapter 4, modeling

and its implementation on poverty data using the result of Chapter 2 as the

dependent variable, while the result of chapter 3 is used as an independent variable

1.3 The Research Framework

The research framework is described in Figure 4. The left side of the

diagram in Figure 4 is the ORDIT ranking method, which is studied and implemented

on poverty data in Java. This method is designed to rank individuals

based on several indicators, but due to the limitation of data, this study uses only 2

indicators of poverty, i.e. health insurance for the poor (hip) or askeskin and

statement of inability to pay or poverty letter (pl) or surkin. The result of ranking

is grouped into three levels and used as ordinal response in the modeling.
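
The grouping step can be sketched in Python as follows. The rule used here, three groups of near-equal size, is an assumption made for illustration, since this section does not state how the cut points are chosen. The level labels, the function name, and the convention that rank 1 is the most severe are also assumptions.

```python
def group_into_levels(ranks, labels=("worst", "moderate", "mild")):
    """Split ranked units into len(labels) groups of near-equal size.

    ranks maps a unit name to its rank; rank 1 is taken as the most severe.
    Equal-size groups are an assumption of this sketch, not the source's rule.
    """
    ordered = sorted(ranks, key=ranks.get)
    n, k = len(ordered), len(labels)
    return {unit: labels[min(pos * k // n, k - 1)]
            for pos, unit in enumerate(ordered)}


# Hypothetical sub districts with their ranks.
example_ranks = {"sd_a": 1, "sd_b": 2, "sd_c": 3, "sd_d": 4, "sd_e": 5, "sd_f": 6}
levels = group_into_levels(example_ranks)
print(levels)
```

The resulting labels form an ordered categorical variable, which is the kind of response an ordinal model expects.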

The right side of the diagram is a study and comparison between two hotspot

methods, i.e. Circle based Scan Statistics (SS) hotspot detection and Upper Level

Set Scan Statistics (ULS) hotspot detection. As a result, the best method is ULS,

which is used to detect hotspots of bad nutrition in some districts. The result of this

detection is used as an explanatory variable in the modeling.

The middle of the diagram shows the development of GLM and GLMM modeling.

The spatial GLMM of Zhang and Lin (2008), spatial correlation in modeling from

Cressie (1993), the problem related to global variance (variance among

subjects) and local variances (variance among sub subjects) in spatial data

(Goldstein 1995), and Managing Clustered Data Using Hierarchical Linear

Modeling (Warne et al. 2012) form the supporting theory for determining the working correlation

matrices (WCM) for nested GLM and nested GLMM.

The work on modeling is as follows. The determination of a working

correlation matrix is the beginning step to estimate the model parameters. In this

study, two types of working correlation matrices are discussed and the most

appropriate type for the data in this research is determined. The model parameters

of GLM are estimated by the GEE method (Hardin and Hilbe 2003), while the

model parameters of GLMM are estimated by the pseudo-likelihood approach (Wolfinger and O’Connell 1993). Furthermore, the multinomial ordinal response variable adds complexity to the model building and parameter estimation. The applications

presented in this dissertation are analyses of poverty data. The source of the data is

Figure 4 The systematic of research activity

1.4 The Outline of Dissertation

This dissertation is structured into six chapters as described in Figure

5. Chapter 1 is an introduction that contains the background, the purpose of

research, the research framework, and the dissertation outline.

Chapter 2 discusses a method of ranking, namely the Ordering Dually in

Triangles (ORDIT) and its implementation on severity or poverty data from BPS.

The method is applied to data concerning the level of poverty in several sub-districts

in Java, based on the number of statements of inability to pay and the number of

health insurance for the poor. Furthermore, the result of poverty ranking is grouped

into 3 levels and used as ordinal response in modeling in Chapter 4.

Chapter 3 is a comparison between two hotspot detection methods, namely

the Circle Based Hotspot Detection from Kulldorff (1997) and the Upper Level Set

Scan Statistics from Patil and Taillie (2004) using 14 criteria. The best method

is used to detect bad nutrition cases in some districts and the result is used as an

explanatory variable in the modeling.

Chapter 4 discusses the nested GLM and nested GLMM and their

implementations on the poverty data from BPS. Chapter 5 is a general discussion

of modeling in Chapter 4. Chapter 6 contains conclusions and recommendations

as well as a summary of the research in this dissertation.

1.5 Novelty

As described in Section 1.1, the model of Zhang and Lin (2008) was based

on an un-nested spatial area. It is important to develop a model for nested spatial areas,

where the form of the correlation or variance-covariance matrix is based on the

condition of the data considered. As an archipelago, Indonesia is an example of the

nested spatial condition where lines, distance, and water act as borders that set off

one area from another. When hotspots emerge in such areas, we need to know

how significant they are in general. This question has been successfully

answered by the development of a nested-spatial model.

The novelty of this dissertation is a model that involves hotspots of interest as an explanatory variable and a spatial factor as the random

effect in a nested area. This model is called the Nested Generalized Linear Mixed

Model (NGLMM), a development of Zhang and Lin’s model (2008). However, the spatial factor in this model is not given the same treatment as spatial

factors in other spatial modeling. The spatially unstructured random effect is assumed

to be independent, identically, and normally distributed. The GEE approach is used for

inference and is adapted to the nested spatial condition for GLM with a

multinomial ordinal response variable, which adds complexity to the parameter

Chapter 2

SUB DISTRICTS POVERTY LEVEL DETERMINATION

USING ORDERING DUALLY IN TRIANGLE (ORDIT)

RANKING METHOD

2.1 Introduction

Poverty is one of the major problems faced by Indonesia and other developing

countries from year to year (BPS 2011). As mentioned in Chapter 1, although the

number of the poor in Indonesia has decreased, Indonesia still faces the problem of

poverty. This problem is becoming increasingly difficult to resolve. Poverty is

both a cause and a consequence of poor health (Alisjahbana 2011). Poverty

increases the chances of poor health. Poor health in turn traps communities in

poverty. Infectious and neglected tropical diseases kill and weaken millions of the

poorest and most vulnerable people each year (Schirnding and Mulholland 2001).

There are several criteria that can be used as a basis for determining the

poverty threshold. It depends on the policy of a government to determine the

indicator. The most commonly used definition of global poverty is the absolute

poverty line set by the World Bank. Poverty is set at an income of $2 a day or less,

and extreme poverty is set at $1 a day or less. This line was first created in 1990

when the World Bank published its World Development Report and found that

most developing countries set their poverty lines at $1 a day. The $2 mark was

created for developing nations with slightly better income levels than the $1 a

day nations. In Indonesia, the people in poverty are the population below the

poverty line. The poverty line is used as a boundary to determine whether or not a

person is poor. The poor are the people who have an average per capita

expenditure per month below the poverty line of Rp 211.726 (about U.S. $ 20)

(BPS 2011).

The poverty line is a criterion for classifying a person as poor or not poor. However,

because of different standards of living in different regions, it is not easy to

determine the number of people in poverty using this criterion. Poverty criteria used in

(PL or surkin or SKTM). As indicators of poverty, HIP and PL are counted

in aggregate for every sub district. The concepts of askeskin and surkin are

explained in Section 2.2.4.

In connection with the poverty problem, poverty alleviation programs are a concern of the

Government as a way to reduce poverty. Policies should be designed to yield

the right decisions for fund allocation. The poverty alleviation funds should be

received by the right sub districts or districts. For this reason, the Government should

know the priority areas, which can be obtained from an accurate ranking. ORdering

Dually In Triangle (ORDIT), a ranking method developed by Myers and Patil, is

used to rank sub districts based on poverty using two indicators.

ORDIT was developed with the purpose to provide convenient

computational capability and visualizations for preliminary partial or progressive

prioritization based largely on concepts of partial order theory and implemented in

R software. It was illustrated in a context of conservation and sustainable

stewardship across landscapes with ecosystem services as a complex

multidimensional domain that must be placed in public and private perspective in

pursuit of multi-resource management (Myers and Patil 2010). In other words,

ORDIT was developed to provide ranking with convenient computation and

visualization. The method is also powerful for ranking with

multiple indicators, as shown in the illustration by Myers and Patil. Due to the

limitation of data resources, this study uses only

2 indicators, i.e. the number of health insurances for the poor and the number of

poverty letters.

The objective of this chapter is to rank sub districts in Java Island based on

poverty or severity level using ORDIT. According to the ranking result, the 7 most

severe and 7 least severe sub districts will be revealed. The result of ranking will

be grouped into 3 levels, i.e. worst, moderate, and mild to simplify interpretation

and to distinguish two sub districts in terms of poverty level. Furthermore, this

2.2 Theoretical Background

The ORDIT ranking method uses multiple indicators in the ranking process.

Its theoretical concepts are discussed in the following sub sections based on Myers

and Patil (2010).

2.2.1 Rating Relations/Rules for Ascribing Advantage

This section describes the protocols for comparing cases or collectives of

cases via ratings, rules and relations that ascribe advantage to some cases over

some others or fail to do so for particular pairs. There are 3 possibilities in

comparing a pair of cases, where one case is denoted by Г and the other by Є:

“Г aa Є”, wherein Г has ascribed advantage over Є;

“Г ss Є”, wherein Г has subordinate status to Є, which implies “Є aa Г”;

“Г ii Є”, whereby these are indefinite instances without ascribed advantage and without subordinate status, which implies “Є ii Г”.

Thus the protocol either designates one member of a pair as having ascribed

advantage and the other subordinate status, or that pairing as being indefinite.

ff(aa) is the frequency (number of occurrences) of ascribed advantage, ff(ss)

is the frequency of subordinate status, and ff(ii) is the

frequency of being indefinite. Each of the N cases can

be compared on this basis to all others in the deleted domain DD = N – 1 of

competing cases, with the percent occurrence of these relations being tabulated as

follows:

AA = 100 × ff(aa)/DD, SS = 100 × ff(ss)/DD, II = 100 × ff(ii)/DD.

Clearly, AA + SS + II = 100%; for later use, let CCC = 100 – AA be the

complement of case condition relative to ascribed advantage (AA) (Myers and Patil 2010).

Figure 6 is a Hasse diagram of the entities in Table 1. This simple

X-shaped Hasse diagram illustrates a case wherein entity A has ascribed advantage

over B, D, and E while being indefinite with regard to C. Entity B has subordinate

status to A and C with ascribed advantage over D and E. The deleted domain

Table 1 Entities with 3 indicators

Entity   Ranking
         Indicator1   Indicator2   Indicator3
A        1.5          2            3
B        3            5            4
C        1.5          3            2
D        8            6            7
E        7            8            6

Figure 6 X-shaped Hasse diagram of five entities labeled as A, B, C, D and E

2.2.2 Subordination Schematic and ORDIT

Subordination can be symbolized diagrammatically (Myers and Patil 2010)

in the triangle depicted in Figure 7, where the point with coordinates (SS, AA)

representing a sub district divides the triangle into two parts, a ‘trapezoidal

triplet’ (of AA, SS, and II) and a ‘topping triangle’ (of CCC, SS, and II). The

combination of these two parts forms a right triangle with the ‘tip’ at AA = 100%

in the upper-left and the ‘toe’ at SS = 100% in the lower-right. The hypotenuse is a

right-hand ‘limiting line’ of plotting positions because AA + SS + II = 100%. The topping

triangle provides the basis for an ‘ORDIT ordering’ of the districts or instances.

According to Figure 7, an idealized district has AA = 100% of the deleted

domain (DD) of other districts, that is, the frequency of ascribed advantage is

equal to the number of competing districts, so if the ideal actually occurs, then the

Figure 7 Subordination schematic with plotted instance dividing a right triangle into two parts, a ‘trapezoidal triplet’ (of AA, SS and II) below, and a ‘topping triangle’ (of CCC, SS and II) above.

The numbers for ORDIT can be coupled as a decimal value ccc.bbb. The ccc

component is obtained by rounding CCC to two decimal places and then

multiplying by 100. The bbb component is obtained by dividing SS by CCC and

imposing 0.999 as an upper limit. These two values are then added to give ccc.bbb. This

ordering is assigned the acronym ORDIT and preserves all aspects of AA, SS and

II except for the actual number of districts. Simple rank ordering of ORDIT values

becomes the salient scaling of the districts (Myers and Patil 2010).
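
The coupling of CCC and SS into a single ORDIT value can be sketched in Python. This is an illustrative reading of the description above, not the R code of Myers and Patil. The function names, the handling of the case CCC = 0, the convention that rank 1 goes to the smallest ORDIT value, and the tie handling are all assumptions.

```python
def ordit_value(aa, ss):
    """Couple AA and SS (both in percent) into an ORDIT value ccc.bbb."""
    ccc_raw = 100.0 - aa                       # CCC = 100 - AA
    ccc = round(round(ccc_raw, 2) * 100)       # CCC to two decimals, times 100
    # Assumption: when CCC is 0 (AA = 100%), SS must be 0 as well, so bbb = 0.
    bbb = min(ss / ccc_raw, 0.999) if ccc_raw > 0 else 0.0
    return ccc + bbb


def salient_scaling(ordits):
    """Rank units by ORDIT value, giving rank 1 to the smallest value.

    Ties are broken by input order here, which is a simplification.
    """
    ordered = sorted(ordits, key=ordits.get)
    return {unit: rank for rank, unit in enumerate(ordered, start=1)}


# (AA, SS) pairs for three hypothetical units.
ordits = {
    "u1": ordit_value(75.0, 0.0),
    "u2": ordit_value(50.0, 50.0),
    "u3": ordit_value(0.0, 75.0),
}
print(ordits)
print(salient_scaling(ordits))
```

A unit with a larger AA gets a smaller ccc part and therefore sorts first; among units with the same CCC, the bbb part orders them by the share of CCC that is subordinate status.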

2.2.3 Product-order Rating Regime

A general relational rule for ascribing advantage is product-order whereby

advantage is gained by having all criteria at least as good and at least one better.

Conversely, subordinate status lies with having all criteria at least as poor and at

least one poorer. This relational rule is applicable to all kinds of criteria as long as

they have the same polarity (sense of better and worse). Scheme 2 (Myers and

Patil 2010) in Function Facilities gives an R function called ProdOrdr that

determines ORDITs and salient scaling according to product-order. This function

takes as its inputs a vector of IDs for instances and a data frame of same-sense criteria.

All indicators must be either positive sense or negative sense. Indicators in this research

more severe the district. The output is a data frame of ORDITs and salient scaling

values.
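
The product-order rule of this section can be sketched in Python and applied to the five entities of Table 1. This is not the ProdOrdr function of Myers and Patil; the function name is hypothetical, and the sketch assumes that a smaller rank number is the better place and that a complete tie between two cases counts as indefinite.

```python
def product_order_percentages(ranks):
    """Return {case: (AA, SS, II)} in percent under the product-order rule.

    ranks maps a case ID to a tuple of same-sense indicator ranks.
    Assumption: a smaller rank number is the better place.
    """
    dd = len(ranks) - 1                        # deleted domain DD = N - 1
    if dd < 1:
        raise ValueError("at least two cases are needed")
    result = {}
    for case, own in ranks.items():
        aa = ss = ii = 0
        for other, theirs in ranks.items():
            if other == case:
                continue
            pairs = list(zip(own, theirs))
            no_worse = all(a <= b for a, b in pairs)
            no_better = all(a >= b for a, b in pairs)
            if no_worse and not no_better:
                aa += 1                        # as good everywhere, better somewhere
            elif no_better and not no_worse:
                ss += 1                        # as poor everywhere, poorer somewhere
            else:
                ii += 1                        # mixed comparison or complete tie
        result[case] = tuple(100.0 * f / dd for f in (aa, ss, ii))
    return result


# Indicator ranks of the five entities in Table 1.
table_1 = {
    "A": (1.5, 2, 3),
    "B": (3, 5, 4),
    "C": (1.5, 3, 2),
    "D": (8, 6, 7),
    "E": (7, 8, 6),
}
percentages = product_order_percentages(table_1)
for case, (aa, ss, ii) in percentages.items():
    print(case, aa, ss, ii)
```

Under these assumptions A and C each have ascribed advantage over three of the four competing entities and are indefinite with respect to each other, B sits between the two pairs, and D and E are indefinite with respect to each other, which matches the X shape of the Hasse diagram in Figure 6.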

According to Figure 7 and its computation, ORDIT ordering is the ranking of

the instances based on their indicators. ORDITs and salient scaling according to

product-order are determined by the R function of Scheme 2 (Myers and Patil 2010) in Function

Facilities. The ORDIT corresponds to the topping triangle in Figure 7, and the salient scaling is the

ranking of the ORDIT values.

Precedence Plots

Based on the computation of AA and SS for a particular district, the structure in

the lower part of the subordination schematic can be used to prepare a ‘precedence plot’ for visualization. The precedence plot shows AA (ascribed advantage) on the Y-axis and SS (subordinate status) on the X-axis. Prominent position declines

from tip (upper-left) to toe (lower-right). Primary prominence varies vertically,

showing that there is a larger percentage of ascribed advantage (greater severity)

with increasing height. Horizontal variation on a given level shows clarity of

comparison. Farther to the right is greater clarity as more definite advantage (less

severe) with a larger percentage of subordinate status versus indefinite instances

among the couplets where ascribed advantage is lacking. In other words, more

indefinite instances constitute increased lack of clarity (incomparability in the

usual parlance of partial ordering). Scheme 3 (Myers and Patil 2010) in Function

Facilities gives an R function named PrecPlot which accepts the output of the

ProdOrdr function and produces a precedence plot.

Representative Ranks

Representative ranks show descriptive statistics of the indicator rankings of each

district. Their function is

to show how crucial a district is. The rank numbers received by a given district

across all criteria can be placed in a single array and sorted in ascending order.

component for the case. This computation is effective for several indicators.

Because there are only two indicators, this study does not use this calculation.

2.2.4 The concepts of Askeskin or HIP and Surkin or PL

Health is a right and an investment for all citizens. All citizens are entitled to

health, including the poor. This requires a system that regulates the implementation

of the effort to fulfill the right of citizens to remain healthy, with emphasis on

health care for the poor.

Basic right to receive health care is a policy launched by Government

included in handling th