Prediction Model for H1N1 Disease
A project submitted to the Dean of Research and Postgraduate Studies Office in full fulfillment of the requirement for the degree
Master of Science (Intelligent System) Universiti Utara Malaysia
BY
Amy Ling Mei Yin
KOLEJ SASTERA DAN SAINS
(College of Arts and Sciences) Universiti Utara Malaysia
PERAKUAN KERJA KERTAS PROJEK
(Certificate of Project Paper)
Saya, yang bertandatangan, memperakukan bahawa
(I, the undersigned, certify that)
calon untuk Ijazah
(candidate for the degree of) MSc. (Intelligent System)
telah mengemukakan kertas projek yang bertajuk
(has presented his/her project of the following title)
PREDICTION MODEL FOR H1N1 DISEASE
seperti yang tercatat di muka surat tajuk dan kulit kertas projek
(as it appears on the title page and front cover of the project)
bahawa kertas projek tersebut boleh diterima dari segi bentuk serta kandungan dan meliputi bidang ilmu dengan memuaskan.
(that this project is in acceptable form and content, and that a satisfactory knowledge of the field is covered by the project).
Nama Penyelia (Name of Supervisor): MISS ANIZA MOHAMED DIN
Tandatangan (Signature):
Tarikh (Date):
Nama Penilai (Name of Evaluator): WAN Z A W BIN WAN MUOA
Tandatangan (Signature):
Tarikh (Date):
PERMISSION TO USE
In presenting this project in partial fulfillment of the requirements for a postgraduate degree from Universiti Utara Malaysia, I agree that the University Library may make it freely available for inspection. I further agree that permission for copying of this project in any manner, in whole or in part, for scholarly purposes may be granted by my supervisor(s) or, in their absence, by the Dean of Postgraduate and Research. It is understood that any copying or publication or use of this project or parts thereof for financial gain shall not be allowed without my written permission. It is also understood that due recognition shall be given to me and to Universiti Utara Malaysia for any scholarly use which may be made of any material from my project.
Requests for permission to copy or to make other use of materials in this project, in whole or in part, should be addressed to:
Dean of Research and Postgraduate Studies College of Arts and Sciences
Universiti Utara Malaysia 06010 UUM Sintok Kedah Darul Aman
Malaysia
ABSTRAK
Kajian ini menggunakan data H1N1 dari Hong Kong yang dikumpulkan daripada pesakit di klinik (sektor swasta dan awam) di seluruh Hong Kong yang mengalami penyakit seakan-akan influenza.
Objektif kajian ini adalah untuk membina model ramalan bagi penyakit H1N1 dengan menggunakan Multilayer Perceptron. Eksperimen ini menggunakan perisian pembelajaran mesin WEKA sebagai perkakas untuk menghasilkan nilai parameter terbaik bagi data tersebut. General Methodology of Design Research (GMDR) dan Knowledge Discovery in Databases (KDD) telah digunakan sebagai garis panduan dalam kajian ini. Model ramalan untuk H1N1 menggunakan MLP telah dihasilkan dan MLP menunjukkan prestasi yang baik dengan nilai ketepatan untuk penyakit H1N1 sebanyak 88.57%.
Kata kunci:
H1N1, Multilayer Perceptron, Nilai ketepatan
ABSTRACT
This research uses H1N1 data collected from outpatient clinics (private and public sectors) across Hong Kong from patients with influenza-like illness. The objective of this project is to develop a prediction model for H1N1 disease using a Multilayer Perceptron (MLP). Experiments with the WEKA machine learning tool produced the best parameter values for the dataset. The General Methodology of Design Research (GMDR) and Knowledge Discovery in Databases (KDD) were used throughout the study as guidelines. A prediction model for H1N1 disease using MLP has been generated, and the MLP performed well, reaching an accuracy of 88.57% for H1N1 disease.
Keywords:
H1N1 disease, Multilayer Perceptron, Accuracy value
ACKNOWLEDGEMENTS
First of all, I would like to express my gratitude and appreciation to my supervisor, Miss Aniza, for her help, guidance and encouragement. Without her encouragement and guidance, it would not have been easy for me to reach this stage in completing my report.
Secondly, I would like to thank my beloved future husband, Hong Kok Pan, who is always ready to help and support me throughout my report. His full support remained the mainstay for me in overcoming all the difficulties in completing this project. Without his all-round support, it would have been impossible for me to finish my project on time.
TABLE OF CONTENTS
PERMISSION TO USE
ABSTRACT (BAHASA MALAYSIA)
ABSTRACT (ENGLISH)
ACKNOWLEDGEMENT
LIST OF TABLES
LIST OF FIGURES
CHAPTER ONE: INTRODUCTION
1.1 The Context of the Study
1.2 Statement of the Problem
1.3 Objectives of the Study
1.4 Significance of Study
1.5 Scope, Assumptions and Limitations of the Study
1.5.1 Scope
1.5.2 Assumptions of the Study
1.5.3 Limitations of the Study
1.6 Organization of the Report
CHAPTER TWO: LITERATURE REVIEW
2.1 Data Mining
2.2 Neural Network
2.3 Prediction in Medical
2.4 Influenza A (H1N1)
CHAPTER THREE: METHODOLOGY
3.1 Introduction to WEKA Software Machine Learning Tools
3.2 Methodology
3.2.1 Awareness of Problem
3.2.2 Requirement Gathering
3.2.3 Rule Extraction
3.2.4 Evaluation
CHAPTER FOUR: RESULT ANALYSIS
4.1 To determine the most suitable number of Hidden Units
4.2 To determine the most suitable Learning Rate
4.3 To determine the most suitable Momentum Rate
4.4 To determine the most suitable Number of Epoch
4.5 To determine the most suitable Percentage Split
4.6 The Network Architecture
4.7 Summary
CHAPTER FIVE: CONCLUSIONS
5.1 Recommendation and Future Work
REFERENCES
LIST OF TABLES
List Descriptions
Table 3.2.3 Percentage of Splitting data
Table 4 Starting parameters
Table 4.1 (a) Result to determine the best number of hidden unit
Table 4.1 (b) Result to determine the best Hidden Unit using various Weight Seed
Table 4.2 Result to determine the best Learning Rate
Table 4.3 (a) Result to determine the best Momentum Rate
Table 4.3 (b) Results of using Momentum 0.1, 0.2 and 0.3 using various Weight Seeds
Table 4.4 Result to determine the best Number of Epoch
Table 4.5 Result to determine the best Split Percentage
Table 4.7 Neural Network Model and the optimum parameters
LIST OF FIGURES
List Descriptions
Figure 2.2 (a) A Biological Neuron
Figure 2.2 (b) An Artificial Neuron
Figure 3.1 Example of WEKA's Interface software
Figure 3.2 General Methodology of Design Research (GMDR) and KDD Process (Fayyad et al., 1996)
Figure 3.2.3 (a) Original Data
Figure 3.2.3.b (i) Layout of data imported to WEKA
Figure 3.2.3.b (ii) Missing value before preprocessing data
Figure 3.2.3.g Layout of changing parameter in WEKA
Figure 4.1 Result to determine the best Hidden Unit using various Weight Seed
Figure 4.3 Result to determine the best Momentum Rate using various Weight Seed
Figure 4.6 Neural Network Architecture
CHAPTER 1
INTRODUCTION
1.1 The Context of the Study
In the spring of 2009, a newly identified flu virus called influenza A (or H1N1) spread rapidly among people (Mabrouk & Marzouk, 2010). Based on information from the Centers for Disease Control and Prevention (CDC), within a week the virus had spread worldwide to 30 countries through animal-to-human and human-to-human transmission. According to the latest World Health Organization (WHO) statistics, more than 18,000 people have died because of this virus since it was identified in April 2009. The H1N1 virus has spread to enough countries to be considered a global pandemic. Influenza epidemics can seriously affect the health of people of all ages, particularly children younger than 2 years old and adults aged 65 or older. People with certain medical conditions, such as liver, lung, chronic heart, kidney, blood or metabolic diseases or weakened immune systems, are at higher risk of contracting this disease.
Patients with H1N1 disease suffer because the disease is still poorly understood. Consequently, distinguishing H1N1 from common flu calls for a prediction model such as the Multilayer Perceptron (MLP). Our project intends to focus on the MLP model and how this model can be used to predict H1N1.
References
Ali, A., Khan, U., Tufail, A., & Kim, M. (2010). Analyzing potential of SVM based classifiers for intelligent and less invasive breast cancer prognosis. Computer Engineering and Applications (ICCEA).
Alenezi, J.K., Awny, M.M., & Fahmy, M.M.M. (2009). Effectiveness of artificial neural networks in forecasting failure risk for pre-medical students. Computer Engineering & Systems, 2009. ICCES 2009. International Conference.
Altiparmak, F., Ferhatosmanoglu, H., Erdal, S., & Trost, D.C. (2006). Information mining over heterogeneous and high-dimensional time-series data in clinical trials databases. Information Technology in Biomedicine.
Alty, S.R., Millasseau, S.C., Chowienczyc, P.J., & Jakobsson, A. (2003). Cardiovascular disease prediction using support vector machines. Micro-NanoMechatronics and Human Science, 2003 IEEE International Symposium.
Bigus, J.P. (1996). Data Mining with Neural Networks: Solving Business Problems from Application Development to Decision Support. McGraw-Hill, New York.
Boyd, T., Savel, T., Kesarinath, G., Lee, B., & Stim, J. (2010). The use of public health grid technology in the United States Centers for Disease Control and Prevention H1N1 pandemic response. Advanced Information Networking and Applications Workshops (WAINA), 2010 IEEE 24th International Conference.
CDC. Outbreak of swine-origin influenza A (H1N1) virus infection-Mexico, March-April 2009. MMWR Morb Mortal Wkly Rep, vol. 58, pp. 463-466, May 2009.
Camps-Valls, G., Porta-Oltra, B., Soria-Olivas, E., Martin-Guerrero, J.D., Serrano-Lopez, A.J., Perez-Ruixo, J.J., & Jimenez-Torres, N.V. (2003). Prediction of cyclosporine dosage in patients after kidney transplantation using neural networks.
Campos, M.M., Stengard, P.J., & Milenova, B.L. (2005). Data-centric automated data mining. Machine Learning and Applications, 2005. Proceedings of the Fourth International Conference.
Cao, J.R., & Liu, Y.J. (2010). Analysis on spatial heterogeneity of H1N1 flu based on GIS spatial analysis technology. 2010 International Conference on Multimedia Technology (ICMT).
Chen, C.Y., Bau, D.T., Tsai, M.H., Hsu, Y.M., Ho, T.Y., Huang, H.J., Chang, Y.H., Tsai, F.J., Tsai, C.H., & Chen, C.Y. (2009). Drug design for the influenza A virus subtype H1N1. Biomedical Engineering and Informatics, 2009. BMEI '09. 2nd International Conference.
Cowling, B. J., Chan, K. H., Fang, V. J., et al. (2010). Comparative epidemiology of pandemic and seasonal influenza A in households. New England Journal of Medicine, 362, 2175-84.
Dancea, O., Gordan, M., Dragan, M., Stoian, I., & Nedevschi, S. (2008). Postoperatory risk classification of prostate cancer patients using support vector machines. Automation, Quality and Testing, Robotics, 2008. AQTR 2008. IEEE International Conference.
Dancey, D., Bandar, Z.A., & McLean, D. (2010). Rule extraction from neural networks for medical domains. Neural Networks (IJCNN), The 2010 International Joint Conference.
DARPA. (1988). Neural Network Study. AFCEA International Press, New York.
Delshadpour, S. (2003). Improved MLP neural network as chromosome classifier. Biomedical Engineering, 2003. IEEE EMBS Asian-Pacific Conference.
Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). From data mining to knowledge discovery in databases. AI Magazine, pp. 37-54.
Garcia-Orellana, C.J., Gallardo-Caballero, R., Macias-Macias, M., & Gonzalez-Velasco, H. (2007). SVM and neural networks comparison in mammographic CAD.
Glotsos, D., Spyridonos, P., Petalas, P., Cavouras, D., Zolota, V., Dadioti, P., Lekka, I., & Nikiforidis, G. (2003). A hierarchical decision tree classification scheme for brain tumors astrocytoma grading using support vector machines. Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis (2003).
Hafner, M., Gangl, Wrba, R., Kastinger, Ch., Uhl, A., Thonhauser, K., Schmidt, H.-P., & Vecsei, A. (2007). Comparison of K-NN, SVM and NN in pit pattern classification of zoom-endoscopic colon images using co-occurrence histograms.
Haykin, S. (2009). Neural Networks and Learning Machines (3rd ed.). Prentice Hall.
Jaffar, M.A., Ahmed, B., Hussain, A., Naveed, N., Jabeen, F., & Mirza, A.M. (2009). Multi domain features based classification of mammogram images using SVM and MLP. Innovative Computing, Information and Control (ICICIC), 2009 Fourth International Conference.
Jajoo, R., Mital, D., Haque, S., & Srinivasan, S. (2002). Prediction of hepatitis C using artificial neural network. Control, Automation, Robotics and Vision, 2002. ICARCV 2002. 7th International Conference.
Joshi, S., Shenoy, D., Vibhudendra Simha, G.G., Rrashmi, P.L., Venugopal, K.R., & Patnaik, L.M. (2010). Machine Learning and Computing (ICMLC), 2010 Second International Conference.
Koay, J., Herry, C., & Frize, M. (2004). Analysis of breast thermography with an artificial neural network. Engineering in Medicine and Biology Society, 2004. IEMBS '04. 26th Annual International Conference of the IEEE.
Kholghi, M., Hassanzadeh, H., & Keyvanpour, M. (2010). Classification and evaluation of data mining techniques for data stream requirements. Computer Communication Control and Automation (3CA), 2010 International Symposium.
Koç, S., Yilmaz, G., & Kabak, Y. (2010). The clinical guidelines usage towards the diagnosis and treatment of H1N1. Health Informatics and Bioinformatics (HIBIT), 2010 5th International Symposium.
Li, T., Li, Q., Zhu, S. H., & Ogihara, M. (2002). A Survey on Wavelet Applications in Data Mining. SIGKDD Explor. Newsl.
Liu, Y. (2010). Investigation of prediction and establishment of SIR model for H1N1 epidemic disease. Bioinformatics and Biomedical Engineering (iCBBE), 2010 4th International Conference.
Lo, J.Y. (1999). Application of artificial neural networks for diagnosis of breast cancer. Evolutionary Computation, 1999. CEC 99. Proceedings of the 1999 Congress.
Mabrouk, M. S., & Marzouk, S. Y. (2010). A chaotic study on pandemic and classical (H1N1) using EIIP sequence indicators. 2010 2nd International Conference on Computer Technology and Development (ICCTD 2010).
Nirkhi, S. (2010). Potential use of artificial neural network in data mining. Computer and Automation Engineering (ICCAE), 2010 The 2nd International Conference on Volume: 2.
Palaniappan, S., & Awang, R. (2008). Intelligent heart disease prediction system using data mining techniques. Computer Systems and Applications, 2008. AICCSA 2008. IEEE/ACS International Conference.
Qi, F., Zhu, C. J., & Liu, Y. (2010). Predicting breast cancer recurrence using data mining techniques. Bioinformatics and Biomedical Technology (ICBBT), 2010 International Conference.
Sadeghzadeh, N., Afshar, A., & Menhaj, M.B. (2008). An MLP neural network for time delay prediction in networked control systems. Control and Decision Conference, 2008. CCDC 2008.
Schwarzer, G., Vach, W., & Schumacher, M. (2000). On the misuses of artificial neural networks for prognostic and diagnostic classification in oncology. Statistics in Medicine.
Seker, H., Odetayo, M., Petrovic, D., Naguib, R.N.G., Bartoli, C., Alasio, L., Lakshmi, M.S., & Sherbet, G.V. (2001). Prognostic comparison of statistical, neural and fuzzy methods of analysis of breast cancer image cytometric data. Engineering in Medicine and Biology Society, 2001. Proceedings of the 23rd Annual International Conference of the IEEE.
Sewak, M., Vaidya, P., Chan, C. C., & Duan, Z. D. (2007). SVM approach to breast cancer classification. Second International Multisymposium on Computer and Computational Sciences 2007.
Segovia-Vargas, M.J., Gil-Fana, J.A., Heras-Martinez, A., Vilar-Zanon, J.L., & Sanchis-Arellano, A. (2003). Using rough sets to predict insolvency of Spanish non life insurance companies.
Smith, G. J., Vijaykrishna, D., Bahl, J., Lycett, S. J., Worobey, M., Pybus, O. G., et al. (2009). Origins and evolutionary genomics of the 2009 swine-origin H1N1 influenza A epidemic. Nature, vol. 459, pp. 1122-1125, June 2009.
Su, M. W., Chen, P.C., Chu, W.C., & Yuan, H.S. (2010). Host specific codon usage pattern of H1N1 influenza A viruses. Bioinformatics and Biomedical Engineering (iCBBE), 2010 4th International Conference.
Tay, S. S. (2009). Predicting Employment Condition of TAR's ICT Graduates Using Backpropagation Neural Network.
Tian, Z. G., & Zuo, M.J. (2010). Health Condition Prediction of Gears Using a Recurrent Neural Network Approach. Reliability, IEEE Transactions.
Tourassi, G.D., Floyd, C.E., Jr, & Lo, J.Y. (1999). A constraint satisfaction neural network for medical diagnosis. Neural Networks, 1999. IJCNN '99. International Joint Conference.
Tsoukalas, L. H., & Uhrig, R.E. (1997). Fuzzy and Neural Approaches in Engineering. New York: John Wiley & Sons.
Ultsch, A., Korus, D., & Kleine, T. O. (1995). Integration of neural networks and knowledge-based system in medicine. Hans-Meerwein-Straße, Lahnberge, Marburg.
Yarmand, H., Ivy, J. S., Roberts, S. D., Bengtson, M.W., & Bengtson, N. M. (2010). Cost-effectiveness analysis of vaccination and self-isolation in case of H1N1. Winter Simulation Conference (WSC), Proceedings of the 2010.
Walczak, S. (2005). Artificial neural network medical decision support tool: predicting transfusion requirements of ER patients. Information Technology in Biomedicine, IEEE Transactions.
Wang, F.P., & He, X. H. (2010). Prediction of HLA-A*0201 restricted cytotoxic T lymphocyte epitopes in Influenza A H1N1 Virus and the similarity analysis of these epitopes with those existing in other influenza viruses. Bioinformatics and Biomedical Engineering (iCBBE), 2010 4th International Conference.
Wang, L. M., Chen, J. X., Pei, Y., Zhao, X., Cui, H. T., & Cui, H. Z. (2010). Feature selection and prediction of sub-health state using SVM-RFE. Artificial Intelligence and Computational Intelligence (AICI), 2010 International Conference.
Xing, Y. W., Wang, J., Zhao, Z.H., & Gao, Y. (2007). Combination data mining methods with new medical data to predicting outcome of coronary heart disease convergence. Information Technology, 2007. International Conference.
Zhang, W., Zeng, F., Wu, X., Zhang, X., & Jiang, R. (2009). A comparative study of ensemble learning approaches in the classification of breast cancer metastasis. 2009 International Joint Conference on Bioinformatics, Systems Biology and Intelligent Computing.
Zhang, Z., & Fenstermacher, D. (2003). SEP: score for expression profile-a novel method for predicting clinical outcome in breast cancer. Engineering in Medicine and Biology Society, 2003. Proceedings of the 25th Annual International Conference of the IEEE.