This project, titled "A Comparative Study on Different Machine Learning Algorithms for Achieving Accurate Prediction of Heart Disease" and presented by Pronab Ghosh, Khobayeb Ahmed and Madhob Karmaker to the Department of Computer Science and Engineering, Daffodil International University, has been accepted as satisfactory in partial fulfillment of the requirements for the degree of B.Sc. Finally, we are grateful to all the faculty members of Daffodil International University who have inspired and motivated us throughout the university program. Over the years, heart disease has become one of the most common causes of death.

Heart disease is usually discovered only at a very late stage; accurate prediction can therefore reduce its severe consequences. The best prediction was achieved by the Random Forest algorithm, an ensemble version of the Decision Tree algorithm. The heart is one of the most fundamental and essential organs of the human body.

Malfunction of the heart can damage other vital organs such as the kidneys and brain. These days, most case studies related to heart disease fall into the category of acquired heart disease. The most common heart-related diseases are cardiovascular disease, heart attack, coronary heart disease and stroke.

Stroke is the term for heart disease caused by narrowing, blockage or hardening of the blood vessels supplying the brain or by high blood pressure [1, 2]. Different factors are responsible for different acquired heart diseases.

Motivation

The rationale of the study

This research topic was chosen because heart disease causes a large number of deaths. The study aims to provide suggestions that can help reduce the number of such deaths.

Research Questions

Expected Output

Report Layout

The remaining chapters are organized as follows. Chapter 2 gives the background to this research. Chapter 3 presents the research methodology, including the implementation process and the source of the data used. Chapter 4 gives the experimental results and discussion, showing the results obtained on the given data set.

Finally, Chapter 5 presents the summary, conclusion, recommendations and implications for future research.

BACKGROUND

Introduction
Related Works
Research Summary
Scope of the Problem
Challenges

Latha Parthiban et al. [12] proposed combining neural network capabilities with fuzzy logic and a genetic algorithm. Kiyong Noh et al. [13] used a classification method on data collected from different functions and ECG. Peter et al. [14] used pattern recognition and various data mining techniques, including naive Bayes, decision tree, KNN and neural networks.

Patil et al. Jabbar et al. [16] proposed a new system based on serial number and group transaction data set, followed by a mining strategy implemented in C. After assessing the current situation, we wanted to create a system that offers better prediction performance for the disease and helps in understanding the condition of affected patients.

Finally, by God's blessing, we reached the goal we had set out to achieve. The scope of the problem in this thesis included data collection, handling of missing values, data preprocessing, the implementation process, and selection of an appropriate algorithm to obtain accurate predictions for the given system. After implementation, the system produces a score that helps predict what stage of heart disease a patient is in. If the prediction accuracy is below 70, the system is considered unusable due to low prediction.

In that case we would have had to choose another topic, which we could not imagine doing. We therefore divided our thesis work into several parts so that we could manage it and overcome any problems that arose. After completing the tasks, I shared this idea with my friends and a younger brother, who were eager to accept this kind of system. I also shared it with many of our respected teachers, who encouraged me to complete this dissertation.

After preprocessing, our dataset has no null (missing) values, which helps us get a good prediction. Feature scaling then brings all feature values onto the same scale.
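The feature scaling step described above can be sketched as follows. This is a minimal illustration, assuming standardization with scikit-learn's `StandardScaler`; the small feature matrix is hypothetical (its columns merely stand in for attributes such as age, resting blood pressure and cholesterol) and does not come from the thesis data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix; the values are illustrative, not real records.
X_train = np.array([[63.0, 145.0, 233.0],
                    [37.0, 130.0, 250.0],
                    [41.0, 130.0, 204.0],
                    [56.0, 120.0, 236.0]])
X_test = np.array([[57.0, 140.0, 192.0]])

# Fit the scaler on the training data only, then reuse the same
# mean and standard deviation to transform the test data.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Each training column now has mean 0 and unit variance.
print(X_train_scaled.mean(axis=0))
print(X_train_scaled.std(axis=0))
```

Fitting on the training split alone avoids leaking test-set statistics into the model, which is a common design choice rather than something the source states.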

RESEARCH METHODOLOGY

Introduction
Research Subject and Instrumentation

Decision Tree
Multiple Linear Regression
Random Forest
Support Vector Machine (SVM)

Data Collection Procedure

Input attributes
Key attribute
Predictable attribute

Statistical Analysis
Implementation Requirements
Introduction
Experimental Results
Descriptive Analysis
Summary

There are many types of machine learning algorithms; we used a subset of them for our system. In the impurity measure used by the decision tree, K is the number of classes of the target feature and P_i is the number of instances of class i divided by the total number of instances. Multiple linear regression models the relationship between one dependent variable and two or more independent variables by fitting a linear equation to the observed data.
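The quantities K and P_i defined above are the usual ingredients of a decision tree's impurity measure. The text does not say which measure was used, so the sketch below shows two common candidates, entropy and Gini impurity, purely as an assumption; the function names are hypothetical.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Return P_i for each class: instances of class i / total instances."""
    total = len(labels)
    counts = Counter(labels)
    return [count / total for count in counts.values()]


def entropy(labels):
    """Entropy: -sum over the K classes of P_i * log2(P_i)."""
    return -sum(p * math.log2(p) for p in class_proportions(labels))


def gini(labels):
    """Gini impurity: 1 - sum over the K classes of P_i squared."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


# A perfectly balanced binary target is maximally impure.
labels = [0, 0, 1, 1]
print(entropy(labels))  # 1.0
print(gini(labels))     # 0.5
```

A node containing a single class has zero impurity under both measures, which is why a tree prefers splits that drive these values down.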

The Support Vector Machine also analyzes data for regression and classification, and its working process is described by V. The most important part of this paper is to create an intelligent heart disease prediction system that helps in the diagnosis of heart disease using the Cleveland heart disease dataset. The UCI machine learning repository [17] contains many cardiovascular disease databases, among which we obtained the Cleveland heart disease database, which has 303 records.

Num: diagnosis of heart disease (angiographic disease status) (value 0: < 50% diameter narrowing; value 1: > 50% diameter narrowing). From this figure, we can easily see that Random Forest is the best solution among them. After downloading, the dataset was passed to the encoding step using pandas library functions.
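The encoding step mentioned above might look like the following sketch. It assumes that the `num` attribute is collapsed to a binary target and that a categorical attribute is one-hot encoded with `pandas.get_dummies`; the thesis does not state the exact encoding used. The column names `age`, `cp`, `chol` and `num` follow the Cleveland attribute names, while the rows and the `target` column are made up for illustration.

```python
import pandas as pd

# Hypothetical rows using a few Cleveland attribute names; not real records.
df = pd.DataFrame({
    "age": [63, 67, 41, 56],
    "cp": [1, 4, 2, 3],        # chest pain type (categorical)
    "chol": [233, 286, 204, 236],
    "num": [0, 2, 0, 1],       # diagnosis attribute
})

# Assumption: any non-zero diagnosis value is treated as "disease present".
df["target"] = (df["num"] > 0).astype(int)

# One-hot encode the categorical chest pain column and drop the raw diagnosis.
encoded = pd.get_dummies(df.drop(columns=["num"]), columns=["cp"], prefix="cp")

print(encoded.columns.tolist())
```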

Many researchers have implemented systems for cardiovascular disease according to their own proposed procedures. How good the results are depends on the given data set and on how many correct values it contains. In this study, we introduced various strategies to help obtain correct results on the given data set.

We applied four different methods to the dataset: Multiple Linear Regression (MLR), Random Forest (RF), Decision Tree (DT) and Support Vector Machine (SVM). Third, Random Forest gave us over 90% accuracy, which was very good, and finally we reached 86% with the Support Vector Machine. Other researchers have also applied their systems to the Cleveland heart disease dataset.
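A comparison of the four methods named above could be set up roughly as follows. This is a sketch under stated assumptions, not the authors' implementation: it uses a synthetic dataset from `make_classification` (sized 303 x 13 only to resemble the Cleveland data), default scikit-learn hyperparameters, and a 0.5 threshold to turn the linear regression output into a class label. The accuracies it prints will therefore not match the figures reported in the text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the dataset (assumption: 303 records, 13 features).
X, y = make_classification(n_samples=303, n_features=13,
                           n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

# Scale features using statistics from the training split only.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

models = {
    "MLR": LinearRegression(),
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(kernel="rbf"),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    if name == "MLR":
        # Assumption: threshold the continuous regression output at 0.5.
        predictions = (predictions >= 0.5).astype(int)
    accuracies[name] = accuracy_score(y_test, predictions)

for name, acc in accuracies.items():
    print(f"{name}: {acc:.3f}")
```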

However, our prediction system provides better accuracy than other existing systems, reaching about 91% on the dataset. Finally, we overcame all kinds of obstacles to make accurate predictions and arrived at a stage that helps identify the state of the disease.

Summary, Conclusion, Recommendation and Implication for Future Research

Summary of the Study
Conclusions
Recommendations
Implication for Further Study

We are satisfied with the results of our work, which can be considered quite good compared to other related works. The accuracy may change if we apply more features to a larger dataset. This system is designed for affected patients who have no idea about the real signs of heart disease.

They will gain knowledge of the existing signs that are harmful to a healthy life. Eventually, a person affected by heart disease will be able to detect the disease in its initial stage. In modern times, heart disease is considered one of the most formidable of all diseases.

For this reason, in the future we want to add more features and will try to provide a better solution that builds on the existing capabilities of the system.

[7] M.A. Jabbar, Dr. Priti Chandra, B.L. Deekshatulu, "Cluster based association rule mining for heart attack prediction", Journal of Theoretical and Applied Information Technology, 2011.
Kumaraswamy, "Extraction of meaningful patterns from heart disease warehouses for heart attack Prediction", (IJCSNS) International Journal of Computer Science and Network Security, 2009.
[9] Sellappan Palaniappan, Rafiah Awang, "Intelligent Heart Disease Prediction System Using Data Mining Techniques", IJCSNS International Journal of Computer Science and Network Security, Vol. 8 No. 8, 2008.
[10] Niti Guru, Anil Dahiya, Navin Rajpal, "Decision Support System for Heart Disease Diagnosis using Neural Network", Delhi Business Review, Vol.
[11] Latha Parthiban and R. Subramanian, "Intelligent heart disease prediction system using CANFIS and genetic algorithm", International Journal of Biological and Life Sciences, Vol.
Kumaraswamy, "Extracting significant patterns from heart disease repositories for heart attack prediction," International Journal of Computer Science and Network Security (IJCSNS), vol.
Deekshatulu, "CLUSTER BASED ASSOCIATION RULE MINING FOR," Journal of Theoretical & Applied Information Technology, vol.
Mining in Prediction of Heart Disease Using Risk Factors," in Proceedings of 2013 IEEE Conference on Information and Communication Technologies, no.
[19] Mai Shouman, Tim Turner, Rob Stocker, "Using Decision Tree for Diagnosing Heart Disease Patients", Proceedings of the 9th Australasian Data Mining Conference (AusDM'11), Ballarat, Australia.
Ramteke, "Diagnosis and Medical Prescription of Heart Disease Using Support Vector Machine and Back Propagation Technique", International Journal on Computer Science and Engineering Vol.

APPENDIX

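from sklearn import preprocessing
from sklearn.preprocessing import StandardScaler

# L1-normalize each sample (row) of the training and test feature matrices
X_train = preprocessing.normalize(X_train, norm='l1')
X_test = preprocessing.normalize(X_test, norm='l1')

# Create a standard scaler (zero mean, unit variance per feature)
sc_X = StandardScaler()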