
This Report Presented in Partial Fulfillment of the Requirements for the Degree of Bachelor of Science in Computer Science and Engineering


Academic year: 2023


This project titled “Diabetes Prediction Using Data Mining”, submitted by Khadiza Khanom, Farzana Hasnat Mou and Firuj Samiha Mimto to the Department of Computer Science and Engineering, Daffodil International University, has been accepted as satisfactory in partial fulfillment of the requirements for the degree of B.Sc. We hereby declare that we have carried out this project ourselves under the supervision of Ohidujjaman, Senior Lecturer, Department of CSE, Daffodil International University.

We also declare that neither this project nor any part of it has been submitted elsewhere for the award of any degree or diploma. The deep knowledge and keen interest of our supervisor in the field of "data mining" made it possible for us to carry out this project. Modern advances in data mining and machine learning, however, are offering a solution to this difficult problem.

This report describes our study on predicting early-stage diabetes in women using data mining algorithms. We used six data mining algorithms: K Nearest Neighbors, Decision Tree, Logistic Regression, Support Vector Classifier, Naive Bayes, and Random Forest.

LIST OF TABLES

Introduction

  • Introduction
  • Motivations
  • Research Questions
  • Expected Outcome
  • Report Layout
  • Preliminaries
  • Related Works
  • Research summary
  • Scope & Challenges
  • Introduction
  • Research Subject and Instrumentation
  • Data source and collection
  • Descriptive Statistic
  • Data Visualization
  • Data preprocessing
    • Criteria for selection
  • Working Flowchart
  • Implemented Algorithm
    • Support Vector Classifier
    • Naive Bayes
    • Decision Tree
    • Logistic Regression
    • Random Forest Machine Learning
  • Experimental Results & Analysis
    • Confusion Matrix and Heat Map of Support Vector
    • Confusion Matrix and Heat Map of Naive Bayes
    • Confusion Matrix and Heat Map of Decision Tree
    • Confusion Matrix and Heat Map of Logistic Regression
    • Confusion Matrix and Heat Map of Random Forest
    • Confusion Matrix and Heat Map of K Nearest Neighbors
  • Evaluation measures

The power of machine learning and data mining algorithms lies in their ability to manage large amounts of data. In this work, the K Nearest Neighbors, Decision Tree, Logistic Regression, Support Vector Classifier, Naive Bayes and Random Forest classification algorithms are used to predict diabetes in a patient. The study also identifies which of these six algorithms achieves the highest accuracy.

Because machine learning algorithms can manage a large data set and predict diabetes, all that is needed is some routine data that can be obtained from a simple test; data mining techniques are then applied to that data. There are many techniques for predicting diabetes with data mining and machine learning algorithms. These algorithms also have a large positive impact on medical files.

The collaborative studies show that computing techniques such as machine learning and data mining can yield cost-efficient, effective, and faster ways to predict diabetes. Data mining and machine learning have become essential techniques for exploring and extracting knowledge from large, complex medical data with many attributes. This is why there are so many studies on predicting diabetes using data mining and machine learning.

As the studies above show, many machine learning algorithms and data mining techniques have been used to predict diabetes. Machine learning is a key technique in this area because it is fast and easy to apply. However, we did not find any research that applies six data mining algorithms with a machine learning approach in the Python programming language and gives a clear comparison between them.
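A side-by-side comparison of this kind can be sketched in Python roughly as follows. This is a minimal illustration, not the study's actual code: it assumes scikit-learn is available, uses synthetic data from `make_classification` instead of the study's dataset, and the function name `compare_classifiers`, the 75/25 split, and all hyperparameters are arbitrary choices made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(X, y, seed=0):
    """Return {name: test accuracy} for six classifiers on one train/test split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    # Distance- and margin-based models get feature scaling; tree models do not need it.
    models = {
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "DT": DecisionTreeClassifier(random_state=seed),
        "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "SVC": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        "NB": GaussianNB(),
        "RF": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    return {
        name: model.fit(X_train, y_train).score(X_test, y_test)
        for name, model in models.items()
    }


# Synthetic stand-in data (assumption): 8 numeric features, binary label.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5, random_state=0
)
scores = compare_classifiers(X, y)
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

A single hold-out split keeps the sketch short; cross-validation would give a more stable ranking of the six models.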

For our work, we use the machine learning techniques LR, KNN, NB, RF, SVC and DT. The main objective of the support vector classifier is to divide the dataset into classes by identifying the two marginal hyperplanes, finding the maximum margin that separates the two classes, and minimizing structural risk, with kernel functions used to improve efficiency. To separate the classes, the support vector classifier produces hyperplanes that correctly divide them.
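The margin idea described here is usually written as the standard soft-margin optimization problem below. This is the textbook formulation, given as an illustration; the source does not state which variant was used. The symbols are assumptions introduced for the example: $w$ and $b$ define the separating hyperplane, $y_i \in \{-1, +1\}$ are the class labels, $\xi_i$ are slack variables, and $C$ is the regularization constant.

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{subject to}\quad
y_i\,(w^{\top}x_i + b) \ge 1 - \xi_i,\qquad \xi_i \ge 0,\qquad i = 1,\dots,n .
```

The two marginal hyperplanes are $w^{\top}x + b = \pm 1$, and the distance between them is $2/\lVert w\rVert$, so minimizing $\lVert w\rVert$ maximizes the margin.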

Finally, for diabetes prediction, a confusion matrix is used to evaluate the performance of the supervised machine learning models. Naive Bayes is also a machine learning classifier, one that copes with class imbalance and missing data. A decision tree consists of decision nodes, branches, and leaf nodes; it is a supervised machine learning technique and is likewise evaluated using a confusion matrix.
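As an illustration of how a confusion matrix yields evaluation measures, here is a small pure-Python sketch. The function names and the toy label lists are hypothetical and are not results from the study; labels are assumed to be coded 0 (no diabetes) and 1 (diabetes).

```python
def confusion_matrix_binary(y_true, y_pred):
    """Count (TN, FP, FN, TP) for labels coded 0 (negative) and 1 (positive)."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            fn += 1
    return tn, fp, fn, tp


def evaluation_measures(tn, fp, fn, tp):
    """Derive accuracy, precision, recall and F1 from the four counts."""
    total = tn + fp + fn + tp
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
tn, fp, fn, tp = confusion_matrix_binary(y_true, y_pred)
measures = evaluation_measures(tn, fp, fn, tp)
print((tn, fp, fn, tp), measures)
```

The same four counts are what a heat map of the confusion matrix displays, one cell per count.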

Finally, k nearest neighbors is a classification algorithm that takes the number of nearest neighbors as a parameter and compares each new sample against all training samples.
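The k nearest neighbors idea can be sketched in a few lines of pure Python. This is a minimal illustration under stated assumptions: Euclidean distance, majority voting, a hypothetical function name `knn_predict`, and made-up two-feature toy points that do not come from the study's dataset.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Distance from the query to every training sample, nearest first.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy two-feature training set: class 0 clusters near (1, 1), class 1 near (7, 7).
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 8.0)]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, (1.2, 1.5)))  # nearest neighbors are all class 0
print(knn_predict(train_X, train_y, (6.8, 7.0)))  # nearest neighbors are all class 1
```

Because every prediction scans the whole training set, the cost per query grows with the number of training samples; an odd `k` avoids tied votes in two-class problems.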

Table 2: Dataset Description

Impact on Society, Environment and Sustainability

  • Impact on Society
  • Impact on Environment
  • Ethical Aspects
  • Sustainability Plan

The diabetes initiative has shown that there are key strategies that can increase the sustainability of an effective program.

Summary, Conclusion, Recommendation and Implication for Future Research

Conclusion

Future Work

In this study we used modern technology, as the world is now in the modern era. Since our work covers only women, datasets containing information about male patients cannot be used with our algorithms. Our future work will therefore be to integrate two systems: one that predicts diabetes in women and another that predicts diabetes in men.

Users could provide their information through a website, web application or software, which would then predict diabetes for that patient. We will also try to diagnose diabetes in patients using machine learning and data mining algorithms that can help people get more accurate results. After several years of study, new algorithms may be adopted to provide a better solution.


Figures

Table 1: Related studies on Diabetes Prediction
Figure 2: Histogram of Pregnancies.
Figure 3: Histogram of Glucose.
