Naim Jannat Nipu”, Id No to the Department of Computer Science and Engineering, Daffodil International University, has been accepted as satisfactory in partial fulfillment of the requirements for the degree of B.Sc. Department of Computer Science and Engineering Faculty of Natural Sciences and Information Technology Daffodil International University. Mahfujur Rahman, Sr. Lecturer & Tania Khatun, Sr. Lecturer Department of CSE Daffodil International University.
Mahfujur Rahman, Sr. Lecturer and Head, Department of CSE, for his kind help in completing our project, and also to the other faculty members and staff of the CSE Department of Daffodil International University. We would like to thank all our coursemates at Daffodil International University who took part in discussions while completing the coursework.
Having a brother or sister with ASD, having older parents, specific genetic conditions such as Down syndrome, and a relatively low birth weight are all risk factors for ASD [1]. Common signs include having a long-term, keen interest in a particular subject, such as dates, details or facts, and being more or less sensitive than other people to sensory input such as light, sound, clothing or temperature.
Having an ASD sibling, older parents, specific genetic defects, persons with conditions such as intellectual disability, and poor body weight gain are all risk factors for ASD [1].
Asperger’s syndrome
Childhood Disintegrative Disorder
Autistic Disorder
Pervasive Developmental Disorder Not Otherwise Specified (PDD-NOS)
Rett Syndrome
- Motivation
- Objective
- Contribution
- Data Collection
- Methods
- Data processing
- Supervised Machine Learning Analysis
Because its symptoms are less severe in the initial stages, ASD is difficult to detect early. Clinical diagnosis, on the other hand, is too expensive for the majority of Bengalis. From this point of view, our goal is to create a machine learning model that can identify ASD at an early stage with high accuracy. To achieve the best efficiency and accuracy, an effective machine learning approach will be developed.
In that light, the goal of this study is to present a newly designed machine learning model that can diagnose ASD in children at an early stage. Over the last ten years, a number of studies have sought to develop models that detect ASD in children using classification and other ML methods. A 2021 study [5] used five machine learning models (Random Forest Classifier, Logistic Regression, Naive Bayes, Support Vector Machines, and KNN) to categorize individual participants as having ASD or not having ASD based on age, gender, ethnicity, and other variables.
A 2020 study [4] proposed Rules Machine Learning (RML), a new machine learning approach that gives users a knowledge base of rules for understanding the basis of a classification and for diagnosing ASD symptoms. A 2019 study [6] used the ABIDE database to identify 6 personality traits in 851 people and trained and evaluated machine learning models using cross-validation. They used nine machine learning classifiers on four datasets split by age range: toddler, child, adolescent and adult.
An advantage of using machine learning is that behavioral subtypes and their connections can be discovered. A 2019 study [10] performed a systematic review and meta-analysis to synthesize the data on the performance of machine learning algorithms in the diagnosis of ASD. A 2018 study [11] presented a machine learning technique for early autism prediction by integrating a recording and home video screening.
The dataset attributes are as follows:
- A1_Score (Boolean): answer to Q1; see Table 2 for details of Q1
- A2_Score (Boolean): answer to Q2; see Table 2 for details of Q2
- A3_Score (Boolean): answer to Q3; see Table 2 for details of Q3
- A4_Score (Boolean): answer to Q4; see Table 2 for details of Q4
- A5_Score (Boolean): answer to Q5; see Table 2 for details of Q5
- A6_Score (Boolean): answer to Q6; see Table 2 for details of Q6
- A7_Score (Boolean): answer to Q7; see Table 2 for details of Q7
- A8_Score (Boolean): answer to Q8; see Table 2 for details of Q8
- A9_Score (Boolean): answer to Q9; see Table 2 for details of Q9
- A10_Score (Boolean): answer to Q10; see Table 2 for details of Q10
- Scoring Result (Numeric): see Table 2 for details

Data preprocessing is required for any machine learning or data mining strategy, because the effectiveness of a machine learning methodology depends on how well the dataset is prepared and structured. To obtain a better analytical or statistical outcome, outliers should be removed using machine learning (ML) or data mining approaches [16].
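The preprocessing idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: the yes/no encoding, the interquartile-range (IQR) rule for outliers, and the `age_months` column are hypothetical choices made here, not details confirmed by the study, which only states that outliers were removed.

```python
from statistics import quantiles


def encode_boolean(value):
    """Map a yes/no style answer to 1/0; anything else raises ValueError."""
    text = str(value).strip().lower()
    if text in {"1", "yes", "true"}:
        return 1
    if text in {"0", "no", "false"}:
        return 0
    raise ValueError(f"unexpected Boolean value: {value!r}")


def iqr_bounds(values, k=1.5):
    """Return the (low, high) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


def preprocess(rows, numeric_key):
    """Encode A1_Score..A10_Score as 0/1 and drop rows whose numeric_key is an outlier."""
    score_keys = [f"A{i}_Score" for i in range(1, 11)]
    encoded = []
    for row in rows:
        clean = dict(row)
        for key in score_keys:
            clean[key] = encode_boolean(row[key])
        encoded.append(clean)
    low, high = iqr_bounds([r[numeric_key] for r in encoded])
    return [r for r in encoded if low <= r[numeric_key] <= high]


# Toy rows (invented for illustration, not taken from the study's dataset).
answers = ["yes", "no", "yes", "yes", "no", "no", "yes", "no", "yes", "yes"]
sample_rows = []
for age in [24, 26, 28, 30, 32, 300]:  # 300 stands in for a data-entry error
    row = {f"A{i}_Score": answers[i - 1] for i in range(1, 11)}
    row["age_months"] = age  # hypothetical numeric column
    sample_rows.append(row)

cleaned = preprocess(sample_rows, numeric_key="age_months")
print(len(sample_rows), "->", len(cleaned), "rows after outlier removal")
```

The IQR rule is used here only because it is simple and needs no fitted model; any other outlier criterion could be substituted in `iqr_bounds`.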
The labeled training dataset is used by supervised machine learning algorithms to train the ML model. The following subsections briefly summarize the proposed supervised machine learning techniques for disease diagnosis.
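To make the supervised setting concrete, the sketch below trains nothing in advance and simply labels a new case by majority vote among its nearest labeled neighbors, which is the core of k-nearest neighbor classification. The feature vectors, labels and the choice of `k=3` are toy assumptions made for illustration, not the configuration used in the study.

```python
from collections import Counter
from math import dist


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy labeled data: each tuple holds three 0/1 screening answers (hypothetical);
# label 1 means ASD traits present, label 0 means absent.
train_x = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1), (1, 1, 0), (0, 1, 1)]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_x, train_y, query=(1, 1, 1)))
print(knn_predict(train_x, train_y, query=(0, 0, 0)))
```

Euclidean distance is an assumption here; with purely binary answers, Hamming distance would rank neighbors in the same order.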
K-nearest neighbor (KNN)
Multilayer Perceptron (MLP)
Random Forest (RF)
Decision Tree (DT)
Performance Evaluation Criteria
Log loss: Log loss is a useful performance indicator when the model output represents the probability of a binary outcome. All performance evaluation criteria are listed in Table 2 along with their mathematical equations. These were the key criteria used in this study to evaluate all classification algorithms and to select the most effective algorithm for the early detection of ASD.
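As a rough illustration of how such criteria are computed, the sketch below uses the standard textbook definitions of accuracy, sensitivity (recall), specificity, precision, F-measure, MCC, Cohen's kappa and binary log loss. These are assumed to match the equations in Table 2, which is not reproduced here, and the labels and probabilities are invented toy values.

```python
from math import log, sqrt


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Standard binary metrics; assumes no denominator is zero."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # identical to recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total**2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
        "mcc": mcc,
        "kappa": kappa,
    }


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the true 0/1 labels."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += t * log(p) + (1 - t) * log(1 - p)
    return -total / len(y_true)


# Toy predictions (not results from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.7, 0.4, 0.1, 0.2, 0.3, 0.6]

metrics = classification_metrics(*confusion_counts(y_true, y_pred))
print(metrics)
print("log loss:", log_loss(y_true, y_prob))
```

Unlike the threshold-based metrics, log loss penalizes confident wrong probabilities more heavily than hesitant ones, which is why it is reported separately.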
FST Methods
Results & Discussion
Statistical & Exploratory Data Analysis
The attributes jaundice, autism, and previously used app have only positive values in this dataset.
Performance Analysis
The figure also supports that Random Forest is the best performing classifier compared to all other applied classification algorithms. Figure 7 shows the area under the ROC curve (AUROC) in Figure 7 (A) and the area under the Precision-Recall curve (AUPRC) in Figure 7 (B). Both curves summarize the efficiency of a model by the area under the curve.
Based on Figure 7, DT is the weakest classifier, with an AUROC of 0.926 and an AUPRC of 0.942. The most efficient classifiers are MLP and RF, as each covered 100% of the area under both the AUROC and AUPRC curves. Four FST methods (CFSSE, IGAE, GRAE and RFAE) were applied and their results are presented in the table.
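The AUROC values discussed above can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes AUROC directly from that pairwise definition; the labels and scores are toy values, not outputs of the study's classifiers.

```python
def auroc(y_true, scores):
    """Area under the ROC curve via pairwise ranking of positives vs negatives."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


labels = [1, 1, 1, 0, 0, 0]
separated = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]   # every positive outranks every negative
overlapping = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]  # one positive falls below a negative

print(auroc(labels, separated))
print(auroc(labels, overlapping))
```

A classifier that covers 100% of the area corresponds to the fully separated case, where no negative case is scored above any positive case.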
To summarize, we collected an ASD dataset from Kaggle to build our predictive model and processed it further as needed. We then applied five ML techniques, including KNN, MLP, DT and RF, and evaluated their results based on accuracy, sensitivity, specificity, kappa statistics, precision, recall, F-Measure, log loss and MCC. We found that all applied algorithms performed well, with DT generating the highest performance with 95% accuracy.
This indicates that it is the most accurate in predicting ASD in the early stages. In addition, we applied four FST methods (CFSSE, IGAE, GRAE and RFAE) to show the importance of features, and the results are presented in Table 6. The results of the FST methods help us to identify the most important factors that are associated with ASD.
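Feature ranking of this kind can be illustrated with information gain, which measures how much knowing a feature reduces the entropy of the class label. The text does not expand the FST acronyms, so treating IGAE as an information-gain attribute evaluator is an assumption made here; the feature values below are toy data, not the study's results.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy conditioned on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy data: the first feature matches the class exactly, the second is unrelated to it.
classes = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "A1_Score": [1, 1, 1, 1, 0, 0, 0, 0],
    "A2_Score": [1, 0, 1, 0, 1, 0, 1, 0],
}

ranking = sorted(
    features,
    key=lambda name: information_gain(features[name], classes),
    reverse=True,
)
print(ranking)
```

Features with higher gain are ranked first, so a ranking like this gives one simple way to compare how strongly each screening answer is associated with the class.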
It should be noted, however, that the amount of data on ASD provided by this dataset was insufficient to address these issues adequately and that more data analytic approaches are needed to build a useful prediction model. However, we expect that in the future we will be able to understand the limitations of this approach, and that more data analysis will allow more accurate predictions of ASD and comorbidities using ML methodologies.
Conclusions & Future Scope
Md Satu, Syeda Atik, Mohammad Moni, A Novel Hybrid Machine Learning Model for Diabetes Mellitus Prediction, 2019.
Heart Disease Detection Using Machine Learning Algorithms and a Real-Time Cardiovascular Health Monitoring System.
Peng, Linking logistic model tree and random subspace to predict landslide-susceptible areas considering the uncertainty of environmental characteristics, Sci.
Azimi, Multilayer Perceptron in development, and factorial design for modeling and optimization of paint decomposition from biosynthesized CdS-diatom nanocomposite, Environ.