Implementation of Adam Optimizer using Recurrent Neural Network (RNN) Architecture for Diabetes Classification

Nur Cahyo Tio Nugroho*, Erwin Yudi Hidayat

Fakultas Ilmu Komputer, Prodi Teknik Informatika, Universitas Dian Nuswantoro, Semarang, Indonesia Email: 1,*[email protected], 2[email protected]

Corresponding Author Email: [email protected]

Abstract−Non-communicable diseases (NCDs) present a major worldwide health challenge, resulting in considerable treatment expenses and heightened mortality rates. Conditions like diabetes mellitus, cardiovascular diseases, cancer, and chronic respiratory diseases are primary causes of global mortality, making up 71% of total global deaths in 2016, as reported by the World Health Organization (WHO). Diabetes Mellitus (DM), marked by prolonged elevated blood glucose levels, stands out as a significant metabolic disorder. This research delves into the implementation of Recurrent Neural Networks (RNNs) utilizing the Adaptive Moment Estimation (Adam) optimizer for classifying DM. RNNs, a subset of artificial neural networks tailored for sequential data processing, make predictions by incorporating recurrent connections. Situated within the dynamic landscape of Artificial Intelligence and Machine Learning, the research exhibits promising outcomes via k-fold cross-validation, confusion matrix analysis, loss graph examination, and a classification report. The RNN-Adam model shows commendable overall performance, achieving an average accuracy of 80.20% through k-fold cross-validation and 81.60% accuracy as revealed by the confusion matrix. This research offers valuable insights into the effectiveness of the RNN-Adam model for diabetes classification.

Keywords: Recurrent Neural Network; Optimizer; Adam; Diabetes; Classification System

1. INTRODUCTION

Non-communicable diseases (NCDs) represent a global health issue with significant implications, including high treatment costs and elevated mortality rates. NCDs such as diabetes mellitus, cardiovascular diseases, cancer, and chronic respiratory diseases are leading causes of death worldwide. According to the World Health Organization (WHO), NCDs accounted for approximately 71% of total global deaths in 2016, with cardiovascular diseases contributing 31%, cancer 16%, chronic respiratory diseases 7%, and diabetes mellitus 3%. In the Southeast Asia region, the mortality rate due to NCDs is also notably high, reaching around 23%. Therefore, this paper focuses on one well-known NCD: Diabetes Mellitus (DM) [1]. Diabetes Mellitus, commonly known as diabetes, is a group of metabolic disorders characterized by prolonged elevated blood glucose levels. Symptoms often involve frequent urination, increased thirst, and heightened appetite. Left untreated, diabetes can lead to various complications; acute complications may involve diabetic ketoacidosis, hyperglycemic hyperosmolar state, or even death [2], [3].

The traditional approach to integrating new treatments often involves elaborate and expensive strategies, coupled with the requirement to present evidence to various stakeholders, especially regulatory bodies and healthcare professionals [4]. Contemporary problem-solving methods provide a more streamlined way to categorize diabetes when compared to conventional approaches. This method considers patient data, symptoms, and laboratory test results, enabling a more personalized and efficient strategy. This involves a combination of medications and lifestyle adjustments to improve patient care [5].

The swift progress of Artificial Intelligence (AI) and data science signifies an ever-evolving journey characterized by notable accomplishments and transformative changes in technology paradigms. The integration of Machine Learning (ML) into the domain of AI has introduced an innovative approach, inspired by the intricate mechanisms of the human brain [6]. In the context of ML, classification systems designed to predict user preferences play a pivotal role [7]. Classification encompasses two distinct types: binary classification and multi-label classification. Binary classification deals with output categories that can be designated as either "yes" or "no," while multi-label classification involves outputs that extend beyond straightforward "yes" or "no" categorization [8]. In creating this classification system, we leverage recorded data on Diabetes Mellitus (DM) cases to pinpoint risk factors linked to early symptoms of DM, and utilize statistical, mathematical, and computational techniques from data science to scrutinize patterns in the data associated with the onset of DM symptoms.

Deep Learning (DL), falling under the umbrella of machine learning, mimics human learning by relying on examples to generate precise outcomes. It employs neural networks for precise classification and is commonly known as deep neural networks [9]. The surge in DL is intricately linked to the capabilities of Neural Networks (NNs), where multiple layers sequentially acquire abstract features to enhance predictions. Emulating the human brain's structure, NNs comprise interconnected nodes organized in layers, with input and output layers delineating the structure. The pivotal hidden layers, situated between input and output, contribute significantly to internal computations. Neurons within these layers compute weighted sums and apply nonlinear functions. Information progresses unidirectionally through the network, culminating in predictions at the output layer. The architecture of the network, encompassing the number of hidden layers and neurons, is adjusted based on the intricacy of the data [10], [11]. This research implements one of the Deep Learning algorithms, namely the Recurrent Neural Network (RNN).

Recurrent Neural Networks (RNNs) are a class of artificial neural networks that process and predict sequential data by incorporating recurrent connections. These connections give RNNs access to both the current input and previous hidden-state information. RNNs are biologically inspired artificial neural networks with feedback connections, allowing them to maintain memory [12]. Optimization is a pivotal component in machine learning, holding a fundamental role in both the creation and implementation of models. The success of machine learning models, especially in the era of big data, is heavily influenced by the effectiveness and efficiency of numerical optimization algorithms. The primary objective of an optimizer is to enhance a model by minimizing the loss function [13]. This study aims to apply RNNs to the classification of diabetes, utilizing the Adaptive Moment Estimation (Adam) optimizer to enhance performance. The selection of Adam is based on its status as an advanced Stochastic Gradient Descent (SGD) method in optimization.

Adam introduces an adaptive learning rate for each parameter, combining adaptive learning rate and momentum methods. RMSprop, by comparison, calculates an individually adapted learning rate for each parameter by reducing the learning rate based on the average of squared gradients from previous steps [14]. Several previous studies also inspired the researchers to write this manuscript. The paper by [15] proposes a machine learning-based approach for the classification, early-stage identification, and prediction of diabetes, achieving an accuracy of 86.08% for diabetes classification using the Multilayer Perceptron (MLP) classifier. For predictive analysis, the Long Short-Term Memory (LSTM) model is employed, achieving an accuracy of 87.26% for diabetes prediction.

In another study [16], the effectiveness of decision tree, K-nearest neighbor, random forest, and Naive Bayes classifiers was assessed in predicting diabetes among female patients in Bangladesh. The findings indicated strong performance by both the random forest and Naive Bayes classifiers. In research conducted by [17], machine learning models for diabetes diagnosis were established using two common boosting algorithms, Adaboost.M1 and LogitBoost, based on clinical test data from 35,669 individuals. The classification models generated by both algorithms exhibit excellent classification abilities; the LogitBoost model slightly outperforms the Adaboost.M1 model. The overall accuracy of the LogitBoost model reaches an impressive 95.30% with 10-fold cross-validation. Additionally, the binary classification model demonstrates high true positive and true negative rates (0.921 and 0.969, respectively), low false positive and false negative rates (0.031 and 0.079, respectively), and an outstanding area under the receiver operating characteristic curve of 0.99.

Research into optimizers has been thoroughly explored in numerous prior studies. In research conducted by [18], the accuracy of the AdaDelta and Adam optimizers was compared in the implementation of the CNN algorithm for fingerprint classification using a Kaggle dataset. The evaluation outcomes reveal that the Adam optimizer surpasses AdaDelta, achieving an accuracy of 91.73% against AdaDelta's 85.61%; Adam demonstrates greater effectiveness than AdaDelta in fingerprint classification applications. Research conducted by [19] compared optimizers including SGD, RMSprop, Adam, AdaDelta, Adamax, and Adaptive Gradient for cloth pattern prediction using the RetinaNet algorithm. The model utilizing the Adamax optimizer with a learning rate of 1e-4 achieved the highest accuracy, reaching a mean Average Precision (mAP) of 91.28% during training. In the testing phase, this model demonstrated precision of 93.01%, recall of 92.91%, F1-score of 92.79%, and an overall accuracy of 92.91%. By comparison, the model employing the Adam optimizer with a learning rate of 1e-5 achieved a training mAP of 85.48%, while the RMSprop optimizer with a learning rate of 1e-5 achieved a training mAP of 88.82%.

2. RESEARCH METHODOLOGY

2.1 Research Stages

The study commences by acquiring the dataset from Kaggle, after which the dataset enters the data preprocessing stage. After data preprocessing, the next stage involves the development of a deep learning model using a Recurrent Neural Network (RNN) architecture with the Adam optimizer; the training and validation data undergo training with cross-validation. This is followed by evaluating the test data using various metrics, including accuracy, confusion matrix, classification report, and loss graph. These stages are represented in Figure 1.

Figure 1. Research Stages


2.2 Data Acquisition

This research utilizes secondary data from Kaggle, a website providing datasets. The diabetes prediction dataset consists of medical and demographic data of patients, along with their diabetes status, encompassing 9 columns: gender, age, hypertension, heart_disease, smoking_history, bmi, HbA1c_level, blood_glucose_level, and diabetes. The dataset comprises a total of 100,000 rows.

Table 1. Dataset Before Preprocessing

gender | age  | hypertension | heart_disease | smoking_history | bmi   | HbA1c_level | blood_glucose_level | diabetes
Female | 42.0 | 0            | 0             | No Info         | 27.32 | 5.7         | 80                  | 0
Male   | 37.0 | 0            | 0             | ever            | 25.72 | 3.5         | 159                 | 0
Male   | 67.0 | 0            | 1             | not current     | 27.32 | 6.5         | 200                 | 1
Male   | 50.0 | 1            | 0             | current         | 27.32 | 5.7         | 260                 | 1
Female | 26.0 | 0            | 0             | never           | 21.22 | 6.6         | 200                 | 0

Table 1 shows an example from the dataset before preprocessing.

a. gender refers to the biological sex of the individual,
b. age refers to the age of each individual,
c. hypertension refers to blood pressure data,
d. heart_disease refers to data on whether the individual has heart disease or not,
e. smoking_history refers to data on whether the individual is an active smoker or not,
f. bmi is a measure of body fat based on weight and height,
g. HbA1c_level is a measure of the individual's glycated hemoglobin,
h. blood_glucose_level refers to the amount of glucose in the bloodstream,
i. diabetes is the target label.
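As an illustration, a minimal Python sketch of this acquisition step might look as follows; the file name diabetes_prediction_dataset.csv is an assumption based on how the Kaggle dataset is distributed, as the paper does not show its loading code.

```python
import pandas as pd

# Load the Kaggle diabetes prediction dataset; the file name is an assumption.
df = pd.read_csv("diabetes_prediction_dataset.csv")

print(df.shape)              # expected: (100000, 9)
print(df.columns.tolist())   # the 9 columns listed above
print(df.head())             # first rows, as in Table 1
```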

2.3 Data Preprocessing

Data pre-processing holds utmost significance in preparing data for effective utilization in data mining, machine learning, and other data science initiatives. It constitutes a foundational phase in machine learning and AI workflows, exerting a substantial impact on result accuracy. Before data is incorporated into machine learning models, it must undergo a series of pre-processing procedures [20].

Figure 2. Data Preprocessing Stages

Figure 2 illustrates the data preprocessing steps undertaken in this research, which include dropping unnecessary data, missing-value identification, duplicate-data identification, unique-column identification, label encoding, and imbalanced-data handling. These steps aim to ensure that the algorithmic calculations produce appropriate results.

2.3.1 Data Cleaning

The initial step in data preprocessing involves removing entries associated with the "Other" gender, effectively clearing those entries from the gender attribute. Subsequently, a comprehensive check for empty data is conducted. Notably, the Diabetes dataset has no rows containing missing data, confirming the dataset's completeness. Following this, a check for data duplication is carried out, revealing a total of 3,854 rows marked as duplicates. Consequently, it is essential to eliminate these duplicated entries. The study leans towards deletion as a viable option: despite the substantial number of duplicates, removing approximately 5% of the data is not expected to have a significant impact.
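A minimal Python sketch of these cleaning steps, assuming the dataset is loaded into a pandas DataFrame df as above (the paper does not show its code):

```python
# Remove the "Other" gender entries, check for missing values,
# and drop duplicate rows, mirroring the cleaning steps described above.
df = df[df["gender"] != "Other"]

print(df.isnull().sum())      # the paper reports no missing values

print(df.duplicated().sum())  # the paper reports 3,854 duplicate rows
df = df.drop_duplicates()
```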

2.3.2 Data Manipulation

The manipulation stage involves identifying the "Unique Columns" and applying Label Encoding, with the objective of converting categorical data into numerical data. Label encoding is a method that transforms labels into a numeric format, facilitating machine interpretation. This process enables algorithms to make more informed decisions regarding the utilization of these labels, representing a critical step in the pre-processing of structured datasets for supervised learning [20]. In the Diabetes dataset, data manipulation is performed on 2 categorical attributes, namely gender and smoking_history. The gender attribute is transformed as follows: Female = 0, Male = 1. Meanwhile, the smoking_history attribute is transformed into: never = 0, No Info = 1, current = 2, former = 3, ever = 4, not current = 5.
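A minimal sketch of this encoding step in Python, using the exact mappings stated above; the use of pandas map is an assumption, as the paper does not specify its tooling.

```python
# Encode the two categorical attributes with the mappings stated above.
gender_map = {"Female": 0, "Male": 1}
smoking_map = {"never": 0, "No Info": 1, "current": 2,
               "former": 3, "ever": 4, "not current": 5}

df["gender"] = df["gender"].map(gender_map)
df["smoking_history"] = df["smoking_history"].map(smoking_map)
```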

Data imbalance arises when there is a significant difference in the number of instances between distinct classes. This situation is frequent in machine learning, where one class significantly surpasses another, leading to a scenario where the model achieves high accuracy primarily for the majority class [21]. Balancing the data can significantly enhance the model's performance when dealing with an imbalanced dataset. There are multiple techniques to tackle imbalanced data; in this study, the selected method is random oversampling. This is a simple approach wherein samples from the minority class are randomly chosen and replicated without specific criteria. The ratio for the Random Over Sampler is expressed by the following equation:

$os = \frac{N_{rm}}{N_M}$ (1)

where:

os = oversampling ratio
N_rm = number of samples in the minority class after resampling
N_M = number of samples in the majority class
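A minimal sketch of this step using the imbalanced-learn library's RandomOverSampler; the library choice and the random_state value are assumptions, as the paper only names the random oversampling technique.

```python
from imblearn.over_sampling import RandomOverSampler

X = df.drop(columns=["diabetes"])
y = df["diabetes"]

# Randomly replicate minority-class samples until both classes are equal.
ros = RandomOverSampler(random_state=42)  # random_state is an assumption
X_res, y_res = ros.fit_resample(X, y)

print(y_res.value_counts())  # the paper reports 175,292 rows in total
```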

Table 2. Dataset After Preprocessing

gender | age  | hypertension | heart_disease | smoking_history | bmi   | HbA1c_level | blood_glucose_level | diabetes
0      | 42.0 | 0            | 0             | 1               | 27.32 | 5.7         | 80                  | 0
1      | 37.0 | 0            | 0             | 4               | 25.72 | 3.5         | 159                 | 0
1      | 67.0 | 0            | 1             | 5               | 27.32 | 6.5         | 200                 | 1
1      | 50.0 | 1            | 0             | 2               | 27.32 | 5.7         | 260                 | 1
0      | 26.0 | 0            | 0             | 0               | 21.22 | 6.6         | 200                 | 0

The initial dataset consisted of 100,000 rows with 9 attributes. After data preprocessing, it was reduced to 96,128 rows while retaining the same 9 attributes. Subsequently, after imbalanced-data handling, the data increased to 175,292 rows, because oversampling replicated the minority class until it equaled the majority class. After all these preprocessing steps, the post-preprocessing data is presented in Table 2.

2.4 Recurrent Neural Network

A Recurrent Neural Network (RNN) is a type of artificial neural network specifically designed to process sequential or temporally related data. An RNN has the ability to "remember" previous information in the data sequence, making it highly valuable in cases where historical information or the order of the data is crucial. RNNs establish connections between different nodes to analyze temporal dynamics [12], [22]. RNNs are therefore exceptionally well-suited to sequential data such as time series, where historical data is essential. In this research, however, an attempt is made to apply an RNN to a binary classification system, a scenario that might be considered less than optimal for such tasks.

2.5 Adam Optimizer

Adam Optimizer is a specialized algorithm for optimizing the learning rate in training deep neural networks. Unlike traditional stochastic gradient descent, Adam dynamically adjusts the learning rate for each parameter independently throughout the training process. It leverages adaptive learning rate techniques by utilizing estimates of both the first and second-order gradient moments. This dynamic adjustment of the learning rate for every weight in the neural network allows Adam to overcome limitations associated with fixed learning rates, leading to more efficient and effective training [23], [24].

The following equation calculates the exponentially decaying average of past gradients to obtain the first moment estimate:

$m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t$ (2)

Similar to the first moment estimate, the next equation calculates the exponentially decaying average of past squared gradients to obtain the second moment estimate:

$v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2$ (3)

After calculating the biased first and second moment estimates and correcting them for bias, yielding $\hat{m}_t$ and $\hat{v}_t$, the Adam optimizer updates the parameters using the following equation:

$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\,\hat{m}_t$ (4)
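To make equations (2)–(4) concrete, the following is a minimal NumPy sketch of a single Adam update. The default hyperparameter values are common Adam conventions and are assumptions, as the paper does not state its exact settings.

```python
import numpy as np

def adam_step(theta, g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam parameter update implementing equations (2)-(4).

    Hyperparameter defaults are common conventions (assumptions, not
    values stated in the paper).
    """
    m = beta1 * m + (1 - beta1) * g        # eq. (2): first moment estimate
    v = beta2 * v + (1 - beta2) * g ** 2   # eq. (3): second moment estimate
    m_hat = m / (1 - beta1 ** t)           # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)  # eq. (4)
    return theta, m, v
```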

2.6 Evaluation

K-Fold Cross-Validation is employed to evaluate the effectiveness of the Adam optimizer. Cross-validation plays a pivotal role in assessing a model's performance on unseen data and serves as a precaution against overfitting [25]. The dataset is divided into training, validation, and testing sets, ensuring that a diverse range of data is covered by testing. This research employs a loss graph to evaluate the training loss and validation loss derived from the training results. A classification report is a comprehensive summary of a classification model's performance; it provides a detailed evaluation on a per-class basis. The confusion matrix, a widely utilized visualization tool, is also used in this study and provides valuable insights into model performance. Precision, recall, accuracy, and F1-score are derived from the confusion matrix. Accuracy measures the level of agreement between actual and predicted values. Precision gauges the correctness of the positive predictions, while recall indicates the system's success rate in retrieving the relevant instances. F1-score is the harmonic mean of precision and recall, providing a balance between the two.

$\text{Recall} = \frac{TP}{TP + FN}$ (5)

$\text{Precision} = \frac{TP}{TP + FP}$ (6)

$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$ (7)

$\text{F1-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$ (8)

where:

a. TP (True Positive): the number of instances where both the predicted and the actual class are positive.
b. TN (True Negative): the number of instances where both the predicted and the actual class are negative.
c. FP (False Positive): the number of instances where the predicted class is positive but the actual class is negative.
d. FN (False Negative): the number of instances where the actual class is positive but the predicted class is negative.
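A minimal sketch of how these four metrics can be computed from a confusion matrix with scikit-learn; the variables y_test and y_pred are hypothetical placeholders for the true and predicted labels.

```python
from sklearn.metrics import confusion_matrix

# y_test and y_pred are assumed to hold the true and the predicted labels.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

recall    = tp / (tp + fn)                           # eq. (5)
precision = tp / (tp + fp)                           # eq. (6)
accuracy  = (tp + tn) / (tp + tn + fp + fn)          # eq. (7)
f1 = 2 * precision * recall / (precision + recall)   # eq. (8)
```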

3. RESULTS AND DISCUSSION

3.1 RNN Model Architecture

Figure 3. RNN Architecture Layers

As illustrated in Figure 3, the architecture consists of:

a. Input Layer: The input shape is specified as (n_timesteps = 1, n_features = 8), meaning the model expects input sequences with n_timesteps time steps and n_features features. The input layer is a SimpleRNN layer, a type of recurrent layer in Keras, with 10 units (neurons) and the Rectified Linear Unit (ReLU) activation function.

b. Hidden Layers: One hidden dense layer with 8 units and ReLU activation, followed by another dense layer with 3 units and ReLU activation.

c. Output Layer: A dense layer with 1 unit and a sigmoid activation function; sigmoid activation is common in binary classification tasks.

In summary, this is a simple Recurrent Neural Network (RNN) architecture for binary classification. It takes input sequences, processes them through a simple RNN layer to capture temporal patterns, passes through a couple of dense hidden layers, and produces a binary classification output.

The method is applied by designing the RNN model architecture as depicted above. These steps involve determining the input format, adding hidden layers, and configuring the output. The application of the method includes using appropriate layers, such as SimpleRNN and dense layers with the correct activation functions.
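A minimal Keras sketch of the architecture described above; the layer sizes and activations follow Figure 3, while the Sequential construction itself is an assumption, as the paper does not show its code.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense

n_timesteps, n_features = 1, 8

model = Sequential([
    # SimpleRNN input layer: 10 units, ReLU, input shape (1, 8)
    SimpleRNN(10, activation="relu", input_shape=(n_timesteps, n_features)),
    Dense(8, activation="relu"),     # first hidden dense layer
    Dense(3, activation="relu"),     # second hidden dense layer
    Dense(1, activation="sigmoid"),  # binary classification output
])

model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```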

3.2 Evaluation Result

The implementation of training the Neural Network (NN) model is carried out using K-Fold Cross-Validation with n_splits = 5. In the context of K-Fold Cross-Validation, n_splits refers to the number of folds or sections that will be created from the data. Here, n_splits = 5 means the data is divided into 5 folds, with each fold trained for 10 epochs using binary_crossentropy as the loss function.
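A minimal sketch of this training scheme with scikit-learn's KFold; the shuffle and random_state settings, and the omission of the separate validation split mentioned earlier, are simplifying assumptions.

```python
import numpy as np
from sklearn.model_selection import KFold
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense

def build_model():
    # Rebuild the compiled RNN from Section 3.1 for each fold.
    model = Sequential([
        SimpleRNN(10, activation="relu", input_shape=(1, 8)),
        Dense(8, activation="relu"),
        Dense(3, activation="relu"),
        Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# X_res and y_res come from the oversampling step; the RNN expects
# 3-D input of shape (samples, n_timesteps=1, n_features=8).
X_arr = np.asarray(X_res, dtype="float32").reshape(-1, 1, 8)
y_arr = np.asarray(y_res, dtype="float32")

kfold = KFold(n_splits=5, shuffle=True, random_state=42)  # settings assumed
scores = []
for train_idx, val_idx in kfold.split(X_arr):
    model = build_model()
    model.fit(X_arr[train_idx], y_arr[train_idx],
              epochs=10, batch_size=32, verbose=1)
    _, acc = model.evaluate(X_arr[val_idx], y_arr[val_idx], verbose=0)
    scores.append(acc)

print("Average accuracy across folds:", np.mean(scores))  # paper reports 80.20%
```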

Here is a breakdown of each epoch's output from training the Recurrent Neural Network (RNN) with K-Fold Cross-Validation:

a. Epoch 1/10: The model is trained for 10 epochs. In each epoch it passes through the entire training set (3,506 batches of 32 samples), updating the network weights to minimize the binary cross-entropy loss. The accuracy on the training set is 64.66%, and on the validation set 72.17%.

b. Epoch 2/10: The model continues to learn, with visible improvements in accuracy. The training accuracy increases to 77.83%, and the validation accuracy is 77.35%.

c. Epoch 3/10: Further improvement is observed, with training accuracy at 86.99% and validation accuracy at 88.99%.

d. Epoch 4/10 to Epoch 10/10: The model continues to learn and refine its parameters; the training and validation accuracies fluctuate in subsequent epochs.

After completing the training for each fold, the model is evaluated on the validation set of that fold. The average accuracy across all folds is 80.20%.

The K-Fold Cross-Validation technique divides the data into several folds for a more robust training process. Each fold contributes to model training, and the final result is taken as the average across all folds. This is done to ensure that the model can effectively grasp patterns from the entire dataset. The accuracy scores for each fold indicate how well the model generalizes to unseen data. It is important to note that the model's performance may vary across folds, and the average accuracy gives an overall assessment of its effectiveness.

Figure 4. Confusion Matrix

Examining Figure 4, we can conclude:

a. True Positives (TP): The model correctly predicted diabetes (1) 11,057 times. These are instances where the individual has diabetes, and the model correctly identified them.

b. False Positives (FP): The model predicted diabetes (1) for 700 instances where the actual condition was not diabetes (0). These are instances of "false alarms" or Type I errors.


c. True Negatives (TN): The model correctly predicted not having diabetes (0) for 13,266 instances. These are instances where the individual does not have diabetes, and the model correctly identified them.

d. False Negatives (FN): The model predicted not having diabetes (0) for 3,023 instances where the actual condition was diabetes (1). These are instances of "missed opportunities" or Type II errors.

Figure 5. Calculated Metrics

Figure 5 shows the calculated metrics, from which the following can be concluded:

a. Recall, or True Positive Rate, assesses the model's capability to accurately recognize all pertinent instances. Specifically, it indicates the proportion of true positive instances (diabetes) correctly identified by the model. A recall value of 0.785 signifies that the model successfully identified approximately 78.50% of the actual diabetes cases.

b. Precision is a measure of the accuracy of the positive predictions made by the model. In this context, it tells us what proportion of predicted positive instances (diabetes) were actually correct. A precision of 0.941 means that out of all the instances predicted as diabetes, 94.10% were correct.

c. Accuracy provides a comprehensive gauge of the model's performance, taking into account both true positives and true negatives. An accuracy value of 0.816 indicates that the model accurately predicted instances of both diabetes and non-diabetes approximately 81.60% of the time.

d. F1-Score represents the harmonic mean of precision and recall, offering equilibrium between these two metrics. A higher F1-Score implies an improved balance between false positives and false negatives. With an F1-Score of 85.70%, the model demonstrates strong overall performance, effectively considering both precision and recall aspects.

In summary, while high precision indicates that the model is making accurate positive predictions, high recall indicates that it is effectively capturing all positive instances. The f1-score considers both aspects and is especially useful when there is an imbalance between the classes. The accuracy gives an overall performance measure across both positive and negative predictions.

Figure 6. Loss Graph

Looking at Figure 6, the training loss shows a significant decrease in each epoch, indicating that the model learns stably. However, reviewing the model's performance at the conclusion of each epoch reveals a slight increase in the loss on the validation set. This rise in validation loss might indicate that the model, despite initially improving, faces challenges in generalizing to unseen data. It is essential to interpret this in the context of the entire training process.

Figure 7. Classification Report


The classification report in Figure 7 provides a comprehensive evaluation of the model's performance for each class, as well as overall metrics that account for class imbalance. It suggests that the model is performing reasonably well, with high precision and recall values for both classes. The macro and weighted averages further affirm the model's effectiveness across the entire dataset.
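For reference, such a report can be produced with scikit-learn's classification_report; y_test and y_pred are hypothetical placeholders as before.

```python
from sklearn.metrics import classification_report

# Per-class precision, recall, and F1, plus macro and weighted averages,
# for the held-out test predictions.
print(classification_report(y_test, y_pred,
                            target_names=["no diabetes (0)", "diabetes (1)"]))
```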

4. CONCLUSION

The comprehensive evaluation from the K-Fold Cross-Validation, confusion matrix, loss graph, and classification report provides valuable insights into the model's performance. The K-Fold Cross-Validation showcases a reasonable average accuracy of 80.20%, indicating a satisfactory level of generalization across diverse subsets. The confusion matrix reveals a mix of accurate predictions and challenges, particularly in correctly identifying instances of diabetes. While the loss graph suggests effective learning, a slight increase in validation loss raises concerns about potential overfitting, suggesting a need for further model refinement, regularization, or architectural adjustments. A closer look at the classification report reveals that the model is quite good at correctly spotting cases where there is no diabetes (Class 0); however, there is room for improvement in identifying instances of diabetes (Class 1). It is important to strike a balance between making accurate positive predictions (precision) and catching all the actual positive instances (recall) for both classes. This balancing act is crucial, and it suggests that some adjustments are needed to fine-tune the model for better performance. In summary, while the model performs well overall, there are areas where it could do better, especially by addressing overfitting and improving its predictions for diabetes cases. Future improvements may involve strategic modifications to the hidden layers, the introduction of supplementary layers such as dropout to mitigate overfitting, or refinements to how backpropagation adjusts parameters in response to the loss function. Moreover, exploring variations in the learning rate within the optimizer, or even experimenting with alternative optimizers, could potentially yield different outcomes on this dataset. These strategies have the potential to enhance the efficacy of RNNs in the classification of diabetes symptoms.

REFERENCES

[1] D. N. Purqoti, Baiq Rulli Fatmawati, Zaenal Arifin, Ilham, Zuliardi, and Harlina Putri Rusiana, "Peningkatan Pengetahuan Penyakit Tidak Menular (PTM) Pada Masyarakat Resiko Tinggi Melalui Pendidikan Kesehatan," LOSARI: Jurnal Pengabdian Kepada Masyarakat, vol. 4, no. 2, pp. 99–104, Dec. 2022, doi: 10.53860/losari.v4i2.108.

[2] Yash Sahebrao Chaudhari, Srushti Sunil Bhujbal, Vidya Ashok Walunj, Neha Satish Bhor, and Rutuja Dattatraya Vyavhare, "Diabetes Mellitus: A Review," International Journal of Advanced Research in Science, Communication and Technology, pp. 16–22, Mar. 2023, doi: 10.48175/IJARSCT-8551.

[3] L. M. Quinn, S. L. Thrower, and P. Narendran, "What is type 1 diabetes?," Medicine, vol. 50, no. 10, pp. 619–624, Oct. 2022, doi: 10.1016/j.mpmed.2022.07.002.

[4] D. Kerr, F. King, and D. C. Klonoff, "Digital Health Interventions for Diabetes: Everything to Gain and Nothing to Lose," Diabetes Spectrum, vol. 32, no. 3, pp. 226–230, Aug. 2019, doi: 10.2337/ds18-0085.

[5] R. Chandra, A. Shukla, S. Tiwari, S. Agarwal, M. Svafrullah, and K. Adiyarta, "Natural language Processing and Ontology based Decision Support System for Diabetic Patients," in 2022 9th International Conference on Electrical Engineering, Computer Science and Informatics (EECSI), IEEE, Oct. 2022, pp. 13–18, doi: 10.23919/EECSI56542.2022.9946601.

[6] S. Raschka, J. Patterson, and C. Nolet, "Machine Learning in Python: Main Developments and Technology Trends in Data Science, Machine Learning, and Artificial Intelligence," Information, vol. 11, no. 4, p. 193, Apr. 2020, doi: 10.3390/info11040193.

[7] P. Yilmaz, Ş. Akçakaya, Ş. D. Özkaya, and A. Çetin, "Machine Learning Based Music Genre Classification and Recommendation System," El-Cezeri Journal of Science and Engineering, vol. 9, no. 4, pp. 1560–1571, Dec. 2022, doi: 10.31202/ecjse.1209025.

[8] S. Yoo, "Comparison of Artificial Intelligence and Human Motivation," Technium Social Sciences Journal, vol. 25, pp. 345–351, Nov. 2021, doi: 10.47577/tssj.v25i1.4736.

[9] A. Bashar, "Survey on Evolving Deep Learning Neural Network Architectures," Journal of Artificial Intelligence and Capsule Networks, vol. 2019, no. 2, pp. 73–82, Dec. 2019, doi: 10.36548/jaicn.2019.2.003.

[10] R. Yauri and R. Espino, "Edge device for movement pattern classification using neural network algorithms," Indonesian Journal of Electrical Engineering and Computer Science, vol. 30, no. 1, p. 229, Apr. 2023, doi: 10.11591/ijeecs.v30.i1.pp229-236.

[11] E. Prisciandaro, G. Sedda, A. Cara, C. Diotti, L. Spaggiari, and L. Bertolaccini, "Artificial Neural Networks in Lung Cancer Research: A Narrative Review," J Clin Med, vol. 12, no. 3, p. 880, Jan. 2023, doi: 10.3390/jcm12030880.

[12] J. Wang, L. Zhang, Q. Guo, and Z. Yi, "Recurrent Neural Networks With Auxiliary Memory Units," IEEE Trans Neural Netw Learn Syst, vol. 29, no. 5, pp. 1652–1661, May 2018, doi: 10.1109/TNNLS.2017.2677968.

[13] C. Gambella, B. Ghaddar, and J. Naoum-Sawaya, "Optimization problems for machine learning: A survey," Eur J Oper Res, vol. 290, no. 3, pp. 807–828, May 2021, doi: 10.1016/j.ejor.2020.08.045.

[14] M. Reyad, A. M. Sarhan, and M. Arafa, "A modified Adam algorithm for deep neural network optimization," Neural Comput Appl, vol. 35, no. 23, pp. 17095–17112, Aug. 2023, doi: 10.1007/s00521-023-08568-z.

[15] U. M. Butt, S. Letchmunan, M. Ali, F. H. Hassan, A. Baqir, and H. H. R. Sherazi, "Machine Learning Based Diabetes Classification and Prediction for Healthcare Applications," J Healthc Eng, vol. 2021, pp. 1–17, Sep. 2021, doi: 10.1155/2021/9930985.

[16] B. Pranto, Sk. M. Mehnaz, E. B. Mahid, I. M. Sadman, A. Rahman, and S. Momen, "Evaluating Machine Learning Methods for Predicting Diabetes among Female Patients in Bangladesh," Information, vol. 11, no. 8, p. 374, Jul. 2020, doi: 10.3390/info11080374.

[17] P. Chen and C. Pan, "Diabetes classification model based on boosting algorithms," BMC Bioinformatics, vol. 19, no. 1, Mar. 2018, doi: 10.1186/s12859-018-2090-9.

[18] F. F. Alkhalid, "The effect of optimizers in fingerprint classification model utilizing deep learning," Indonesian Journal of Electrical Engineering and Computer Science, vol. 20, no. 2, pp. 1098–1102, Nov. 2020, doi: 10.11591/ijeecs.v20.i2.pp1098-1102.

[19] I. Amelia Dewi and M. A. Negara Ekha Salawangi, "High performance of optimizers in deep learning for cloth patterns detection," IAES International Journal of Artificial Intelligence (IJ-AI), vol. 12, no. 3, p. 1407, Sep. 2023, doi: 10.11591/ijai.v12.i3.pp1407-1418.

[20] D. Varma, A. Nehansh, and P. Swathy, "Data Preprocessing Toolkit: An Approach to Automate Data Preprocessing," International Journal of Scientific Research in Engineering and Management, vol. 07, no. 03, Mar. 2023, doi: 10.55041/ijsrem18270.

[21] R. R. Achmad and M. Haris, "Hyperparameter Tuning Deep Learning for Imbalanced Data," TEPIAN, vol. 4, no. 2, Jun. 2023, doi: 10.51967/tepian.v4i2.2216.

[22] J. Feng, L. T. Yang, B. Ren, D. Zou, M. Dong, and S. Zhang, "Tensor Recurrent Neural Network With Differential Privacy," IEEE Transactions on Computers, pp. 1–11, 2023, doi: 10.1109/TC.2023.3236868.

[23] W. Wardianto, F. Farikhin, and D. M. Kusumo Nugraheni, "Analisis Sentimen Berbasis Aspek Ulasan Pelanggan Restoran Menggunakan LSTM Dengan Adam Optimizer," JOINTECS (Journal of Information Technology and Computer Science), vol. 8, no. 2, p. 67, Jul. 2023, doi: 10.31328/jointecs.v8i2.4737.

[24] Q. Tong, G. Liang, and J. Bi, "Calibrating the adaptive learning rate to improve convergence of ADAM," Neurocomputing, vol. 481, pp. 333–356, Apr. 2022, doi: 10.1016/j.neucom.2022.01.014.

[25] E. Kee, J. J. Chong, Z. J. Choong, and M. Lau, "A Comparative Analysis of Cross-Validation Techniques for a Smart and Lean Pick-and-Place Solution with Deep Learning," Electronics (Basel), vol. 12, no. 11, p. 2371, May 2023, doi: 10.3390/electronics12112371.
