Developing A Hybrid Linear Model with A Multilayer Feed-Forward Neural Network for HbA1c Modeling Among Diabetes Patients


Wan Muhamad Amir W Ahmad¹*, Mohamad Nasarudin Adnan¹*, Mohamad Shafiq Mohd Ibrahim², Norsamsu Arni Samsudin¹*, Nor Farid Mohd Noor³, Nor Azlida Aleng⁴, Farah Muna Mohamad Ghazali¹*, Siti Nazihahasma Hassan⁵

1 School of Dental Sciences, Universiti Sains Malaysia (USM), Kota Bharu, Kelantan, Malaysia.

2 Kulliyyah of Dentistry, International Islamic University Malaysia, IIUM Kuantan Campus, Kuantan, Pahang, Malaysia

3 Faculty of Medicine, Universiti Sultan Zainal Abidin (UniSZA), Medical Campus, Kuala Terengganu, Terengganu, Malaysia

4 Faculty of Ocean Engineering Technology and Informatics, Universiti Malaysia Terengganu (UMT), Terengganu, Malaysia

5 Department of Neurosciences, School of Medical Sciences, Universiti Sains Malaysia (USM), Health Campus, Kota Bharu, Kelantan, Malaysia

*Corresponding Author: [email protected] ; [email protected] ; [email protected] ; [email protected]

Accepted: 15 April 2023 | Published: 30 April 2023

DOI:https://doi.org/10.55057/ajfas.2023.4.1.5

_____________________________________________________________________________________________

Abstract: Hemoglobin A1c (HbA1c) is the gold-standard measure for diagnosing and managing diabetes. Given the importance of data-driven decisions, this paper aimed to develop a method for explaining and predicting HbA1c levels. We developed a comprehensive method that combines multiple linear regression with multilayer feed-forward neural networks (MLFFNN) and bootstrapping, implemented in R syntax. The success of the proposed method was judged by the accuracy of its predictions, and the quality of the fitted model was measured by the mean square error (MSE), with a smaller MSE indicating a better model. This study used secondary diabetes data to illustrate the method, with a total of 1000 observations obtained after the bootstrapping procedure. The clinical relevance and significance of each preselected variable were evaluated before further testing. The variables HbA1c, fasting blood sugar (FBS), urea, and blood sodium levels were assessed using the MLFFNN methodology. It was found that FBS, urea, and blood sodium levels can all be used to predict HbA1c. FBS (β = 0.45931; SE = 0.01018; p < 0.01), urea (β = -0.03777; SE = 0.00266; p < 0.01), and blood sodium levels (β = 0.06685; SE = 0.01112; p < 0.01) all had a significant impact on HbA1c. The strategy yields accurate predictions, and the methodology assesses the validity of the final model. Superior model performance leads to more efficient decision-making in management.

Keywords: HbA1c, Linear Model, Multilayer Feed-Forward Neural Network

______________________________________________________________________________

1. Introduction


The increasing prevalence of diabetes mellitus (DM) is a major global health concern and burden (Sun et al., 2022). It is estimated that more than 422 million adults worldwide had DM in 2014 (Lovic et al., 2020). DM is a metabolic disorder that causes hyperglycemia due to impaired insulin action, impaired insulin secretion, or both. Glycated hemoglobin (HbA1c) is formed when glucose binds to the N-terminal valine of the β-chain of a hemoglobin (Hb) molecule. Analysis of HbA1c in blood reflects the average blood glucose level over the past two to three months (Lau and Aw, 2020). HbA1c is a well-known parameter for monitoring type 2 DM. In 1969, Rahbar et al. discovered higher HbA1c levels in diabetic patients, while Koenig et al. (1976) were the first to propose using HbA1c as a biomarker for monitoring glucose levels in diabetic patients. According to Bae and colleagues (2011), an HbA1c cut-off point of 5.7% is a suitable value for predicting future diabetes, with a sensitivity of 62% and a specificity of 85%.

Herein, three statistical methods, namely bootstrapping, linear regression, and multilayer feed-forward neural networks (MLFFNN), were combined into a technique to explain and forecast the HbA1c level. The approaches were unified in R syntax with a few adjustments and additions. Bootstrapping treats the sample as if it were the population: it starts with a random sample drawn from the population under investigation and resamples cases from it with replacement to generate a larger sample, known as a bootstrap sample (Kim and Im, 2019). Linear regression is a supervised machine learning approach that models a target value from independent variables. It is mostly used to quantify the relationships between variables and to forecast.

The goal of the MLFFNN, in turn, is to assess the independent variables in relation to the dependent variable from the standpoint of linear regression. In an MLFFNN, the computing units (perceptrons) are linked so that data flow one way, from the input to the output (Fashoto et al., 2015; Jusoff et al., 2022).

In data analysis, a robust technique is crucial because it helps researchers make informed decisions.

It is essential to emphasize the underlying problem of computation precision and accuracy to align theory and practical programming.

2. Material and Methods

2.1 Multilayer Perceptron (MLP) Neural Network

The multilayer perceptron (MLP), the most widely used artificial neural network, was applied.

An MLP is generally organized into three main layers: the input layer, the hidden layer, and the output layer [1,2]. In this study, the number of output nodes is fixed at one since there is only one dependent variable. Equation (1) gives the MLP with N input nodes, H hidden nodes, and one output node. The predicted value $\hat{Y}$ is given as follows:

$$\hat{Y} = g\left( \sum_{j=1}^{H} w_j h_j + w_0 \right) \tag{1}$$

where $w_j$ is the output weight from hidden node j to the output node, $w_0$ is the bias for the output node, and g is an activation function. The values of the hidden nodes $h_j$, j = 1, …, H, are given by:

$$h_j = g\left( \sum_{i=1}^{N} v_{ji} x_i + v_{j0} \right) \tag{2}$$

where $v_{ji}$ is the weight from input node i to hidden node j, $v_{j0}$ is the bias for hidden node j, j = 1, …, H, $x_i$ are the independent variables, i = 1, …, N, and g is an activation function [1,2]. The general architecture of the MLP model is illustrated in Figure 1.

Figure 1: The general architecture of the MLP with one hidden layer, N input nodes, H hidden nodes, and one output node
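To make Equations (1) and (2) concrete, the following is a minimal R sketch of a single forward pass through a one-hidden-layer MLP. The weights `V`, `v0`, `w`, and `w0`, the input vector `x`, and the logistic hidden activation are illustrative assumptions, not values or choices taken from the study. The output activation defaults to the identity, which corresponds to a linear output node.

```r
# Forward pass of a one-hidden-layer MLP, following Equations (1) and (2).
# All numbers below are hypothetical and chosen only for illustration.

# Logistic activation function (an assumed choice for the hidden layer)
logistic <- function(z) 1 / (1 + exp(-z))

mlp_forward <- function(x, V, v0, w, w0,
                        g_hidden = logistic, g_out = identity) {
  # Equation (2): h_j = g(sum_i v_ji * x_i + v_j0), for j = 1, ..., H
  h <- g_hidden(as.vector(V %*% x) + v0)
  # Equation (1): Y_hat = g(sum_j w_j * h_j + w_0)
  y_hat <- g_out(sum(w * h) + w0)
  list(hidden = h, output = y_hat)
}

# N = 3 inputs and H = 2 hidden nodes
x  <- c(0.5, 0.2, 0.8)                       # inputs scaled to [0, 1]
V  <- matrix(c( 0.4, -0.3, 0.1,
               -0.2,  0.5, 0.3),
             nrow = 2, byrow = TRUE)         # H x N matrix of weights v_ji
v0 <- c(0.1, -0.1)                           # hidden biases v_j0
w  <- c(0.7, -0.4)                           # output weights w_j
w0 <- 0.05                                   # output bias w_0

res <- mlp_forward(x, V, v0, w, w0)
print(res$hidden)
print(res$output)
```

Passing `g_out = logistic` instead would bound the output to (0, 1), which may be preferable when the response has been scaled to that range.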

The variables selected by the MLP procedure are then used as the inputs for the multiple linear regression.

Multiple linear regression extends simple linear regression to include more than one explanatory variable. The proposed model is given as follows:

$$\text{HbA1c} = \beta_0 + \beta_1\,\text{FBS} + \beta_2\,\text{Urea} + \beta_3\,\text{Sod} + \varepsilon \tag{3}$$

where:

β₀, β₁, β₂, and β₃ are the regression coefficients;
FBS is the fasting blood sugar level;
Urea is the urea level;
Sod is the sodium level;
ε is a random error, ε ~ N(0, σ²).
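As a hedged illustration of how the coefficients of the multiple linear regression model are estimated, the sketch below computes the ordinary least squares solution $\hat{\beta} = (X^\top X)^{-1} X^\top y$ by hand and compares it with `lm()`. It uses only the seven example rows printed in the R syntax of Section 2.2, so the estimates are illustrative and are not expected to match the results reported for the bootstrapped data. The object names are assumptions made for this example.

```r
# Example rows as printed in the R syntax of Section 2.2 (illustration only)
dat <- data.frame(
  FBS   = c(5.3, 11.7, 7.4, 4.6, 8.6, 17.3, 3.8),
  HbA1c = c(7.2, 9.7, 7.5, 6.2, 6.9, 12.5, 5.8),
  Urea  = c(5.7, 2.9, 5.7, 3.9, 12.1, 4.6, 6.6),
  Sod   = c(142, 133, 142, 139, 139, 136, 139)
)

# Design matrix with an intercept column, and the response vector
X <- cbind(Intercept = 1, as.matrix(dat[, c("FBS", "Urea", "Sod")]))
y <- dat$HbA1c

# Normal equations: solve (X'X) beta = X'y
beta_hat <- solve(crossprod(X), crossprod(X, y))

# The same model fitted with lm()
fit <- lm(HbA1c ~ FBS + Urea + Sod, data = dat)

# Side-by-side comparison of the two sets of estimates
print(cbind(by_hand = as.vector(beta_hat), lm = unname(coef(fit))))
```

In practice `lm()` is preferable because it uses a QR decomposition, which is numerically more stable than forming X'X explicitly.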


2.2 Data and the R Syntax

We used data collected from individuals with type 2 DM who attended the outpatient clinic at Hospital Universiti Sains Malaysia (n = 35). Table 1 describes the variables selected for the study.

Table 1: Data description of the selected variables in the study

| Num. | Code - Variable | Explanation of user variables |
|------|-----------------|-------------------------------|
| 1 | Y - HbA1c | Hemoglobin A1c level readings |
| 2 | X1 - FBS | FBS level readings |
| 3 | X2 - Urea | Urea level readings |
| 4 | X3 - Sod | Blood sodium level readings |

R Syntax for the MLFF Neural Network Methodology and Multiple Linear Regression

#/Dataset for Biometry: Modelling Study/#
Input = ("
FBS HbA1c Urea Sod
5.300 7.20 5.70 142.00
11.700 9.70 2.90 133.00
7.400 7.50 5.70 142.00
4.600 6.20 3.90 139.00
8.600 6.90 12.10 139.00
17.300 12.50 4.60 136.00
3.800 5.80 6.60 139.00
")

data = read.table(textConnection(Input),header=TRUE)

#/Performing Bootstrap for 1000 : Case Resampling Procedure /#

mydata <- rbind.data.frame(data, stringsAsFactors = FALSE)
iboot <- sample(1:nrow(mydata),size=1000, replace = TRUE)
bootdata <- mydata[iboot,]

#/Install the Neuralnet Package/#

if(!require(neuralnet)){install.packages("neuralnet")}

library("neuralnet")

#/Checking for the Missing Values/#

apply(bootdata, 2, function(x) sum(is.na(x)))

#/Scaling the Data for Normalization/#

#/Method (Usually Called Feature Scaling) to Get All the Scaled Data/#

#/In the Range [0,1]/#

max_data <- apply(bootdata, 2, max)
min_data <- apply(bootdata, 2, min)

data_scaled <- scale(bootdata, center = min_data, scale = max_data - min_data)


#/Randomly Split the Data into 70:30/#

#/70 Percent of the Data at Our Disposal to Train the Network/#

#/30 Percent to Test the Network/#

index = sample(1:nrow(bootdata),round(0.70*nrow(bootdata)))
train_data <- as.data.frame(data_scaled[index,])

test_data <- as.data.frame(data_scaled[-index,])

#////////////////////////////////////////////////////////////////////////////////////////////////////#

#/Build the Network/#

#/There are 2 Hidden Layers With 2 Neurons Each/#

#Input = 3/#

#Output = 1/#

n = names(bootdata)

f = as.formula(paste("HbA1c ~", paste(n[!n %in% "HbA1c"], collapse = " + ")))
nn = neuralnet(f,data=train_data,hidden=c(2,2),linear.output=TRUE)

plot(nn)

options(warn=-1)

#/Use the 30 Percent Test Set to Predict:
# Select the 3 Predictor Columns (FBS, Urea, Sod) by Name as the Network Inputs;
# HbA1c Is the Single Output of the NN/
predicted <- compute(nn,test_data[,c("FBS","Urea","Sod")])

#/Use the Mean Squared Error NN (MSE-forecasts the network) as a Measure of How Far

#Away Our Predictions Are From The Real Data/

MSE.net <- sum((test_data$HbA1c-predicted$net.result)^2)/nrow(test_data)
MSE.net

#///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

Model <- lm(HbA1c~FBS+Urea+Sod,data=bootdata) # build the model
summary(Model)
data$PredictedHbA1c <- predict(Model,data)
distPred <- predict(Model, data)
preds <- predict(Model, data)
modelEval <- cbind(data$HbA1c, preds)
colnames(modelEval) <- c('Actual','Predicted')
modelEval <- as.data.frame(modelEval)
print (modelEval)
# Test rows are taken from bootdata because index was drawn from 1:nrow(bootdata)
test <- bootdata[-index,]
predict_lm <- predict(Model,test)
# Note: MSE.lm is in original HbA1c units, whereas MSE.net is on the [0,1]-scaled data
MSE.lm <- sum((predict_lm - test$HbA1c)^2)/nrow(test)
MSE.lm

#/Printing the Value of MSE for Linear Model and Neural Network/

print(paste(MSE.lm,MSE.net))


# Finished

#///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

In this model, the three independent variables are denoted X1 (FBS), X2 (Urea), and X3 (Sod). All variables were evaluated with the MLFFNN, and the significant variables were used for regression modelling. The data set was divided into 70% training and 30% testing sets. The two-hidden-layer MLP was determined to be the most appropriate model for the investigated situation.
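Because the network is trained on data scaled to [0, 1] while the linear model is fitted in the original units, the two MSE values printed by the R syntax are on different scales. The following sketch, assuming the same min-max scaling, shows how scaled predictions can be mapped back to HbA1c units before the MSE is computed. The HbA1c readings are the example rows from the syntax; the scaled predictions and the helper names are hypothetical.

```r
# HbA1c readings from the example rows, and hypothetical scaled predictions
hba1c       <- c(7.2, 9.7, 7.5, 6.2, 6.9, 12.5, 5.8)
pred_scaled <- c(0.20, 0.60, 0.25, 0.05, 0.15, 0.95, 0.02)

# Min-max scaling to [0, 1] and its inverse
min_y <- min(hba1c)
max_y <- max(hba1c)
scale_minmax   <- function(y) (y - min_y) / (max_y - min_y)
unscale_minmax <- function(s) s * (max_y - min_y) + min_y

hba1c_scaled <- scale_minmax(hba1c)

# MSE on the scaled data (the scale on which the network is evaluated)
mse_scaled <- mean((hba1c_scaled - pred_scaled)^2)

# MSE in original HbA1c units after back-transforming the predictions
pred_original <- unscale_minmax(pred_scaled)
mse_original  <- mean((hba1c - pred_original)^2)

# The two values differ by the factor (max_y - min_y)^2
print(c(scaled = mse_scaled, original = mse_original))
```

Comparing the network and the linear model on `mse_original` rather than `mse_scaled` puts both errors in the same units.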

3. Results and Discussion

Predictive modeling for forecasting the development of diabetes is essential for public health professionals. Such an approach could help in achieving the best possible level of predictive accuracy and facilitate the identification of individuals at risk of developing diabetes.

Table 2: Coefficients Result of Multiple Regression Model

| Variable | B (unstandardized) | Std. Error | t | Sig. |
|----------|--------------------|------------|---|------|
| (Constant) | -4.73834 | 1.56937 | -3.019 | 0.0026 ** |
| FBS | 0.45931 | 0.01018 | 45.132 | < 2e-16 *** |
| Urea | -0.03777 | 0.00266 | -14.198 | < 2e-16 *** |
| Sodium | 0.06685 | 0.01112 | 6.014 | 2.55e-09 *** |

Dependent variable: HbA1c
R² = 0.6961; F(3, 996) = 760.3; p < 0.05
Multiple linear regression (MLR). Model assumptions are met.
Sig. codes: 0 ‘***’ 0.001 ‘**’

Figure 2: The architecture of the MLFFNN with three input nodes, two hidden layers, and one output node


The MLFFNN architecture is shown in Figure 2. It has three input nodes, two hidden layers with two neurons each, and one output node. In this section, the variable selection was determined using the MLFFNN methodology that had been developed. Three factors, namely FBS (β = 0.45931; SE = 0.01018; p < 0.01), urea (β = -0.03777; SE = 0.00266; p < 0.01), and blood sodium (β = 0.06685; SE = 0.01112; p < 0.01) levels, all had a significant influence on HbA1c.

Herein, the performance of the MLFF neural network was studied alongside that of the MLR. In this model, the MLFFNN can be considered a validation of the selected factors. The combination of selected variables that produces the smallest MSE is considered the best model for regression modelling. The proposed MLFFNN consists of three inputs, two hidden layers, and a single output. These three variables produce a small MSE, indicating that the three independent variables studied contribute significantly to HbA1c. The output node is set to one, the HbA1c level (the dependent variable). The train-to-test split is 70:30: 70% of the data was used to train the network, while the remaining 30% was used to test it (Mohamed, et al., 2012). The testing (out-of-sample) MSE was used to measure how well the MLFFNN worked. The MSE shows how far the predictions are from the actual data.
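For reference, the out-of-sample MSE computed in the R syntax can be written as follows, where $n_{\text{test}}$ is the number of test observations, $y_i$ is the observed HbA1c value, and $\hat{y}_i$ is the corresponding prediction. These symbols are notation introduced here for illustration.

```latex
\mathrm{MSE} = \frac{1}{n_{\text{test}}} \sum_{i=1}^{n_{\text{test}}} \left( y_i - \hat{y}_i \right)^2
```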

Table 2 summarizes the MLR results. The proposed linear model is given by

$$\text{HbA1c} = -4.73834 + 0.45931\,(\text{FBS}) - 0.03777\,(\text{Urea}) + 0.06685\,(\text{Sodium}) \tag{4}$$

The multiple linear model of the HbA1c level is given in Equation (4). The findings showed a strong relationship between HbA1c and the FBS, urea, and sodium levels.
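As a worked example of applying the fitted linear model, the sketch below encodes the coefficients reported in Table 2 in a small R function and evaluates it for one patient profile. The function name `predict_hba1c` is hypothetical, and the input values (FBS = 5.3, urea = 5.7, sodium = 142) are the first example row of the R syntax, used here for illustration only.

```r
# Coefficients as reported in Table 2
coefs <- c(intercept = -4.73834, FBS = 0.45931, Urea = -0.03777, Sod = 0.06685)

# Hypothetical helper that evaluates the fitted linear equation
predict_hba1c <- function(fbs, urea, sod) {
  unname(coefs["intercept"] + coefs["FBS"] * fbs +
           coefs["Urea"] * urea + coefs["Sod"] * sod)
}

pred <- predict_hba1c(fbs = 5.3, urea = 5.7, sod = 142)
print(round(pred, 2))  # approximately 6.97
```

The observed HbA1c for that row is 7.20, so the residual for this case is about 0.23.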

HbA1c is an important indicator of long-term glycemic control, which not only provides a reliable measure of chronic hyperglycemia but also correlates with the risk of long-term disease complications (Sherwani et al., 2016). In one study, HbA1c levels and sodium intake were both linked to the development of cardiovascular disease in type 2 diabetes patients (Horikawa et al., 2014). It was also found that patients with type 2 DM had significantly higher urea levels, with HbA1c and serum urea levels positively correlated (Abdelsalam et al., 2011).

The primary objective of the work was to build, test, and evaluate a decision tree, bootstrap, and ordinal regression combination for generating and using medical statistic strategies. The bootstrap approach generates a massive dataset. In this probe, the decision tree approach provides a precise evaluation of the variables that must be carefully selected for the final model. Alternatively, researchers might use decision trees to determine which data points are most important when developing an ordinal regression model and to test a medical hypothesis. The usefulness of this approach to health care planning is demonstrated by the decision tree inference of which variables to use in making predictions. This strategy shows an accuracy rate of 98.5%, with a positive predictive value of 98.7 % for Class 1, 96.41% for Class 2, and 100.0% for Class 3.

The method can be used as an alternative to ordinal regression modelling in situations where the selection of acceptable variables is based on computational studies that predict the importance of the independent variables to be selected for the final model. This technique simplifies the most difficult aspect of any research project, which is choosing the right input parameters. To assess the effectiveness of the established approach, the predictive model is applied to real data and its output is compared to the real data.

Throughout the previous decade, researchers have studied the risk factors for diabetes using a variety of statistical methods, including logistic regression, correlation, and decision trees.

Combining a decision tree with ordered regression analysis may improve the clinical application of risk variables. Using decision tree analysis, more conclusive, detailed, and reliable results can be obtained. Future researchers will be able to replicate the processes because the syntax offered in R is meant to be easily understandable. In addition, this method can assist policymakers and health professionals in establishing a new program by improving existing preventive measures.

Using bootstrap, decision trees, and ordered logistic regression, the developed predictive models can create accurate and robust diagnostic parameters.

4. Conclusion

The model reveals that FBS, urea, and sodium levels significantly influence HbA1c. It may be useful in decision-making in diabetic management.

References

Abdelsalam, K. E. A., & AE, M. E. (2011). Correlation between urea level and HbA1c level in type 2 diabetic patients. Sudan Medical Laboratory Journal, 1(2), 1-5.

Bae, J. C., Rhee, E. J., Lee, W. Y., Park, S. E., Park, C. Y., Oh, K. W., . . . Kim, S. W. (2011). Optimal range of HbA1c for the prediction of future diabetes: a 4-year longitudinal study. Diabetes Research and Clinical Practice, 93(2), 255-259.

Fashoto, S. G., Adeyeye, M., Owolabi, O., & Odim, M. (2015). Modelling of the Feed Forward Neural Network with its Application in Medical Diagnosis. International Journal of Advances in Engineering Technology, 8(4), 507.

Horikawa, C., Yoshimura, Y., Kamada, C., Tanaka, S., Tanaka, S., Hanyu, O., . . . Ohashi, Y. (2014). Dietary sodium intake and incidence of diabetes complications in Japanese patients with type 2 diabetes: analysis of the Japan Diabetes Complications Study (JDCS). The Journal of Clinical Endocrinology & Metabolism, 99(10), 3635-3643.

Jusoff, M. K. S., Ahmad, W. M. A. W., Noor, N. F. M., AzlidaAleng, N., FatihaGhazalli, N., Ibrahim, M. S. M., . . . Halim, N. A. (2022). Combination of Methodology Building for Multi-Layer Feed Forward Neural Network (MLFF) and Linear Modelling (LM): A Case Study by Biometry Modelling. Journal of Algebraic Statistics, 13(1), 302-309.

Kim, Y. M., & Im, J. (2019). Frequency domain bootstrap for ratio statistics under long-range dependence. Journal of the Korean Statistical Society, 48(4), 547-560.

Koenig, R. J., Peterson, C. M., Jones, R. L., Saudek, C., Lehrman, M., & Cerami, A. (1976). Correlation of glucose regulation and hemoglobin AIc in diabetes mellitus. New England Journal of Medicine, 295(8), 417-420.

Lau, C., & Aw, T. (2020). HbA1c in the diagnosis and management of diabetes mellitus: an update. Diabetes, 6, 1-4.

Lovic, D., Piperidou, A., Zografou, I., Grassos, H., Pittaras, A., & Manolis, A. (2020). The growing epidemic of diabetes mellitus. Current Vascular Pharmacology, 18(2), 104-109.


Rahbar, S., Blumenfeld, O., & Ranney, H. M. (1969). Studies of an unusual hemoglobin in patients with diabetes mellitus. Biochemical and Biophysical Research Communications, 36(5), 838-843.

Sherwani, S. I., Khan, H. A., Ekhzaimy, A., Masood, A., & Sakharkar, M. K. (2016). Significance of HbA1c test in diagnosis and prognosis of diabetic patients. Biomarker Insights, 11, BMI.S38440.

Sun, H., Saeedi, P., Karuranga, S., Pinkepank, M., Ogurtsova, K., Duncan, B. B., . . . Mbanya, J. C. (2022). IDF Diabetes Atlas: Global, regional and country-level diabetes prevalence estimates for 2021 and projections for 2045. Diabetes Research and Clinical Practice, 183, 109119.