Submitted to the Lee Kong Chian Faculty of Engineering and Science, Universiti Tunku Abdul Rahman, in partial fulfillment of the requirements for a master's degree in mathematics, August 2022. Due to advances in health infrastructure, the death rate from heart disease is falling in developing countries. According to the chi-square test used in this investigation, age, serum creatinine, and serum sodium were significant factors.
The logistic model is constructed from the risk factors ejection fraction, serum creatinine, serum sodium, age, and time to predict mortality in patients with heart failure. I sincerely thank Dr. Goh Yann Ling and Dr. Tan Wei Lun, who served as my supervisors, for their guidance, patience, support, and advice while I completed my project. Finally, I would like to thank my family for supporting me throughout this project and for their patience with me.
This project titled "FACTORS ASSOCIATED WITH LIFESTYLE AND DIET IN HEART FAILURE MORTALITY" was prepared by LAU CHENG CHENG and submitted in partial fulfillment of the requirements for the Master of Mathematics degree at Universiti Tunku Abdul Rahman. It is hereby acknowledged that LAU CHENG CHENG (ID No: 19UEM01834) has completed this project titled "FACTORS ASSOCIATED WITH LIFESTYLE AND DIET IN HEART FAILURE MORTALITY" under the supervision of Dr. Goh Yann Ling (supervisor) and Dr. Tan Wei Lun (co-supervisor), both from the Department of Mathematical and Actuarial Sciences, Faculty of Engineering and Science.
Background
Heart failure occurs when the heart cannot function effectively as a pump to support the flow of blood through the body. Its symptoms include coughing, wheezing, fatigue, worsening shortness of breath, swollen legs or stomach, and difficulty performing active physical tasks. Pillai and Ganapathi (2013) conclude that heart failure is the leading cause of disease burden in South Asia and that this burden is expected to increase.
Diet and lifestyle management can stop the contribution of heart failure to the economic burden.
Problem statement
application of machine learning classifiers and presented ranking to predict survival of patients with heart failure.
Objectives
Heart failure mortality studies
2020) state that heart failure is the predominant cause of death among older people, as the proportion of people affected by heart failure is estimated to be 1 percent for ages above 50. Therefore, heart failure patients are categorized into two groups: age < 55 years and age >= 55 years. Obesity was identified by Savji et al. (2018) as a significant risk factor for heart failure with preserved ejection fraction (HFpEF).
According to Rosano, Vitale and Seferovic (2017), patients with diabetes mellitus have an extremely high rate of acute and chronic heart failure, with 25 percent of patients experiencing chronic heart failure. According to Benjamin et al. (2018), there is a 1.6 times greater risk of developing heart failure for an individual with systolic blood pressure (SBP) >. 2002) stated that hypertension contributed 39 percent for men and 59 percent for women to the development of heart failure. According to the investigation by Anand et al. (2004), which took a number of other factors into account, low hemoglobin remained a significant, independent predictor of death or of death due to heart failure.
According to Diana Rodriguez (2009), there is a direct correlation between anemia and heart disease, with more than 48% of people diagnosed with heart failure also having anemia. The percentage of blood that leaves the heart with each beat is known as the ejection fraction, according to Healthwise Staff (2021).
Related Work
Logistic regression models built on three selected features (ejection fraction, serum creatinine, and time) reached an accuracy of 83.3 percent, compared with 83.8 percent for models using all features. The results show that serum creatinine and ejection fraction are sufficient to build a model that can predict a patient's prognosis for heart failure. 2020) used decision trees and multilayer perceptron (MLP) neural networks as their methodologies. Before fitting the selected models, the authors first removed outliers from the data set using the interquartile range.
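The interquartile-range filter mentioned above can be illustrated with a minimal sketch. The multiplier `k = 1.5`, the function name `remove_outliers_iqr`, and the sample readings are assumptions made for illustration only; the cited study does not state the exact rule it applied.

```python
import statistics

def remove_outliers_iqr(values, k=1.5):
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR].

    k = 1.5 is the conventional multiplier and is an assumption here.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

# Hypothetical serum creatinine readings; 9.4 is an extreme value.
creatinine = [0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 9.4]
filtered = remove_outliers_iqr(creatinine)
```

In this sketch the extreme reading is dropped while the remaining values are kept unchanged; a different multiplier would make the filter stricter or more lenient.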
The accuracy of the decision tree is 86.57 percent, and the best model compared to other studies, MLP, produces an accuracy of 88 percent. The accuracy of 87.78 percent obtained from the generalized linear models and support vector machines was the highest. The result shows that each of the five machine learning methods is predictive with a respectable level of accuracy, but the Bayes network has the best accuracy with a rate of 79.28 percent.
Z-score normalization with SMOTE is more accurate in predicting heart failure than z-score or min-max normalization without SMOTE. 2021) aimed to improve the technique of predicting the survival of heart failure patients using the same data set.
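To make the comparison above concrete, the following sketch shows z-score and min-max normalization together with a simplified SMOTE-style oversampler. It is an illustration under stated assumptions, not the procedure of the cited study: `smote_like` is a hypothetical helper that only mimics the core idea of SMOTE (interpolating between a minority-class point and one of its nearest minority neighbours), and the feature values are invented.

```python
import math
import random
import statistics

def z_score(values):
    """Standardize to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def min_max(values):
    """Rescale linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each on the segment between a
    minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic

# Hypothetical values for two features of the minority (death) class.
ejection_fraction = [20, 25, 30, 35, 60]
serum_creatinine = [1.9, 2.7, 1.3, 1.8, 1.1]
minority = list(zip(z_score(ejection_fraction), z_score(serum_creatinine)))
new_points = smote_like(minority, n_new=4)
```

Normalizing before oversampling matters because the nearest-neighbour search is distance based, so features on larger scales would otherwise dominate the choice of neighbours.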
Logistic Regression model
The dataset was screened using Naive Bayes Tree, Naive Bayes Classification, Bayes Network, Classification Regression and LibLINEAR Das et al. They outperformed supervised learning with an accuracy rate of 62.24 percent for K-Means and 52.45 percent for Fuzzy C-Means, respectively. Sakinc and Ugurlu (2013) stated that logistic regression can be used to explain and test hypotheses about binary, discrete, or continuous variables.
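As an illustration of the logistic regression model, the sketch below fits P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1 x))) to a single predictor. Plain batch gradient descent, the learning rate, the number of epochs, and the data are all assumptions chosen to keep the example self-contained; this is not necessarily the estimation routine used in this project.

```python
import math

def sigmoid(z):
    """Logistic function mapping any real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit P(y=1|x) = sigmoid(b0 + b1*x) by batch gradient descent
    on the average log-loss."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # gradient of the log-loss wrt the linear term
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

# Hypothetical standardized predictor and binary outcome (1 = death).
xs = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 1, 0, 0, 1, 1, 1]
b0, b1 = fit_logistic(xs, ys)
risk_high = sigmoid(b0 + b1 * 2.0)   # predicted probability at x = 2.0
risk_low = sigmoid(b0 + b1 * -1.5)   # predicted probability at x = -1.5
```

A positive `b1` means that larger values of the predictor raise the predicted probability of the event, and `exp(b1)` can be read as the odds ratio per unit increase.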
Data Description
Statistical characteristics of the numerical data, including minimum, maximum, mean, standard deviation, and missing values, are reported in Table 3.2(a). Statistical details of the binary attributes, including label, count, proportion, and missing values, are given in Table 3.2(b). No missing values were detected in either the binary or the numerical attributes of the heart failure dataset.
Age, anemia, creatinine phosphokinase (CPK), diabetes, ejection fraction (EF), blood pressure (BP), platelets, serum creatinine, serum sodium, gender, and smoking were the lifestyle- and diet-related risk factors that were recorded. They were viewed as potential independent variables that could be used to account for heart failure-related death. Age, CPK, EF, platelets, serum creatinine, serum sodium, and time are quantitative data; anemia, blood pressure, diabetes, gender, and smoking were considered qualitative data.
The normal value of the platelet count is and any value outside this range is considered abnormal. The heart failure patients were categorized into groups age < 55 years and age ≥ 55 years.
Analysis
The time variable was excluded from the data set used for prediction, as this information would not be available at the time of prediction. Using this standard, we can group the serum creatinine data into two categories, normal and abnormal. The dataset will then also be analyzed with an ANOVA test for the continuous features (age, creatinine phosphokinase, ejection fraction, platelets, serum creatinine, serum sodium, time).
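The two tests used in this analysis can be sketched as follows. The example computes the Pearson chi-square statistic for a 2x2 table (without continuity correction, which is an assumption) and the one-way ANOVA F statistic. The counts and measurements are hypothetical and do not come from the heart failure dataset.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (1 degree of freedom)
    for a 2x2 contingency table, without continuity correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

def anova_f(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical counts: rows = normal / abnormal serum creatinine,
# columns = survived / died.
stat, p_value = chi_square_2x2([[10, 20], [20, 10]])

# Hypothetical ejection fraction values for survivors and non-survivors.
f_stat = anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]])
```

The chi-square test applies to categorical features such as the grouped serum creatinine, while the F statistic compares the mean of a continuous feature across the outcome groups.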
Pearson correlation coefficients will be calculated to check for collinearity between the univariate prognostic indicators. The dataset is further analyzed by constructing linear and logistic regression models to explore relationships between variables. The logistic regression with DEATH_EVENT as the model outcome was generated and fit to the combined data.
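The collinearity check based on Pearson correlation can be sketched as below. The threshold of 0.8, the helper name `collinear_pairs`, and the feature values are assumptions for illustration; the project does not state a specific cut-off at this point.

```python
import math
from itertools import combinations

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def collinear_pairs(features, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(features.items(), 2):
        r = pearson_r(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged

# Hypothetical measurements for three continuous features.
features = {
    "age": [40, 50, 60, 70, 80],
    "serum_creatinine": [1.0, 1.3, 1.1, 1.9, 1.6],
    "serum_sodium": [140, 136, 139, 131, 135],
}
flagged = collinear_pairs(features)
```

Pairs returned by `collinear_pairs` are candidates for closer inspection before both variables are entered into the same regression model.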
Because the distribution of the target class, DEATH_EVENT, is unbalanced, we need to fix the data imbalance problem. Variance inflation factors (VIF) are calculated for both the linear and logistic models when a model has more than one independent variable. Hair et al. (2010) indicated that the multicollinearity problem does not exist when the variance inflation factors are less than five and the model functions correctly.
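The variance inflation factor of predictor j is VIF_j = 1 / (1 - R_j^2), where R_j^2 is obtained by regressing predictor j on the remaining predictors. The sketch below uses the equivalent identity that VIF_j is the j-th diagonal element of the inverse of the predictors' correlation matrix. The data are hypothetical, and the small Gauss-Jordan inversion is written out only to keep the example free of external libraries.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equal-length columns."""
    def corr(xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
        sy = math.sqrt(sum((y - my) ** 2 for y in ys))
        return cov / (sx * sy)
    return [[corr(a, b) for b in columns] for a in columns]

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot_row = max(range(col, n), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for row in range(n):
            if row != col:
                factor = aug[row][col]
                aug[row] = [v - factor * p for v, p in zip(aug[row], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]

# Hypothetical predictor columns.
age = [40, 50, 60, 70, 80, 55]
serum_creatinine = [1.0, 1.2, 0.9, 1.8, 1.5, 1.1]
serum_sodium = [137, 135, 140, 130, 134, 138]
vifs = vif([age, serum_creatinine, serum_sodium])
no_multicollinearity = all(v < 5 for v in vifs)  # threshold from Hair et al. (2010)
```

With exactly two predictors the result reduces to 1 / (1 - r^2), where r is their Pearson correlation, which gives a quick way to check the computation by hand.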
Data Analysis
According to the percentage stacked bar graph in Figure 4.2(a), patients with anemia have a higher heart failure mortality rate than patients without anemia. The proportion of men and women who die from heart failure is the same, but there were more deaths among smoking patients than among non-smoking patients. According to the results of the chi-square test and the ANOVA test, the heart failure data set includes independent variables that can predict the dependent variable, so a model can be built using these data.
The survival rate of a patient with heart failure is maximized by using models that identify the variables most critical to managing the condition. In addition, the models enable early detection of possible risk factors for heart failure. It shows that ejection fraction, serum creatinine, serum sodium, age and time are the most important and necessary factors to predict the mortality of patients with heart failure compared to the use of all characteristics.
Using the same data set, it would be interesting to perform an additional analysis to compare the survival rates of people with heart failure in different age groups.
References
Machine learning can predict survival in heart failure patients based on serum creatinine and ejection fraction alone.
Building an intelligent system for remote monitoring of heart failure: using the Internet of Things, big data and machine learning.
Anemia is common in heart failure and associated with poor outcomes: Insights from a cohort of 12,065 patients with new-onset heart failure.
Projecting the impact of heart failure in the United States: a policy statement from the American Heart Association.
Serum sodium profile of patients with heart failure and its impact on their outcome at discharge.