
(VC) ensemble techniques to predict mortality more accurately using data from laboratory experiments. This study proposes feature ranking based approaches, a modification of the FVC techniques, and CFC as an ensemble strategy to advance the current state of the art. Detailed experiments show that the proposed methodology outperforms other models that use lab test data to predict mortality. The research also illustrates how the CFC can improve the model’s performance. In addition, it demonstrates the effect of the Vertical (V)-Horizontal (H) FVC techniques, along with Vacuum Count (VC), for better efficiency.

This study also compares the performance of several standard classification algorithms in this context, namely the J48, NaiveBayes (NB), RandomForest (RF), and Support Vector Machine (SVM) classifiers. Finally, the proposed models can be applied to other areas of clinical decision support where the data exhibit similar characteristics.

peroxiredoxins have been reported to play a crucial role in lung physiology [54] and have been identified as part of vitamin D mediated pathways in lung development and growth [55]. Interestingly, the HIV Tat protein, known to alter the cellular redox environment, regulates SRXN1 gene expression [56].

Another protein-coding gene, HMGCS1, which was nearly as significant for maternal HIV (p-value = 1.29E−05, adjusted p-value = 0.069275), also revealed similar connections when investigated. In particular, HMGCS1, known as 3-hydroxy-3-methylglutaryl-CoA synthase 1, is also associated with several diseases, namely experimental autoimmune encephalomyelitis, oral submucous fibrosis, and HIV-1 infection [215]. This information suggests that some DEGs may affect the immune system and may be sensitive to maternal HIV.
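The gap between a small raw p-value and a much larger adjusted p-value comes from multiple-testing correction across many genes. The text does not state which correction was used, so the sketch below assumes the Benjamini-Hochberg (BH) procedure purely for illustration; the function name and the toy p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input.

    Assumption: BH is used only as an illustrative correction; the
    source does not say which adjustment method it applied.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes (not from the study).
toy_pvalues = [0.01, 0.04, 0.03, 0.005]
toy_adjusted = benjamini_hochberg(toy_pvalues)
```

With thousands of genes tested, the factor `m / rank` can be large, which is how a raw p-value on the order of 1E−05 can end up above a 0.05 threshold after adjustment.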

To analyze the DEGs further, the pathways identified by fGSEA (GO biological processes and KEGG pathways) were investigated and interpreted to link them with the infants' lung function at 6 weeks and 2 years of age. These pathways show that some genes contribute substantially to ribosomes. Beyond their cellular functions (e.g., repairing damage and directing chemical processes), they have been reported to participate in the innate immune response [216].

Furthermore, analysis of these pathways reveals that some of them are involved in immune function or inflammatory diseases.

A further attempt was made to interpret the hub genes, which yielded the following meaningful information in this problem context. Genetic variation in one of the hub genes, ADIPOR1, an inflammatory and immune response gene [57], has previously been associated with lung function [58]. The hub genes of the two over-represented modules/clusters (i.e., M1 and M14) enriched several immune-system-oriented REACTOME pathways, including Immune System, Adaptive Immune System, Innate Immune System, and Cytokine Signalling in Immune System. These immune systems play vital roles in preventing infection when the body is exposed to millions of potential pathogens daily. These findings, together with the DGE analysis, demonstrate that immune development is altered in response to maternal HIV exposure.

To aid disease diagnosis and prediction in general, this research applied several feature ranking algorithms to rank the features in each dataset (from the classification point of view) and identified the important features for each dataset. For example, for the Diabetes dataset, all ranking algorithms reported ‘plasma glucose concentration a 2 hours in an oral glucose tolerance test’ as the most important feature, which makes sense from a medical point of view. On the other hand, ‘diastolic blood pressure (mm Hg)’ and ‘triceps skin fold thickness (mm)’ were ranked as less important features in the context of diabetes prediction, which is also reasonable from a medical point of view. As another example, for the Heart Disease dataset, all rankers ranked ‘number of major vessels colored by fluoroscopy’ as the most important feature, and it is indeed a very important feature from a clinical point of view.
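The idea of ranking features by how well they separate the classes can be sketched with a simple filter-style ranker. The example below is a minimal illustration that scores each feature by its absolute Pearson correlation with a binary label; this is an assumed stand-in, not one of the ranking algorithms used in the thesis, and the column names and toy values are hypothetical rather than taken from the Diabetes dataset.

```python
from statistics import mean, pstdev


def abs_pearson(xs, ys):
    """Absolute Pearson correlation between a feature column and the label."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no ranking information
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return abs(cov / (sx * sy))


def rank_features(columns, labels):
    """Return feature names sorted from most to least label-correlated."""
    scores = {name: abs_pearson(vals, labels) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: six patients, label 1 = diabetic.
toy_columns = {
    "glucose_2h": [85, 90, 100, 150, 160, 170],
    "diastolic_bp": [70, 82, 66, 72, 80, 68],
    "triceps_skinfold": [20, 35, 28, 30, 22, 33],
}
toy_labels = [0, 0, 0, 1, 1, 1]

ranking = rank_features(toy_columns, toy_labels)
```

In this toy data the glucose column separates the two classes almost perfectly, so it is ranked first, while the other two columns have nearly identical class means and fall to the bottom. Real rankers may use different criteria and can disagree with each other, which is why agreement across several of them is informative.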

To assist CDS or CP, this thesis presented a feature ranking based ensemble classifier for predicting the survival of ICU patients. Here too, feature ranking algorithms were applied to find the important features in the prediction task for each dataset. For example, for the Adult group in the MIMIC-II and MIMIC-III datasets, the Protein S (Functional), Protein C (Functional), Reticulocyte Count (manual or absolute), and Quantitative G6PD blood tests were identified as low-ranked features. Further investigation showed that the values of Quantitative G6PD, Protein C (Functional), and Protein S (Functional) are missing in around 99% of cases for the Adult group in the MIMIC-II datasets, and that 100% of the Reticulocyte Count values are missing for the Adult group in the MIMIC-III dataset. Similarly, for the Senior group in the MIMIC-II datasets, some other blood tests (e.g., HLA-DR (Human Leukocyte Antigen – DR isotype), Heinz Body Prep, Howell-Jolly Bodies, and H/O Smear) were found to be low-ranked features. The values of these features are also missing in around 99% of cases. The feature ranking and selection stage thus served as a filter that omitted these ‘damaging’ features, enhancing the efficiency of the classifier.
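The link between low rank and missingness can be checked directly by computing the fraction of missing values per feature. The sketch below is a minimal illustration of such a check; it is a separate explicit filter, not the thesis's ranking stage, and the 0.9 threshold, the function names, and the toy columns are all hypothetical.

```python
def missing_fraction(values):
    """Fraction of entries in a column that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def split_by_missingness(columns, max_missing=0.9):
    """Split features into kept and dropped, each mapped to its missing fraction.

    max_missing is an assumed, illustrative threshold.
    """
    kept, dropped = {}, {}
    for name, values in columns.items():
        frac = missing_fraction(values)
        if frac > max_missing:
            dropped[name] = frac
        else:
            kept[name] = frac
    return kept, dropped


# Hypothetical lab-test columns for 100 patients (None = test not recorded).
toy_labs = {
    "hemoglobin": [13.5] * 95 + [None] * 5,
    "protein_s_functional": [None] * 99 + [80.0],
    "reticulocyte_count": [None] * 100,
}

kept, dropped = split_by_missingness(toy_labs)
```

A feature that is missing for almost every patient gives a ranker almost nothing to score, so it tends to land at the bottom of the ranking, which is consistent with the behaviour described above.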

This thesis has attempted to interpret its experimental results from a biological and/or medical point of view, as the brief discussion above demonstrates. Interpretations of this kind shed light on directions for future research. Although this study has some limitations in this problem context, the discussion of feature identification and feature importance from a biological and/or medical/clinical point of view offers useful insights for future research.