Future Research Directions and Ongoing Works

years for many patients to predict exacerbation early (in a time window, e.g., 1 day/week/month before) of a COPD patient.

Recently, the Acute Exacerbation and Respiratory Infections in COPD (AERIS) study [213] published observational data, collected over a time window, for COPD patients and their exacerbations. Using these features to predict a patient's exacerbation early (within a time window) can help the patient receive proper treatment sooner and reduce the risk of exacerbation. In the rest of this section, we briefly present some preliminary advances on the early exacerbation prediction task, which has been carried out as an extension of the work in this thesis.

Further details can be found in Appendix F - Section F.3.2.

AERIS [213] followed 120 COPD patients for around 2 years, during which around 360 exacerbation events were detected. The datasets available in AERIS are unstructured.

Therefore, a methodology has been developed to construct samples in a time window, i.e., a one-month window. Briefly, the data processing steps are: (1) data selection, (2) data ordering, (3) data labeling, (4) data organization, (5) data filtering, and (6) missing data imputation (please refer to Appendix F - Section F.3.2 for the details). A total of 1947 samples were generated at the end of the data processing steps.
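As a minimal sketch of how such windowed samples might be built, the following Python code orders visit records, labels each one by whether an exacerbation falls within the following window, filters out empty records, and imputes missing values. The record layout, the function names, the 30-day window, and the use of mean imputation are all assumptions made for illustration; the actual procedure is the one described in Appendix F and may differ.

```python
from datetime import date, timedelta

def build_window_samples(visits, exacerbation_dates, window_days=30):
    """visits: list of (visit_date, {feature_name: value or None}) for one patient.

    Hypothetical layout; the AERIS data are not structured this way out of the box.
    """
    ordered = sorted(visits, key=lambda v: v[0])  # data ordering
    samples = []
    for visit_date, features in ordered:
        # Data filtering: skip records in which every feature is missing.
        if all(value is None for value in features.values()):
            continue
        # Data labeling: 1 if an exacerbation occurs within the window after the visit.
        horizon = visit_date + timedelta(days=window_days)
        label = int(any(visit_date < d <= horizon for d in exacerbation_dates))
        samples.append({"date": visit_date, "features": dict(features), "label": label})
    return samples

def impute_feature_means(samples):
    """Replace missing values with the mean of the observed values (an assumed strategy)."""
    names = {name for s in samples for name in s["features"]}
    for name in names:
        observed = [s["features"][name] for s in samples
                    if s["features"].get(name) is not None]
        fill = sum(observed) / len(observed) if observed else 0.0
        for s in samples:
            if s["features"].get(name) is None:
                s["features"][name] = fill
    return samples
```

For several patients, the same functions would be applied per patient and the resulting samples concatenated.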

The samples are randomly split into training and testing sets at a ratio of 2:1.
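A random 2:1 split can be sketched as follows. The function name and the fixed seed are assumptions for illustration; whether the original split was stratified or seeded is not stated here.

```python
import random

def split_two_to_one(samples, seed=0):
    """Shuffle the samples and return (training, testing) lists at a 2:1 ratio."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = (2 * len(shuffled)) // 3         # two thirds go to training
    return shuffled[:cut], shuffled[cut:]
```

Under this sketch, 1947 samples would yield 1298 training and 649 testing samples.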

Several state-of-the-art ML algorithms (e.g., Random Forest (RF), Support Vector Machine (SVM), Logistic Regression (LR), MultilayerPerceptron (MLP), Naive Bayes (NB), AdaBoostM1, etc.) have been trained. Additionally, a deep learning pipeline leveraging the Long Short-Term Memory (LSTM) network has been developed (please refer to Appendix F - Section F.3.3 for the details of both). All these models have been trained on the training samples, and 10-fold cross-validation has been performed to measure their training performance. Finally, the models are run on the testing samples to measure their final performance.
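The 10-fold cross-validation step can be illustrated with a small, library-free sketch. The `fit` and `predict` callables are hypothetical placeholders for any of the classifiers above, and the folds here are contiguous, so the samples are assumed to have been shuffled beforehand.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size

def cross_validate(samples, labels, fit, predict, k=10):
    """Return the mean validation accuracy over k folds."""
    accuracies = []
    for train_idx, val_idx in k_fold_indices(len(samples), k):
        model = fit([samples[i] for i in train_idx], [labels[i] for i in train_idx])
        correct = sum(predict(model, samples[i]) == labels[i] for i in val_idx)
        accuracies.append(correct / len(val_idx))
    return sum(accuracies) / len(accuracies)
```

Each sample is used for validation exactly once, so the averaged accuracy estimates performance on unseen data without touching the held-out testing samples.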

Here we briefly highlight some of the preliminary results (Figure 7.1). Figure 7.1 (A) shows a comparison of the models in terms of accuracy in percentage (%). The AdaBoostM1, BayesNet, DecisionTable, RandomSubspace, SVM, and ZeroR algorithms predict exacerbations one month ahead more accurately (i.e., around 78.07%) than the other algorithms. Additionally, Figure 7.1 (B) reports the performance of the models in terms of other metrics. Analysis of these results reveals that the RandomForest algorithm performs better than the other algorithms. The area under the ROC curve (AUROC) is 0.612 and the f-score is 0.705 for the RF-based models (please refer to Appendix F - Section F.4 for the details).
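For reference, the metrics mentioned above can be computed as in the following sketch. It covers accuracy, the F-score of a single positive class, and a pairwise (rank-based) AUROC; which F-score averaging was used for the reported figures is not stated here, so the single-class definition is an assumption.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def auroc(y_true, scores, positive=1):
    """Probability that a random positive is scored above a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A predictor that assigns the same score to every sample obtains an AUROC of 0.5 under this definition whatever its accuracy, which is one reason to report these metrics alongside accuracy.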

Figure 7.1 (C, D) reports the performance of the LSTM technique on the training and testing samples. Figure 7.1 (C) and (D) present the accuracy and the entropy loss, respectively, of our LSTM-based pipeline with respect to the number of epochs. The best performance is reached at around 20 epochs, after which the training accuracy keeps increasing while the testing accuracy decreases.
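The choice of a stopping epoch from such curves can be sketched with a simple patience rule. The function name, the patience value, and the accuracy values in the usage example are hypothetical and are not taken from Figure 7.1.

```python
def best_epoch(held_out_accuracy, patience=5):
    """Return the 1-based epoch with the best held-out accuracy.

    Scanning stops once `patience` epochs pass without an improvement.
    """
    if not held_out_accuracy:
        raise ValueError("at least one epoch is required")
    best_idx, best_val = 0, held_out_accuracy[0]
    for i, acc in enumerate(held_out_accuracy):
        if acc > best_val:
            best_idx, best_val = i, acc
        elif i - best_idx >= patience:
            break  # no improvement for `patience` epochs
    return best_idx + 1

# Hypothetical curve: accuracy peaks at the third epoch and then declines.
print(best_epoch([0.60, 0.70, 0.75, 0.74, 0.73, 0.72], patience=2))
```

In practice, the accuracy used for this decision is usually measured on a validation split kept separate from the final testing samples.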

An exacerbation is a worsening of a COPD patient's condition. Early (i.e., one month ahead) detection of an exacerbation can help patients take proper and effective steps in time to avoid this worsening of the disease, and can decrease the number of hospitalizations, which eventually reduces the associated cost. This research has analyzed the data rigorously and proposed a data processing approach to prepare time-series-like data on which the predictive models have been developed.

The preliminary experiments reported above indicate that the Random Forest based model and the LSTM pipeline are the main competitors for predicting exacerbation one month ahead. In particular, the Random Forest based model predicts exacerbation early with better performance (i.e., an accuracy of 78.07%, an f-score of 0.705, and an area under the ROC of 0.62) than the other models. Similar results have been found for the LSTM pipeline. For the LSTM pipeline, it has been observed that after 20 epochs the model overfits the training data. Therefore, it is suggested to train for 20 epochs so that the model generalizes better when predicting exacerbation early for a new sample. It is anticipated that further refinement of the LSTM pipeline and/or further hyperparameter tuning will improve the results. This preliminary work is expected to be followed up by further research including, but not limited to, the following components: (1) extensive feature engineering; (3) exploratory discussion with the subject matter experts on some feature values; (4) various kinds of feature imputation techniques;

7.5.2 Possible improvements for the survival prediction of the ICU patients

We have conducted our experiments on 655 lab tests from the MIMIC-II datasets and 570 lab tests from the MIMIC-III datasets. In the future, we plan to incorporate background knowledge on these tests into our approach. In addition, missing value replacement will be examined and assessed using other approaches. Moreover, the issue of longitudinal features (e.g., features measured on the patients over an extended period of time) in the laboratory data will be addressed. In a real-world scenario, the laboratory test results would be paired with vital signs to predict mortality/survivability more accurately. In the current model, some subsets of the ranked features are considered; finer tuning of the feature selection process may lead to even better performance.
