Building Ensemble of CNN - Pre-processing

Chest X-Ray Pre-processing

2.2 Building Ensemble of CNN

In constructing the proposed Ensemble model, we consider three powerful pre-

trained CNN models, “Vgg16, ResNet50, and Inception v3,” initially trained on

the ImageNet [5], dataset. These models were trained to identify about 1000

classes of objects, hence adopting it to CXR dataset; their ﬁnal layer (classiﬁer)

is substituted with a binary classiﬁer. Each of the CNN models is trained inde-

pendently using diﬀerent hyper-parameters that perform best in terms of the

optimizer, learning rate, and activation function. The models were trained each

for 80 iterations (epochs) with a batch size of 32 to obtain the probability of

each CXR samples belonging to either a positive or negative class representing

TB and non-TB classes. The output from the individual model is then combined

to obtain the ﬁnal prediction through the Ensemble classiﬁer. The performance

554 M. Oloko-Oba and S. Viriri

of the proposed model is measured in terms of the popular evaluation metrics deﬁned as follows:

Accuracy: is the rate of correctly predicted samples from the overall samples examined and is deﬁnes as:

Accuracy = T P + T N

T P + F P + T N + F N (1)

Sensitivity: refers to the measure of conﬁrmed positive samples that are cor- rectly identiﬁed as positives.

Sensitivity = T P

T P + F N (2)

Specificity: is the rate of confirmed negative samples that are correctly iden- tified as negatives.

Specif icity = T N

T N + F P (3)

where:

TP = True Positive, FP = False Positive, TN = True Negative, and FN = False Negative respectively.

3 Results

The proposed model was trained on the Shenzhen dataset and validated on the Montgomery dataset [13]. These datasets contain TB speciﬁc CXR images provided by the National Library of Medicine (NLM) and publicly made available for research purposes. The Shenzhen dataset is made up of 336 infected CXR samples and 326 healthy CXR samples. All the samples are 3000 × 3000 pixels in resolution. Meanwhile, the Montgomery dataset contains 58 infected CXR samples and 80 healthy samples having a size of either 4892 × 4020 pixels or 4020 × 4892 pixels. Both datasets are accompanied by clinical readings that provide demographic details about each of the CXR samples with respect to diagnosis, sex, and age of the patients. The diagnostic information given for each CXR sample serves as a standard ground truth to validate the proposed model’s output. These datasets are accessible from the NIH website

. Figure 2 shows sample CXR from both datasets.

Several experiments were performed aimed at minimizing false negative and positives as much as possible. In this regard, Vgg16, ResNet50, and Inception v3 trained and evaluated for detection of TB abnormalities. Different outcomes are expected since different models exhibit diverse characteristics in terms of depth, parameters, and overall configurations. In the first experiment where the above models are individually employed to classify CXR, the best performance was obtained with the Inception v3 model at 93.50% followed by a 92.10% per- formance with the Vgg16 and the lowest accuracy of 89.51% with the ResNet50

1 https://lhncbc.nlm.nih.gov/publication/pub9931.

85 Ensemble of CNN for TB Classiﬁcation 555

Fig. 2.

Sample of CXR images from both datasets

model. Meanwhile, ResNet50 surpass the other models in terms of sensitivity.

The next experiment is the Ensemble of all models from the previous experi-

ment. The performance of the individual models is combined/averaged to build

an Ensemble classiﬁer for Tb detection. The Ensemble model output surpasses

all the individual models to boost the overall classiﬁcation accuracy. The detail

experimental results are shown in Table 2 and Fig. 3 respectively.

556 M. Oloko-Oba and S. Viriri

Fig. 3.

Models accuracy performance

Table 2.

Experimental results

Ensemble ResNet50 Vgg16 Inception v3 Accuracy (%) 96.14 89.51 92.10 93.50

Sensitivity (%) 90.03 89.09 88.25 79.31 Speciﬁcity (%) 92.41 90.08 91.00 83.36

4 Discussion

It is evident from Table 2, the performance of individual models shows the Incep- tion V3 model with the best accuracy, Although the overall best performance is achieved through the Ensemble model. One of the obstacles that affect CNN model is overfitting which is common when a model is run on a new dataset. It is essential to control overfitting to achieve better generalization of the model on new datasets. In this study, we employed techniques such as “data augmentation, normalization, dropout, and least absolute deviations (LAD), also known as L1, to control overfitting. As a result of various manifestations of TB, depending on a single model might not be too effective. Hence Ensemble of multiple models employed in this study makes the CAD system more robust with an improved detection rate because with Ensemble, different models could compliment them- selves such that where a model misclassifies an image, one or more of the other models can accurately classify the image. The proposed model generalized well

87 Ensemble of CNN for TB Classiﬁcation 557 despite having the training, validation and test set from diﬀerent datasets. The model performed well when compared with recent Ensemble CAD systems as shown in Table 3.

Table 3.

Proposed model compared with related work. The performances are measured in terms of Accuracy (ACC) and Area Under Curve (AUC)

Authors ref Models combined Performance (%)

Hijazi et al. [10] 2 89.77

Hernandez et al. [9] 3 86.40

Ayaz et al. [2] 6 0.99

Proposed

3 96.14

Hijazi et al. [10], Hernandez et al. [9], and the Proposed model are evaluated using the accuracy, while Ayaz et al. [2] is measured using area under the curve.

5 Conclusion

An Ensemble model comprising of multiple pre-trained models is presented in this study to aid early and accurate TB diagnosis from CXR. The proposed model is trained on one dataset and tested on an alternative dataset, resulting in a good generalization. Due to limited datasets in medical fields, the dataset were augmented along with other techniques to control overfitting. However, the individual CNN classifier achieved good results and was improved through the Ensemble classifier. The proposed model can also be deployed to detect other pulmonary abnormalities. Future work will focus on developing a robust CAD system that could accurately identify foreign objects that may be seen on CXR.

Objects such as “pieces of bones, coins, rings, or button found in the chest can be likened to one of many TB manifestations which will result in misclassiﬁcation.

References

1. Ahsan, M., Gomes, R., Denton, A.: Application of a convolutional neural network using transfer learning for tuberculosis detection. In: 2019 IEEE International Con- ference on Electro Information Technology (EIT), pp. 427–433. IEEE (2019) 2. Ayaz, M., Shaukat, F., Raja, G.: Ensemble learning based automatic detection of

tuberculosis in chest x-ray images using hybrid feature descriptors. Phys. Eng. Sci.

Med.

44, 1–12 (2021)

3. Bloice, M.D., Stocker, C., Holzinger, A.: Augmentor: an image augmentation library for machine learning. arXiv preprint

arXiv:1708.04680

(2017)

4. Cicero, M., et al.: Training and validating a deep convolutional neural network

Dalam dokumen Tuberculosis diagnosis from pulmonary chest x-ray using deep learning. (Halaman 94-98)