
Diabetic Retinopathy Detection using Deep Convolutional Neural Network with Visualization of Guided Grad-CAM

Radifa H Paradisa, Alhadi Bustamam
Department of Mathematics, Faculty of Mathematics and Natural Science, Universitas Indonesia
Depok, Indonesia
radifa.hilya@sci.ui.ac.id, alhadi@sci.ui.ac.id

Andi Arus Victor, Anggun R Yudantha
Department of Ophthalmology, Faculty of Medicine, Universitas Indonesia, Cipto Mangunkusumo National General Hospital
Jakarta, Indonesia
arvimadao@yahoo.com, ramalangka@gmail.com

Devvi Sarwinda
Department of Mathematics, Faculty of Mathematics and Natural Science, Universitas Indonesia
Depok, Indonesia
devvi@sci.ui.ac.id

Abstract— One of the complications of diabetes that represents a serious threat to world health is Diabetic Retinopathy (DR). High blood sugar levels in people with diabetes can damage the blood vessels in the retina and cause blindness. DR can be detected through the examination of fundus images by an ophthalmologist. However, the limited number of ophthalmologists who can analyze fundus images is an obstacle because the number of DR sufferers continues to increase. Therefore, an automated system is needed to help doctors diagnose the disease. Researchers have developed deep learning techniques as an Artificial Intelligence (AI) approach to finding DR in fundus images. In this research, we use the Deep Convolutional Neural Network method with the InceptionV3 structure and various optimizers: Stochastic Gradient Descent with Momentum (SGDM), Root Mean Square Propagation (RMSprop), and Adaptive Moment Estimation (Adam). The fundus image dataset first went through augmentation and preprocessing steps to make it easier for the model to recognize the images. The InceptionV3 model with the Adam optimizer gave the best results in detecting DR lesions from the Kaggle dataset, with 96% accuracy. This paper also presents Guided Grad-CAM activation maps that describe the position of suspicious lesions to explain the results of DR detection.

Keywords—diabetic retinopathy, deep convolutional neural network, guided grad-cam

I. INTRODUCTION

Diabetic retinopathy (DR) is a leading cause of visual impairment in people with diabetes mellitus. It is estimated that 463 million adults have diabetes, a number projected to increase to 700.2 million by 2045 [1]. According to the International Council of Ophthalmology (2017), 1 in 3 diabetics has some level of DR [2]. In Indonesia, the prevalence of DR among diabetics is 43.1% [3].

DR is characterized by the appearance of microaneurysms (small dots in the blood vessels), blood vessel leakage, exudates (yellowish lipid spots), retinal swelling, abnormal growth of new blood vessels, and damaged nerve tissue [2]. DR is classified into five classes: no signs of DR (normal), mild Non-Proliferative Diabetic Retinopathy (NPDR), moderate NPDR, severe NPDR, and Proliferative Diabetic Retinopathy (PDR) [4].

DR severity is examined by direct observation of the patient's retinal image, which the ophthalmologist captures with a fundus camera. A doctor analyzes the resulting retinal images, and this examination requires high concentration.

The limited number of ophthalmologists who can analyze fundus images is an obstacle to handling DR, and the number of sufferers continues to increase, so an automatic system that can help doctors diagnose this disease is needed. Researchers have developed various Artificial Intelligence methods for medical image processing, including fundus images. Deep Learning, a branch of AI, provides impressive results in imaging through the Convolutional Neural Network (CNN) method.

Previous studies related to this include Sallam et al. [5], who obtained an accuracy of 70% by applying the CNN architecture ResNet18 to a fundus image dataset with a Gaussian filter at the preprocessing stage.

CLAHE and min-max normalization were applied by Pradhan et al. [6] for fundus image enhancement, followed by CNN training to categorize patients' fundus images into five DR classes. Testing the model achieved an accuracy of 85.68%.

Rizal et al. [7] classified DR using preprocessing techniques such as a Gaussian filter and CLAHE before the dataset entered the EfficientNet model, obtaining an accuracy of 79.8%. To achieve higher accuracy, Mushtaq and Siddiqui [8] applied preprocessing and augmentation techniques to the dataset and then trained a DenseNet169 CNN model, which achieved an accuracy of 90%.

In this study, augmentation and preprocessing stages were used to increase the quantity and quality of the dataset so that the resulting deep learning model detected DR lesions from fundus images more effectively. The method proposed in this study uses the CNN architecture, particularly InceptionV3, with varied optimizers (SGDM, RMSprop, and Adam) to determine which optimizer provides the most effective performance.

Several studies have focused only on the model's performance in detecting DR from fundus images. It is essential to make deep learning models more interpretable in many applications related to medical imaging. The Grad-CAM visualization approach was used in this study to interpret the detection of DR and describe the position of the suspected lesion features on the fundus image, assisting the ophthalmologist in taking further action.

II. MATERIAL AND METHODS

To detect diabetic retinopathy, the study was conducted in several stages, as shown in Fig. 1.

A. Material

The public data used is the fundus image dataset from Kaggle [9]. The dataset is a vast collection of high-resolution fundus photos captured under various imaging conditions, including different camera models and types. The composition of the Kaggle data is shown in Table I, and an example of the Kaggle data for each class is presented in Fig. 2.

Fig. 1. Proposed Method Design

TABLE I. DISTRIBUTION OF DATASETS

Class            Original   Augmentation
Normal           1796       1800
Mild NPDR        338        1800
Moderate NPDR    923        1800
Severe NPDR      176        1800
PDR              272        1800

Fig. 2. Example of Fundus Images from Kaggle

B. Image Augmentation

Deep learning, especially convolutional networks, has proven excellent for medical image analysis tasks on large datasets [10]. In this study, augmentation techniques were used to increase the quantity and quality of the training dataset by introducing particular variations into the images so that the model could recognize them better.

The Kaggle data come from several clinics and were collected with various cameras, so the images may contain noise, defocus, and poor contrast. Therefore, the augmentation techniques we used were random flips, brightness changes, blur, and noise. Adding noise and blur to the training data makes the model more robust and increases recognition accuracy [11, 12].
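The paper does not include code, but the following is a minimal sketch of the four augmentations it names, assuming OpenCV and NumPy; the probabilities, brightness range, blur kernel, noise scale, and the input path are illustrative assumptions, not values from the paper.

```python
import cv2
import numpy as np

def augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Randomly flip, brighten, blur, and add noise to a fundus image (uint8)."""
    out = image.copy()
    if rng.random() < 0.5:                      # random horizontal flip
        out = cv2.flip(out, 1)
    if rng.random() < 0.5:                      # random brightness shift (assumed range)
        shift = int(rng.integers(-40, 41))
        out = np.clip(out.astype(np.int16) + shift, 0, 255).astype(np.uint8)
    if rng.random() < 0.5:                      # random Gaussian blur (assumed kernel)
        out = cv2.GaussianBlur(out, (5, 5), 0)
    if rng.random() < 0.5:                      # random Gaussian noise (assumed sigma)
        noise = rng.normal(0.0, 10.0, out.shape)
        out = np.clip(out.astype(np.float32) + noise, 0, 255).astype(np.uint8)
    return out

rng = np.random.default_rng(42)
image = cv2.imread("fundus.png")                # hypothetical input file
augmented = [augment(image, rng) for _ in range(4)]
```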

C. Image Pre-processing

The fundus image dataset from Kaggle was converted to grayscale, and Contrast Limited Adaptive Histogram Equalization (CLAHE) [13] was applied to make the contrast of the images uniform. To reduce the number of parameters during computation, we cropped the images and resized them to 224 x 224 pixels.
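A minimal sketch of this preprocessing chain with OpenCV; the CLAHE clip limit, tile grid, and the centered-square crop are our own assumptions, since the paper does not specify them.

```python
import cv2

def preprocess(path: str, size: int = 224):
    """Grayscale -> CLAHE contrast equalization -> crop -> resize to 224 x 224."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))  # assumed settings
    equalized = clahe.apply(gray)
    # Crop the centered square region before resizing (crop choice is illustrative)
    h, w = equalized.shape
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    cropped = equalized[top:top + side, left:left + side]
    return cv2.resize(cropped, (size, size))
```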

D. Model and Optimizers

In creating the model, we used the InceptionV3 architecture. InceptionV3 [14] provides a high-performance network at relatively low computational cost without sacrificing generalizability. Its strong object recognition performance comes from a unique architecture with three parts: basic convolutional blocks, an enhanced Inception module, and a classifier. A convolution factorization strategy is used to improve computational efficiency. Fig. 3 shows the Inception module of InceptionV3, which leverages spatial factorization into asymmetric convolutions to save computational cost.

Fig. 3. Architecture of InceptionV3

In this study, the hyperparameter varied is the optimizer: Stochastic Gradient Descent with Momentum (SGDM), Root Mean Square Propagation (RMSprop), and Adaptive Moment Estimation (Adam). Stochastic Gradient Descent (SGD) is a well-known optimization method, but its training time is relatively high. The momentum parameter is meant to accelerate learning, especially when handling high curvature, small but consistent gradients, or noisy gradients [15]. RMSprop is an improvement on Resilient Propagation (Rprop) that is widely used in deep learning models; its essence is to keep a moving average of the squared gradient for each weight in the model [16]. Adam combines RMSprop and the momentum optimizer. Compared to other optimizers, Adam can deal with sparse gradients on noisy problems, consumes less memory, and is efficient in computation time [17].
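As a sketch of how this setup might look in Keras: an ImageNet-pretrained InceptionV3 backbone with a 5-way softmax head, compiled with one of the three optimizers at the learning rate of 0.001 reported in Section III. The pooling/dense head and the momentum value of 0.9 are our assumptions.

```python
import tensorflow as tf

def build_model(optimizer_name: str) -> tf.keras.Model:
    """InceptionV3 backbone with a 5-class head; the optimizer varies per experiment."""
    base = tf.keras.applications.InceptionV3(
        weights="imagenet", include_top=False, input_shape=(224, 224, 3))
    x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
    outputs = tf.keras.layers.Dense(5, activation="softmax")(x)
    model = tf.keras.Model(base.input, outputs)

    optimizers = {
        "sgdm": tf.keras.optimizers.SGD(learning_rate=0.001, momentum=0.9),  # momentum assumed
        "rmsprop": tf.keras.optimizers.RMSprop(learning_rate=0.001),
        "adam": tf.keras.optimizers.Adam(learning_rate=0.001),
    }
    model.compile(optimizer=optimizers[optimizer_name],
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

model = build_model("adam")  # the best-performing optimizer in the paper
```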

E. Guided Grad-CAM

Guided Grad-CAM is a visual explanation technique that visualizes, for each class, the specific regions of the image that the deep neural network has learned [18]. It requires no architectural changes or retraining, yet it can highlight fine-grained details in the image and provide class-discriminative capability.

Heat maps from the last convolutional layer of a CNN are a preferred visualization because that layer offers the best compromise between detailed spatial information and high-level semantics [18]. Therefore, we created a heat map of the final convolutional layer for our fundus images with Guided Grad-CAM, which shows the CNN's focus area on the feature map. Guided Grad-CAM is obtained by pointwise multiplication of Guided Backpropagation and Grad-CAM. An illustration of this method is shown in Fig. 4.

Fig. 4. Illustration of Guided Grad-CAM [19]
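To make the Grad-CAM half of this pipeline concrete, here is a common Keras recipe following [18]: gradients of the class score with respect to the last convolutional feature map are global-average-pooled into channel weights, and the weighted feature map gives the heat map. The layer name "mixed10" is an assumption for Keras's InceptionV3, and the guided-backpropagation factor (which requires overriding ReLU gradients) is omitted for brevity.

```python
import numpy as np
import tensorflow as tf

def grad_cam(model, image, class_index, layer_name="mixed10"):
    """Class-discriminative heat map from the target conv layer. Guided Grad-CAM
    would multiply the upsampled map pointwise with a guided-backprop saliency
    map (not shown here)."""
    grad_model = tf.keras.Model(
        model.inputs, [model.get_layer(layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        class_score = preds[:, class_index]
    grads = tape.gradient(class_score, conv_out)         # d(score)/d(activations)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))      # global-average-pooled grads
    cam = tf.reduce_sum(conv_out[0] * weights, axis=-1)  # weighted sum over channels
    cam = tf.nn.relu(cam)                                # keep positive influence only
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()   # normalize to [0, 1]
```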

F. Model Evaluation Metrics

Generally, the performance of a classification model is evaluated using a confusion matrix, which compares the model's classification results with the actual classification [20]. The matrix describes the evaluation of the model on test data whose actual values are known. The confusion matrix contains four values: True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN). For multiclass classification, the class whose performance is being calculated is treated as the positive class, and the other classes are treated as negative. The confusion matrix is presented in Table II, and the evaluation metrics are defined in (1), (2), (3), and (4).

TABLE II. THE CONFUSION MATRIX TABLE

                   Predicted Positive   Predicted Negative
Actual Positive    TP                   FN
Actual Negative    FP                   TN

$\mathrm{Precision} = \frac{TP}{TP + FP}$  (1)

$\mathrm{Recall} = \frac{TP}{TP + FN}$  (2)

$\mathrm{F1\text{-}score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$  (3)

$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$  (4)
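For instance, scikit-learn computes these per-class metrics directly from predictions; the label vectors below are dummy placeholders, as in practice they would come from the held-out test split.

```python
from sklearn.metrics import classification_report, confusion_matrix

labels = ["Normal", "Mild NPDR", "Moderate NPDR", "Severe NPDR", "PDR"]
# Dummy ground-truth and predicted class indices for illustration only.
y_true = [0, 0, 1, 2, 3, 4, 4, 2]
y_pred = [0, 0, 1, 2, 3, 4, 3, 2]

print(confusion_matrix(y_true, y_pred))          # TP/FN/FP/TN counts per class pair
print(classification_report(y_true, y_pred,      # precision, recall, F1 per class
                            target_names=labels))
```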

III. RESULTS AND DISCUSSION

A total of 9000 fundus images were obtained by oversampling the Kaggle data with augmentation techniques such as random flip, brightness, blur, and noise. The data consist of five classes, each with 1800 fundus images. The training process used 5600 images, and the rest were used to test the model. To make the dataset easier to recognize, preprocessing was done before the data entered the training model.
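A sketch of how such a 5600/3400 split might be reproduced with scikit-learn; the stratification, the seed, and the placeholder file names are our assumptions.

```python
from sklearn.model_selection import train_test_split

# Placeholder data: 9000 file names with balanced labels (1800 per class)
image_paths = [f"img_{i}.png" for i in range(9000)]
class_labels = [i % 5 for i in range(9000)]

train_paths, test_paths, train_labels, test_labels = train_test_split(
    image_paths, class_labels,
    train_size=5600,        # 5600 for training, the remaining 3400 for testing
    stratify=class_labels,  # keep the five classes balanced across splits
    random_state=42)        # illustrative seed
```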


(4)

The model used in this study is InceptionV3 with three optimization techniques: SGDM, RMSprop, and Adam. We set the initial learning rate to 0.001, the number of epochs to 20, and categorical cross-entropy as the loss function. If the loss value increases in an epoch, the learning rate is divided by 10. The validation accuracy and loss of our model for each optimizer are shown in Fig. 5. At the beginning of training, SGDM looked better; however, after the 7th epoch, the Adam optimizer achieved the best validation accuracy and validation loss compared to SGDM and RMSprop.
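The divide-by-10 rule described above resembles Keras's ReduceLROnPlateau callback; a hedged sketch of the training call follows, reusing the build_model helper assumed in Section II-D, with train_dataset and val_dataset standing in for data pipelines not shown here.

```python
import tensorflow as tf

# Divide the learning rate by 10 whenever validation loss stops improving,
# approximating the schedule described in the paper.
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss", factor=0.1, patience=1, verbose=1)

model = build_model("adam")          # helper sketched in Section II-D (assumed)
history = model.fit(
    train_dataset,                   # assumed tf.data pipeline of (image, label)
    validation_data=val_dataset,     # assumed validation split
    epochs=20,                       # epoch count reported in the paper
    callbacks=[reduce_lr])
```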

The model performance with the SGDM optimizer on the test data is shown in Table III. Each class had good evaluation scores, especially the normal class, with a precision of 97% and recall and F1-score of 98%; the overall accuracy is 92%. Table IV shows the model performance with the RMSprop optimizer: each class has precision, recall, and F1-score exceeding 90%, with an accuracy of 95%.

The results of model testing with the Adam optimizer are shown in Table V; the accuracy is 96%. Thus, the model with the Adam optimizer outperforms the other optimizers.

For comparison, Table VI shows the results of the InceptionV3 model with the Adam optimizer using the original data before augmentation. The results without augmentation are clearly worse than those of the model built with augmented data; thus, augmentation techniques improve model performance.

Fig. 5. Validation Loss and Validation Accuracy of the Model with Various Optimizers

TABLE III. MODEL PERFORMANCE USING SGDM OPTIMIZER

                 Precision   Recall   F1-score
Normal           97%         98%      98%
Mild NPDR        90%         92%      91%
Moderate NPDR    86%         82%      84%
Severe NPDR      95%         94%      95%
PDR              89%         92%      91%

Accuracy = 92%

TABLE IV. MODEL PERFORMANCE USING RMSPROP OPTIMIZER

                 Precision   Recall   F1-score
Normal           99%         97%      98%
Mild NPDR        93%         94%      94%
Moderate NPDR    92%         91%      92%
Severe NPDR      96%         97%      96%
PDR              95%         96%      95%

Accuracy = 95%

TABLE V. MODEL PERFORMANCE USING ADAM OPTIMIZER

                 Precision   Recall   F1-score
Normal           100%        99%      99%
Mild NPDR        92%         96%      94%
Moderate NPDR    96%         90%      93%
Severe NPDR      98%         97%      98%
PDR              94%         97%      95%

Accuracy = 96%

TABLE VI. MODEL PERFORMANCE USING ADAM OPTIMIZER WITHOUT DATA AUGMENTATION

                 Precision   Recall   F1-score
Normal           97%         99%      98%
Mild NPDR        61%         50%      55%
Moderate NPDR    73%         86%      79%
Severe NPDR      43%         22%      29%
PDR              71%         61%      66%

Accuracy = 84%

In Table VII, we compare the proposed method with previous studies that used a single CNN model to classify DR into five classes. Our proposed method outperforms the others, obtaining an accuracy of 96%.


(5)

TABLE VII. COMPARISON OF MODEL PERFORMANCE WITH OTHER STUDIES

Method                                           Preprocessing   Augmentation   Accuracy
ResNet18 [5]                                     ✓               -              70%
CNN custom (SGDM optimizer) [6]                  ✓               -              85.68%
EfficientNet (SGD optimizer) [7]                 ✓               -              79.8%
DenseNet169 (Adam optimizer) [8]                 ✓               ✓              90%
Proposed method: InceptionV3 (Adam optimizer)    ✓               ✓              96%

We also created heat map visualizations with the Guided Grad-CAM technique to interpret what our deep network has learned. These visualizations highlight prognostic areas in the fundus image for clinician review and analysis and are expected to support real-time clinical validation of computerized diagnoses at the point of care. An example of the Guided Grad-CAM visualization is shown in Table VIII; the fundus images in the table highlight the lesion areas that discriminate between classes.

IV. CONCLUSION

An automated system was built to detect DR by categorizing fundus images into five classes. In this study, augmentation and preprocessing stages were carried out before the InceptionV3 architecture was trained on the fundus image dataset. These stages make it easier for the model to recognize patterns in the fundus images, improving model performance.

Based on model testing, the combination of augmentation and preprocessing techniques with Adam as the optimizer achieved the best performance in detecting DR: the accuracy is 96%, and the precision, recall, and F1-score of each class exceed 90%. These results are better than those of studies that did not apply augmentation and preprocessing to the dataset.

We also visualize our network using Guided Grad-CAM in the form of heat maps. These show the results of DR detection on fundus images by highlighting the lesion areas that distinguish the DR classes, thus supporting clinical review and diagnosis verification.

ACKNOWLEDGMENT

Thanks to RISTEK-BRIN (Directorate General of Higher Education) and the Directorate of Research and Community Engagements Universitas Indonesia for supporting this research by the PDUPT 2021 research grant (NKB- 155/UN2.RST/HKP.05.00/2021).

TABLE VIII. GUIDED GRAD-CAM OF PROPOSED METHOD

[Image grid: rows "Original", "Pre-processing", and "Guided Grad-CAM"; columns "Normal", "Mild NPDR", "Moderate NPDR", "Severe NPDR", and "PDR".]

(6)

REFERENCES

[1] International Diabetes Federation, "IDF Diabetes Atlas, 9th edn," Brussels, Belgium, 2019. [Online]. Available: https://www.diabetesatlas.org

[2] T. Wong, L. Aiello, F. Ferris, N. Gupta, R. Kawasaki, and V. Lansingh, "Updated 2017 ICO guidelines for diabetic eye care," Int Counc Ophthalmol, pp. 1-33, 2017.

[3] T. Wahyu and M. Syumarti, "The Epidemiology of Diabetic Retinopathy," 2019. [Online]. Available: http://perpustakaanrsmcicendo.com/wp-content/uploads/2019/10/The-Epidemiology-of-Diabetic-Retinopathy.Tri-Wahyu.pdf

[4] C. Wilkinson et al., "Proposed international clinical diabetic retinopathy and diabetic macular edema disease severity scales," Ophthalmology, vol. 110, no. 9, pp. 1677-1682, 2003.

[5] M. S. Sallam, A. L. Asnawi, and R. F. Olanrewaju, "Diabetic Retinopathy Grading Using ResNet Convolutional Neural Network," in 2020 IEEE Conference on Big Data and Analytics (ICBDA), 2020, pp. 73-78.

[6] A. Pradhan, B. Sarma, R. K. Nath, A. Das, and A. Chakraborty, "Diabetic Retinopathy Detection on Retinal Fundus Images Using Convolutional Neural Network," in International Conference on Machine Learning, Image Processing, Network Security and Data Sciences, Springer, 2020, pp. 254-266.

[7] S. Rizal, N. Ibrahim, N. K. C. Pratiwi, S. Saidah, and R. Y. N. Fu'adah, "Deep Learning untuk Klasifikasi Diabetic Retinopathy menggunakan Model EfficientNet," ELKOMIKA: Jurnal Teknik Energi Elektrik, Teknik Telekomunikasi, & Teknik Elektronika, vol. 8, no. 3, p. 693, 2020.

[8] G. Mushtaq and F. Siddiqui, "Detection of diabetic retinopathy using deep learning methodology," in IOP Conference Series: Materials Science and Engineering, vol. 1070, no. 1, IOP Publishing, 2021, p. 012049.

[9] Asia Pacific Tele-Ophthalmology Society (APTOS), "APTOS 2019 Blindness Detection." [Online]. Available: https://www.kaggle.com/c/aptos2019-blindness-detection/data

[10] C. Shorten and T. M. Khoshgoftaar, "A survey on image data augmentation for deep learning," Journal of Big Data, vol. 6, no. 1, p. 60, 2019.

[11] M. Sáiz-Abajo, B.-H. Mevik, V. Segtnan, and T. Næs, "Ensemble methods and data augmentation by noise addition applied to the analysis of spectroscopic data," Analytica Chimica Acta, vol. 533, no. 2, pp. 147-159, 2005.

[12] C. Song, W. Xu, Z. Wang, S. Yu, P. Zeng, and Z. Ju, "Analysis on the impact of data augmentation on target recognition for UAV-based transmission line inspection," Complexity, vol. 2020, 2020.

[13] S. B. Patil and B. Patil, "Retinal fundus image enhancement using adaptive CLAHE methods," Journal of Seybold Report, ISSN 1533-9211.

[14] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the inception architecture for computer vision," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2818-2826.

[15] S. Ruder, "An overview of gradient descent optimization algorithms," arXiv preprint arXiv:1609.04747, 2017.

[16] G. Hinton, N. Srivastava, and K. Swersky, "Neural networks for machine learning, lecture 6a: Overview of mini-batch gradient descent," 2012.

[17] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.

[18] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, "Grad-CAM: Visual explanations from deep networks via gradient-based localization," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 618-626.

[19] R. R. Selvaraju, A. Das, R. Vedantam, M. Cogswell, D. Parikh, and D. Batra, "Grad-CAM: Why did you say that?," arXiv preprint arXiv:1611.07450, 2016.

[20] J. Patterson and A. Gibson, Deep Learning: A Practitioner's Approach. O'Reilly Media, Inc., 2017.
