Strategies to Improve Performance of Convolutional Neural Network on Histopathological Images Classification

Toto Haryanto¹*, Heru Suhartanto¹

¹Faculty of Computer Science, Universitas Indonesia, Depok 16424, Indonesia

e-mail: [email protected], [email protected]

Aniati Murni¹, Kusmardi Kusmardi²

²Faculty of Medicine, Universitas Indonesia, Jl. Salemba Raya No. 6, Jakarta Pusat 10430

e-mail: [email protected], [email protected]

Abstract—Convolutional Neural Networks (CNNs) have been widely used in medical image processing. Histopathology images are one of the modalities that pathologists use to analyze cancer status. The unstructured patterns in these images make analysis difficult: pathologists may misidentify findings or need more time to analyze them. In addition, deep learning generally requires powerful hardware resources to keep training time manageable. To address these problems, we propose two main activities in this study: accelerating training and enhancing the histopathology dataset. We train our CNN on three GPUs of the same specification (GTX-1080) to shorten training time. The mean-shift filter is a low-pass filtering technique; we use it to handle the unstructured patterns in histopathology images and thereby enhance the dataset. The performance of the three GPUs during a 500-epoch training process is measured by the speedup. Meanwhile, model testing is carried out under several batch-size scenarios: 32, 64, 128, and 256. Applying the mean-shift filter makes training with a batch size of 128 converge faster.

Keywords—CNN, GPU, histopathology

I. INTRODUCTION

The use of histopathological imagery for computer-aided identification and analysis of cancer status plays an important role for pathologists. With a computer-based approach, identification and analysis can be done faster and more precisely. With the development of image processing technology, histopathological image data can be used to train machine learning models that learn patterns from the data. The learned patterns can then be used to make predictions on new images for diagnostic purposes.

Meanwhile, in the medical field, research on cancer trends conducted by the American Cancer Society shows that cancer is still one of the leading causes of death in the world. In 2018 alone, more than 1.7 million cancer cases were expected, and the cancer death rate was projected to exceed 30%, or about six hundred thousand cancer deaths, in the United States [1].

Convolutional Neural Networks (CNNs) have been widely implemented in image processing. In medicine in particular, CNNs are the state of the art in medical image processing for both segmentation and classification. The histopathological image is one type of medical image used by pathologists to help diagnose the cancer status of a patient. A comprehensive review of histopathological image analysis is given in [2], and histopathological image analysis with machine learning approaches is reviewed more specifically in [3].

Cancer analysis and identification based on histopathological images was published in [4]. In that study, a CNN was extended with contour information from the image to increase the precision of cancer cell segmentation. The architecture consists of six convolution layers, six max-pooling layers, and three deconvolution layers. The contour provides additional information for segmenting nuclei, and the study won the MICCAI Nuclei Segmentation Challenge in 2015.

Another study that uses histopathological images for cancer identification is [5]. It uses whole-slide histopathological images of gastric carcinoma tissue. The designed CNN was trained and compared with several conventional feature extraction techniques such as GLCM, Gabor filters, and Local Binary Pattern histograms. The CNN architecture in that study has three convolution layers, three max-pooling layers, and two dense layers. Overall, cancer classification obtained an accuracy of 0.699 and necrosis detection reached an accuracy of 0.81.

Since Graphics Processing Units (GPUs) became available for research, deep learning has become one of the disciplines that cannot be separated from GPU use. Deep learning in general, and CNNs in particular, require very fast computing. With a GPU, training can be done more quickly, which provides an opportunity to improve the architecture, parameters, or hyper-parameters if the initial model does not yet give good results.


The use of GPUs in deep learning has been proposed in [6]. That study accelerated the CNN feed-forward process so that object detection can be performed faster. The developed algorithm achieved a speedup of 6.97 times over the standard cuDNN v3 library. The research was carried out on a server with one GTX 980 GPU.

GPUs also play an equally important role in medical research. This is shown in several published studies such as [7]–[10], all of which succeeded in exploiting the GPU to accelerate computation.

One problem with histopathology images is their unstructured pattern, so the images have to be enhanced with a filtering technique. Training time is another problem when a CNN is trained on high-resolution images. The three main terms above, histopathology, CNN, and GPU, underlie this study, which develops a classification model for identifying cancer types with a CNN in a multi-GPU environment. This study also analyzes processor and memory utilization of each GPU during training.

II. METHOD

A. Dataset

To conduct this study, we used histopathological image data from previous studies. The data are histopathological images with Hematoxylin and Eosin (H&E) staining, taken from [11] and [12]. The data were obtained using a VLI120 scanner at a magnification of 20X and a resolution of 0.55 µm/pixel. Before training, an augmentation process is carried out on the data. Augmentation is done to enrich the variety of the data, especially when the available data are imbalanced [13].

B. Network Design and Training

A CNN consists of several layers: input, convolution, pooling, dropout, fully connected, and output layers.

Convolution layers are designed to extract features from the images. Features are extracted from low level to high level according to the depth of the network. In a convolution layer, a dot-product operation between the input image and a kernel (filter) produces the convolved features. The kernel is a matrix smaller than the input image. In this study, we use a 5x5 kernel combined with a padding of one during the convolution step to minimize the loss of information during feature extraction. The number of neurons in a convolution layer is represented by its number of filters; we use 32 and 64 filters in the convolution layers.

The pooling layer reduces the dimensions of the feature maps during feature extraction. Most CNN architectures place a pooling layer after a convolution layer. There are several types of pooling: minimum pooling, average pooling, and maximum pooling. This study uses maximum pooling, which keeps the largest value in each pooling window. The largest values of the convolution output correspond to the most distinctive responses, so we use maximum pooling to extract distinctive features in histopathology images. After pooling, the next layers are fully connected layers. Our network has three fully connected layers with 1024, 512, and 256 neurons, respectively. Every neuron in such a layer receives input from all neurons of the previous layer, which is why the layer is called fully connected. At the end of the network, our CNN has one output layer with a softmax function to classify two types of cancer. The softmax function computes the probability of each target class over all possible classes, as given in (1).

$$f(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}} \qquad (1)$$
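As an illustration of equation (1), the following minimal sketch computes softmax probabilities for a two-class output. The raw scores are hypothetical values chosen for the example, and subtracting the maximum score is a common numerical-stability step that is assumed here rather than taken from the paper.

```python
import math


def softmax(scores):
    """Return softmax probabilities for a list of raw class scores."""
    # Subtracting the maximum avoids overflow in exp() and does not
    # change the value of equation (1).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for the two classes (benign, malignant).
probs = softmax([1.2, 3.4])
print(probs)
```

The returned values are non-negative and sum to one, so the larger score receives the larger class probability.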

To avoid vanishing gradients during training, we apply the ReLU function in the convolution layers. ReLU can also make training faster than it would be without this function.

Equation (2) gives the definition of ReLU.

$$\mathrm{ReLU}(x) = \begin{cases} x, & \text{for } x \ge 0 \\ 0, & \text{for } x < 0 \end{cases} \qquad (2)$$

The ReLU activation function outputs x if x is positive and 0 otherwise. ReLU is less computationally expensive than the tanh or sigmoid activation functions. In a large neural network with many neurons, tanh or sigmoid causes almost all activations to fire, that is, to take part in computing the output of the network, so the activation is dense. The benefit of ReLU is that, in a large network with randomly initialized weights, up to about 50% of the activations are 0, because ReLU returns 0 whenever x is negative. This means that not all activations fire, which is called sparse activation. Fig. 1 is a graph of the ReLU function.

Fig. 1. ReLU activation function
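The sparse-activation argument can be checked with a small experiment. The sketch below applies ReLU to pre-activations drawn from a zero-mean Gaussian, which is an assumed stand-in for a layer with randomly initialized weights, and measures the fraction of outputs that are exactly zero.

```python
import random


def relu(x):
    """Equation (2): x for x >= 0, otherwise 0."""
    return x if x >= 0 else 0.0


random.seed(0)
# Assumption: zero-mean Gaussian pre-activations model a randomly
# initialized layer; real networks may deviate from this.
pre_activations = [random.gauss(0.0, 1.0) for _ in range(10_000)]
activations = [relu(v) for v in pre_activations]

zero_fraction = sum(1 for a in activations if a == 0.0) / len(activations)
print(f"fraction of zero activations: {zero_fraction:.3f}")
```

Under this assumption roughly half of the units are inactive, in line with the "up to about 50%" figure given in the text.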


Our CNN architecture comprises seven convolution layers, seven max-pooling layers, three dense (fully connected) layers, and one output layer. Details of the network structure can be seen in Fig. 2.

C. GPU Utilization

Utilization of multiple GPUs is evaluated by monitoring the GPU processor and the GPU memory. Every GPU in our machine has a unique PCI ID, as described in Table I. During training, we record the utilization of each GPU.

TABLE I. DETAIL OF GPU MACHINE

PCI : GPU ID    Name        Memory size    CUDA cores
0000:01:00      GTX-1080    8144 MB        2650
0000:02:00      GTX-1080    8144 MB        2650
0000:03:00      GTX-1080    8144 MB        2650

While training is running, both the processor and the memory utilization of each GPU are recorded until training stops at the final epoch. The performance of the GPUs is captured and compared with that of the CPU.
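One possible way to record these values is to poll the `nvidia-smi` command-line tool during training. The paper does not state which tool was used, so the following is only a sketch under that assumption. The helper names `parse_gpu_csv` and `sample_gpus` are hypothetical, and the sample text is made up for illustration; it is not a measurement from this study.

```python
import subprocess

QUERY = "index,utilization.gpu,memory.used"


def parse_gpu_csv(text):
    """Parse 'index, utilization, memory' CSV lines into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        index, util, mem = (field.strip() for field in line.split(","))
        rows.append({
            "gpu": int(index),
            "util_percent": int(util),
            "memory_mib": int(mem),
        })
    return rows


def sample_gpus():
    """Query nvidia-smi once; needs an NVIDIA driver on the machine."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)


# Hypothetical output for three GPUs (illustrative numbers only).
example = "0, 97, 7800\n1, 95, 7790\n2, 96, 7795\n"
records = parse_gpu_csv(example)
print(records)
```

Calling `sample_gpus()` at a fixed interval, for example once per epoch, and appending the result to a log would give a utilization trace per GPU.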

Conceptually, the strategy for training on multiple GPUs is to copy and distribute the data, with each batch divided according to the number of GPUs used for training. The training results of each model copy are sent to the CPU. Fig. 3 illustrates the process of CNN training on multiple GPUs.
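The batch-splitting idea can be sketched in plain Python. The example below divides a batch of 128 samples among three devices, lets each "device" compute a partial gradient, and combines the partial results on the host. The one-parameter least-squares model and the function names are assumptions made for illustration; they do not reproduce the framework used in the study.

```python
def split_batch(batch, n_devices):
    """Split a batch into n_devices contiguous shards of near-equal size."""
    base, extra = divmod(len(batch), n_devices)
    shards, start = [], 0
    for d in range(n_devices):
        size = base + (1 if d < extra else 0)
        shards.append(batch[start:start + size])
        start += size
    return shards


def grad_sum(shard, w):
    """Sum of gradients of 0.5 * (w * x - y)**2 over one shard (toy model)."""
    return sum((w * x - y) * x for x, y in shard)


# Toy batch of 128 (x, y) pairs with y = 2x, standing in for image/label pairs.
batch = [(i / 128, 2 * i / 128) for i in range(128)]
shards = split_batch(batch, 3)

w = 0.5
# Each shard is processed independently; the host averages the results.
combined = sum(grad_sum(s, w) for s in shards) / len(batch)
full = grad_sum(batch, w) / len(batch)

print([len(s) for s in shards])
print(combined, full)
```

The combined gradient matches the single-device gradient up to floating-point rounding, which is why splitting the batch changes the training time but not the update itself.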

D. Training Speedup

The training time on the CPU is compared with the training time on the GPUs. This ratio is known as the speedup of the GPU and is defined in equation (3):

$$\text{speedup} = \frac{T_s}{T_p} \qquad (3)$$

where $T_s$ is the sequential training time on the CPU and $T_p$ is the parallel training time on the GPUs.
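As a worked example of equation (3), the sketch below uses the CPU and GPU training times reported in Table III. The function name `speedup` is chosen here for illustration.

```python
def speedup(t_sequential, t_parallel):
    """Equation (3): ratio of sequential to parallel training time."""
    return t_sequential / t_parallel


t_cpu = 136.943  # CPU training time in seconds (Table III)
t_gpu = 15.760   # GPU training time in seconds (Table III)

s = speedup(t_cpu, t_gpu)
print(f"speedup = {s:.3f}")
```

The ratio is dimensionless and comes out at roughly 8.69, which agrees with the 8.68 reported in Table III when the value is truncated to two decimals.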

E. Mean-shift filter Algorithm

In digital image processing, image enhancement is one of the tasks undertaken to improve images. The mean-shift filter is a low-pass filtering technique that computes each output pixel by taking the surrounding pixels into account [14]. The mean-shift algorithm is presented below.

Fig. 2. CNN architecture design for histopathological image training

Fig. 3. Design of parallelization for CNN training


III. RESULT

A. Dataset

The prepared histopathological image data comprise two classes, benign and malignant, with an image size of 224x224 pixels. A total of 5000 images are collected for training our models. Fig. 4 shows examples of the benign and malignant classes.

Fig. 4. Examples of the histopathological dataset: benign and malignant

In colon cancer, as in most cancer types, benign is associated with healthy tissue and malignant with diseased tissue. In tissue with cancer cells, the morphology of the cell nucleus changes.

B. Data Augmentation

To enrich the dataset, we apply an augmentation step to our data. The augmentation techniques used in our research include flipping, shearing, zooming, and rescaling the images. Augmentation is illustrated in Fig. 5.

Fig. 5. Augmentation process in our dataset: shear (a), horizontal flip (b) and zoom in (c)
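Some of these operations can be sketched on a tiny grayscale image stored as nested lists. The 4x4 pixel values are made up, the zoom is approximated by a center crop, and rescaling is assumed to mean dividing 8-bit pixel values by 255; none of these details are specified in the paper.

```python
def horizontal_flip(image):
    """Mirror every row left to right."""
    return [row[::-1] for row in image]


def rescale(image, divisor=255):
    """Map 8-bit pixel values to the range [0, 1]."""
    return [[pixel / divisor for pixel in row] for row in image]


def zoom_in(image, margin):
    """Crop 'margin' pixels from every side (a zoom before resizing back)."""
    inner_rows = image[margin:len(image) - margin]
    return [row[margin:len(row) - margin] for row in inner_rows]


# Hypothetical 4x4 grayscale image.
image = [
    [0, 64, 128, 255],
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
]

flipped = horizontal_flip(image)
scaled = rescale(image)
zoomed = zoom_in(image, 1)
print(flipped[0], scaled[0], zoomed)
```

In practice the cropped region would be resized back to 224x224 before being passed to the network.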

C. CNN input and output design

Data entering the CNN undergo dimensional adjustment, after which feature extraction proceeds from low-level to high-level features. The input layer has a size of 224x224x3, where the three channels represent the RGB color layers, and it produces an output of the same size. The output of the input layer then serves as the input of the first convolution layer.

The output of the convolution layer is called the feature map. The size of this output depends on several parameters such as the kernel size, the stride, and the presence or absence of zero padding in the convolution. In this study, the amount of zero padding that we use is 1. The output size of the convolution layer is given by equation (4):

$$O = \frac{W - K + 2P}{S} + 1 \qquad (4)$$

where W is the input size, K is the filter size, P is the amount of padding, and S is the stride. The resulting input and output sizes in this study are presented in Table II.
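Equation (4) can be evaluated directly. In the sketch below, the function name and the use of integer (floor) division for sizes that do not divide evenly are assumptions; the kernel sizes, strides, and input widths are taken from the text and Table II.

```python
def conv_output_size(w, k, p, s):
    """Equation (4): O = (W - K + 2P) / S + 1, using floor division."""
    return (w - k + 2 * p) // s + 1


# 5x5 convolution, stride 1, on a 224-pixel-wide input.
size_p1 = conv_output_size(224, 5, 1, 1)  # padding of 1
size_p2 = conv_output_size(224, 5, 2, 1)  # padding of 2

# 2x2 max pooling with stride 2 and no padding.
pooled_224 = conv_output_size(224, 2, 0, 2)
pooled_7 = conv_output_size(7, 2, 0, 2)

print(size_p1, size_p2, pooled_224, pooled_7)
```

With a 5x5 kernel at stride 1, the formula preserves the input width only for P = 2, while P = 1 shrinks it by two pixels, so the size-preserving convolutions listed in Table II would correspond to P = 2 under this formula. The pooling results (224 to 112, 7 to 3) follow from the same expression.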

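Mean-shift filtering algorithm [14]:

Let $x_i$ be the d-dimensional input pixels and $z_i$ the filtered image pixels in the joint spatial-range domain, where $i = 1, 2, \ldots, n$.

For each pixel:

Initialize $j = 1$ and $y_{i,1} = x_i$.

Compute $y_{i,j+1}$ with the update rule below until convergence, $y = y_{i,c}$.

Assign $z_i = (x_i^s, y_{i,c}^r)$.

The update rule is

$$y_{j+1} = \frac{\sum_{i=1}^{n} x_i \, g\left(\left\| \frac{y_j - x_i}{h} \right\|^2\right)}{\sum_{i=1}^{n} g\left(\left\| \frac{y_j - x_i}{h} \right\|^2\right)}, \qquad j = 1, 2, \ldots$$

where $g$ is the kernel profile and $h$ is the bandwidth, and the superscripts $s$ and $r$ denote the spatial and range parts of a vector [14].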
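The procedure can be sketched on a one-dimensional grayscale signal, where the position is the spatial part and the intensity is the range part. This is a minimal illustration and not the implementation used in the study: the Gaussian profile, the separate bandwidths `h_s` and `h_r`, the tolerance, and the sample signal are all assumptions.

```python
import math


def mean_shift_filter(signal, h_s=2.0, h_r=20.0, tol=1e-3, max_iter=100):
    """Mean-shift filtering of a 1-D signal in the joint spatial-range domain."""
    n = len(signal)
    filtered = []
    for i in range(n):
        # Initialize y_{i,1} = x_i (spatial part ys, range part yr).
        ys, yr = float(i), float(signal[i])
        for _ in range(max_iter):
            num_s = num_r = den = 0.0
            for k in range(n):
                ds = (ys - k) / h_s
                dr = (yr - signal[k]) / h_r
                # Gaussian profile g(u) = exp(-u / 2) on the squared distance.
                g = math.exp(-0.5 * (ds * ds + dr * dr))
                num_s += g * k
                num_r += g * signal[k]
                den += g
            new_s, new_r = num_s / den, num_r / den
            shift = math.hypot(new_s - ys, new_r - yr)
            ys, yr = new_s, new_r
            if shift < tol:
                break
        # z_i keeps the original position and takes the converged range value.
        filtered.append(yr)
    return filtered


# Hypothetical noisy signal with one sharp edge between two flat regions.
noisy = [10, 12, 9, 11, 10, 200, 198, 202, 199, 201]
smoothed = mean_shift_filter(noisy)
print([round(v, 1) for v in smoothed])
```

Values within each flat region are pulled toward a common level while the edge between the two regions is kept, which is the smoothing behavior the filter is used for on histopathology images.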


D. Evaluation

Our models are evaluated using the loss and the accuracy, which are presented as graphs.

Fig. 6: Validation accuracy of models

Fig. 6 shows the validation accuracy for the various batch sizes. The model with the smallest batch size (32) converges faster than the models with larger batch sizes. Besides accuracy, the loss function is also used to measure the performance of the models; Fig. 7 shows the validation loss.

Fig. 7. Validation loss of models

E. GPU Performance and Utilization

GPU performance is assessed by measuring the training time using all three GPUs. This training time is then compared with the CPU training time to obtain the speedup. GPU utilization, in turn, covers the use of the GPU processor and memory during training. Table III shows the speedup of the training process with multiple GPUs, computed with equation (3).

TABLE III. ACCELERATION OF GPU FOR CNN TRAINING

CPU Training (s)    GPU Training (s)    Speedup
136.943             15.760              8.68

IV. DISCUSSION

Analysis and identification of cancer based on histopathology imagery is still one of the methods used by pathologists. The process carried out so far is semi-automatic, meaning that some parts are still done manually and others are done automatically. Image processing and deep learning techniques, combined with GPUs to accelerate the identification process, can therefore make a contribution to the work of pathologists.

TABLE II. INPUT AND OUTPUT DATA ON CNN

Layer      Input         Filter    Stride    Output
Input      224x224x3     NA        NA        224x224x3
Conv       224x224x3     5x5       1         224x224x32
MaxPool    224x224x32    2x2       2         112x112x32
Conv       112x112x32    5x5       1         112x112x32
MaxPool    112x112x32    2x2       2         56x56x32
Conv       56x56x32      5x5       1         56x56x32
MaxPool    56x56x32      2x2       2         28x28x32
Conv       28x28x32      5x5       1         28x28x32
MaxPool    28x28x32      2x2       2         14x14x32
Conv       14x14x32      5x5       1         14x14x64
MaxPool    14x14x64      2x2       2         7x7x64
Conv       7x7x64        5x5       1         7x7x64
Conv       3x3x64        5x5       1         3x3x64
FC         1x1x64        NA        NA        1024x1
FC         1024x1        NA        NA        512x1
FC         512x1         NA        NA        256x1
Output     256x1         NA        NA        2x1

According to the accuracy curves in Fig. 6, training with a batch size of 128 requires more epochs to converge. Therefore, in this study, we improved the histopathology image dataset by applying the mean-shift filter algorithm and then re-trained the model. After the mean-shift filter is applied, the original image changes as shown in Fig. 8.

With the mean-shift filter, the nucleus appears smoother, and so does the cytoplasm. All filtered training images are then used to retrain the model with the same batch size, 128.

After training, we compare the validation accuracy before and after the mean-shift filter is applied. Fig. 9 shows this comparison graphically.

Fig. 9. Validation accuracy before (blue) and after (red) the mean-shift filter is applied to histopathology images

Fig. 9 shows that after the mean-shift filter is applied, convergence is reached faster: accuracy already increases at the 29th epoch.

The trained model is stored and used as a reference for testing new histopathological image data to be diagnosed. A number of new images are tested with the developed models; the results are presented in Table IV for the validation dataset and in Table V for the prepared testing dataset.

TABLE IV. CONFUSION MATRIX OF THE VALIDATION DATASET

Actual \ Predicted    Benign    Malignant
Benign                608       117
Malignant             7         970

TABLE V. CONFUSION MATRIX OF THE TESTING DATASET

Actual \ Predicted    Benign    Malignant
Benign                23        14
Malignant             9         34

From these tables, we can calculate the precision, recall, and F1 score using equations (5), (6), and (7).

$$\text{precision} = \frac{TP}{TP + FP} \qquad (5)$$

$$\text{recall} = \frac{TP}{TP + FN} \qquad (6)$$

$$F_1\ \text{score} = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \qquad (7)$$

For the validation dataset, the model reaches a precision of 0.83, a recall of 0.98, and an F1 score of 0.90. On the external test data, the model reaches a precision, recall, and F1 score of 0.71, 0.62, and 0.67, respectively.
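Equations (5) to (7) can be applied to the confusion matrix of the testing dataset as follows. This sketch assumes that rows are actual classes, columns are predicted classes, and benign is treated as the positive class; the paper does not state the positive class explicitly.

```python
def precision_recall_f1(tp, fp, fn):
    """Equations (5), (6) and (7) from true/false positives and negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Testing-dataset confusion matrix, rows = actual, columns = predicted.
# Assumption: benign is the positive class.
tp, fn = 23, 14  # actual benign: predicted benign / predicted malignant
fp = 9           # actual malignant predicted as benign

p, r, f1 = precision_recall_f1(tp, fp, fn)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Under these assumptions the three values come out close to the 0.71, 0.62, and 0.67 reported for the test data.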

V. CONCLUSION

Applying deep learning to histopathological images is very helpful for pathologists in speeding up analysis and identification. Histopathological images with unstructured patterns require more time to converge during training. The mean-shift filter makes CNN training converge faster. A Convolutional Neural Network (CNN) trained in a multiple-GPU environment provides a solution to assist pathologists in diagnosing cancer status, and with multiple GPUs the training process runs faster.

ACKNOWLEDGMENT

This research is funded by Penelitian Terapan Unggulan Perguruan Tinggi (PTUPT) 2019 of the Ministry of Research, Technology and Higher Education of the Republic of Indonesia under grant number NKB-1691/UN2.R3.1/HKP.05.00/2019.

Fig. 8. Image transition before and after the mean-shift filter is applied: (a) real image before the mean-shift filter; (b) image after the mean-shift filter


REFERENCES

[1] R. L. Siegel, K. D. Miller, and A. Jemal, “Cancer Statistics, 2018,” CA. Cancer J. Clin., vol. 68, no. 1, pp. 7–30, 2018.

[2] M. N. Gurcan, L. E. Boucheron, A. Can, A. Madabhushi, N. M. Rajpoot, and B. Yener, “Histopathological Image Analysis: A Review,” IEEE Rev. Biomed. Eng., vol. 2, pp. 147–171, 2009.

[3] D. Komura and S. Ishikawa, “Machine Learning Methods for Histopathological Image Analysis,” Comput. Struct. Biotechnol. J., vol. 16, pp. 34–42, 2018.

[4] H. Chen, X. Qi, L. Yu, Q. Dou, J. Qin, and P. Heng, “DCAN: Deep contour-aware networks for object instance segmentation from histology images,” Med. Image Anal., vol. 36, pp. 135–146, 2017.

[5] H. Sharma, N. Zerbe, I. Klempert, O. Hellwich, and P. Hufnagl, “Deep convolutional neural networks for automatic classification of gastric carcinoma using whole slide images in digital histopathology,” Comput. Med. Imaging Graph., vol. 61, pp. 2–13, 2017.

[6] S. Li, Y. Dou, Q. Lv, Q. Wang, X. Niu, and K. Yang, “Optimized GPU Acceleration Algorithm of Convolutional Neural Networks for Target Detection,” in Proceedings - 18th IEEE International Conference on High Performance Computing and Communications, 14th IEEE International Conference on Smart City and 2nd IEEE International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2016, 2017, pp. 224–230.

[7] J. Zhang, J. Xiao, J. Wan, J. Yang, Y. Ren, H. Si, L. Zhou, and H. Tu, “A Parallel Strategy for Convolutional Neural Network Based on Heterogeneous Cluster for Mobile Information System,” Mob. Inf. Syst., vol. 2017, 2017.

[8] V. Campos, F. Sastre, M. Yagues, J. Torres, and X. Giro-I-Nieto, “Scaling a convolutional neural network for classification of adjective noun pairs with TensorFlow on GPU Clusters,” Proc. - 2017 17th IEEE/ACM Int. Symp. Clust. Cloud Grid Comput. CCGRID 2017, pp. 677–682, 2017.

[9] S. Wu, M. Zhang, G. Chen, and K. Chen, “A New Approach to Compute CNNs for Extremely Large Images,” Proc. 26th ACM Int. Conf. Inf. Knowl. Manag., pp. 39–48, 2017.

[10] S. Li, Y. Dou, X. Niu, Q. Lv, and Q. Wang, “A fast and memory saved GPU acceleration algorithm of convolutional neural networks for target detection,” Neurocomputing, vol. 230, no. October 2016, pp. 48–59, 2017.

[11] K. Sirinukunwattana, S. E. A. Raza, Y. W. Tsang, D. R. J. Snead, I. A. Cree, and N. M. Rajpoot, “Locality Sensitive Deep Learning for Detection and Classification of Nuclei in Routine Colon Cancer Histology Images,” IEEE Trans. Med. Imaging, vol. 35, no. 5, pp. 1196–1206, 2016.

[12] K. Sirinukunwattana, D. R. J. Snead, and N. M. Rajpoot, “A Stochastic Polygons Model for Glandular Structures in Colon Histology Images,” IEEE Trans. Med. Imaging, vol. 34, no. 11, pp. 2366–2378, 2015.

[13] B. Leng, K. Yu, and J. Qin, “Data augmentation for unbalanced face recognition training sets,” Neurocomputing, vol. 235, no. October 2015, pp. 10–14, 2017.

[14] D. Comaniciu and P. Meer, “Mean shift: A robust approach toward feature space analysis,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 5, pp. 603–619, 2002.
