GPUs Utilization of Residual Network Training for Colon Histopathological Images Classification


Toto Haryanto¹, Heru Suhartanto¹, Aniati Murni Arymurthy¹, Kusmardi Kusmardi²

¹Faculty of Computer Science, Universitas Indonesia, Depok 16424, Indonesia
²Faculty of Medicine, Universitas Indonesia, Jl. Salemba Raya No. 6, Jakarta Pusat 10430

toto.haryanto@ui.ac.id, heru@cs.ui.ac.id, aniati@cs.ui.ac.id, kusmardi.ms@ui.ac.id

Abstract— Cancer is still one of the diseases with a high mortality rate in the world. Histopathological images are one type of image used to analyze the presence of cancer in the human body. Deep learning, as the state of the art, has been applied by researchers to analyze cancer images. One deep learning architecture is the Residual Network (ResNet). This architecture adds the input of a block to the block's output, which increases memory and processor load during training. In this work, we propose parallelizing the ResNet model across three GTX-1080 Graphics Processing Units (GPUs) for training. The performance of the three GPUs is assessed through GPU processor and memory utilization and through the speedup achieved during training. The advantage of parallelization with multiple GPUs is that it overcomes the out-of-memory condition that occurs at larger batch sizes on a single GPU. This study uses batch sizes of 8, 16, 24, and 32 as research scenarios. The results show that processor and memory utilization is more efficient for larger batch sizes. The average processor utilization of GPUs 1, 2, and 3 is 66%, 61.5%, and 81.5%, respectively, and the average GPU memory utilization is 44%, 40%, and 48.2%.

Keywords—GPU; GTX-1080; histopathology; ResNet

I. INTRODUCTION

Cancer is still one of the diseases with a high mortality rate in the world. According to statistics published in 2018, cancers of the digestive system account for the highest cancer mortality in the United States, followed by cancers of the respiratory system [1].

Pathologists identify cancer by analyzing microscopic images of suspect tissue, known as histopathological images. This analysis generally takes 4-5 hours per sample to decide whether or not a tissue is cancerous. Tools known as computer-assisted diagnosis (CAD) have been developed to support this process.

Research on histopathological images and computer-assisted diagnosis (CAD) has been reported by several researchers, such as [2], [3], and [4].

The Graphics Processing Unit (GPU) is a breakthrough for accelerating computation in various fields of research. One use of GPUs in deep learning is "Wheel", which accelerates the training of CNNs. The results of this study were able

deep learning has produced speed up to 4.73 times with 4 nodes (32 GPUs) [6]. Studies of the use of GPUs for medical images were also carried out by other researchers such as [7], [8], and [9].

Convolutional Neural Network (CNN) is a deep learning architecture that has been widely applied in medical image processing. Several studies have applied CNNs to histopathology, such as [10], [11], [12], and [13]. One study used a convolutional neural network to analyze histopathological images and identify lesions in breast cancer, focusing on a feature-extraction framework guided by nuclei [14]. The ability to extract features and then classify images is the advantage of CNNs. CNN architectures generally have a sequential structure from one layer to the next.

One deep learning architecture is the Residual Network (ResNet). A ResNet adds an identity (the input), denoted x, to the output of a layer block. Thus, if the output of the layers is f(x), the output of the residual block is f(x) + x [15]. That study showed that residual networks produce smaller errors than networks without residual connections. The success of ResNet then inspired research applying this architecture to the classification of 3D brain images [16].

Other research shows the advantages of Residual Network compared to Convolutional Neural Network (CNN) to detect key points on the face [17]. ResNet has also been successfully implemented to reconstruct medical imagery [18].

Adding the input x to the block output affects the training process, particularly by increasing computation time and memory use. Training a Residual Network therefore places a heavier load on processors and memory than training an ordinary convolutional neural network (CNN), because a plain CNN does not add the input to the output of a block.

ResNet avoids vanishing gradients as the network grows deeper and performs well on classification tasks. However, it is computationally expensive and consumes a large amount of memory during training. These problems motivate this study, in which we use multiple GPUs to speed up training. We analyze GPU utilization, both processor and memory, during the training process. We also analyze the performance of the Residual Network model through the loss and accuracy of the designed model.

II. MATERIAL AND METHODS

A. Dataset

Histopathological data is one type of medical data used by pathologists to analyze the cancer status of patients. This study uses Hematoxylin & Eosin (H&E) stained histology data from colorectal cancer, based on the studies in [19] and [20]. The data were obtained with a VLI120 scanner from colon histopathology images of 224 x 224 pixels. A total of 5302 images were used in this study, comprising 3690 for training and 1702 for validation. Two classes are classified in our research, benign and malignant, as illustrated in Fig. 1.

Fig. 1. Samples of histopathology tissue: benign and malignant

Before being fed into the network, the data are enriched by augmentation. The augmentation tasks applied to our data are rotation, vertical flip, horizontal flip, zoom, and shear. Fig. 2 illustrates the augmentation of our dataset.

Fig. 2. Image augmentation
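The geometric augmentations listed above can be illustrated with a minimal sketch. It is not the pipeline used in this study: it operates on a tiny nested-list "image", covers only the flips and a fixed 90-degree rotation, and all function names are hypothetical. Zoom and shear need interpolation and are left out.

```python
def horizontal_flip(image):
    """Mirror each row left to right."""
    return [list(reversed(row)) for row in image]


def vertical_flip(image):
    """Mirror the image top to bottom."""
    return [list(row) for row in reversed(image)]


def rotate_90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*reversed(image))]


def augment(image):
    """Return the original image together with three augmented variants."""
    return [image, horizontal_flip(image), vertical_flip(image), rotate_90(image)]


# A 2 x 2 toy "image"; real inputs in the paper are 224 x 224 RGB tiles.
tile = [[1, 2],
        [3, 4]]
variants = augment(tile)
for variant in variants:
    print(variant)
```

Each source image yields several training samples, which is how augmentation enlarges the effective dataset.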

B. Residual Network Architecture Design

Residual networks have residual block layers and sequential layers. In a residual block, the block output is added to the block input, and the sum is passed to the next layer. We design a residual network with twelve residual blocks, each containing eight layers. The residual blocks are followed by average pooling, three fully connected layers, and then the output layer.
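The addition performed by a residual block can be sketched in a few lines. This is only an illustration of the f(x) + x rule on plain Python lists. The function `toy_layer` is a hypothetical stand-in for the convolution, batch normalization, and activation layers of a real block.

```python
def relu(values):
    """Element-wise ReLU."""
    return [max(0.0, v) for v in values]


def residual_block(x, f):
    """Return f(x) + x, the output of a residual block with an identity shortcut."""
    fx = f(x)
    if len(fx) != len(x):
        raise ValueError("f must preserve the shape so that f(x) + x is defined")
    return [a + b for a, b in zip(fx, x)]


def toy_layer(values):
    # Hypothetical stand-in for the layers inside one block.
    return relu([0.5 * v - 1.0 for v in values])


x = [4.0, 2.0, -2.0]
y = residual_block(x, toy_layer)
print(y)
```

Because x must stay available until the addition, the block keeps its input in memory alongside the intermediate outputs, which is the extra cost discussed in the introduction.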

The parameters set for training include a learning rate of 0.0001, batch normalization in the residual layers, and batch sizes from 8 to 32. The purpose of batch normalization is to allow a larger learning rate.

We apply ReLU as the activation function and softmax at the output layer to classify cancer status. ReLU helps avoid vanishing gradients during training and takes less computation time than the tanh or sigmoid activation functions. The output of the ReLU activation is given in equation (1).

$$\mathrm{ReLU}(x) = \begin{cases} x, & \text{for } x \ge 0 \\ 0, & \text{for } x < 0 \end{cases} \tag{1}$$

An advantage of ReLU is sparse activation, the condition in which not all activation outputs fire, because ReLU outputs zero for every negative input. In a large neural network with randomly initialized weights, tanh or sigmoid activations produce outputs that all have to be processed, which costs more computation time.
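Sparse activation can be observed with a small experiment. The sketch below assumes zero-mean Gaussian pre-activations, which is an illustrative assumption rather than a measurement from our network, and counts how many ReLU outputs are exactly zero.

```python
import random


def relu(x):
    """ReLU as in equation (1)."""
    return x if x >= 0 else 0.0


def sparsity(activations):
    """Fraction of activations that are exactly zero."""
    return sum(1 for a in activations if a == 0) / len(activations)


random.seed(0)
# Assumed pre-activations: 10,000 samples from a standard normal distribution.
pre_activations = [random.gauss(0.0, 1.0) for _ in range(10_000)]
post = [relu(v) for v in pre_activations]
print(f"zero outputs after ReLU: {sparsity(post):.2%}")
```

Under this assumption roughly half of the units are inactive, whereas tanh or sigmoid would return a non-zero value for almost every input.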

At the end of the network, we apply softmax to classify the two classes, benign and malignant. Softmax computes the probability of each class over all possible classes, as in equation (2).

$$f(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}} \tag{2}$$
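Equation (2) can be written as a short function. The sketch subtracts the maximum score before exponentiating, a common numerical-stability step that is not part of the equation and does not change its result. The example scores are hypothetical.

```python
import math


def softmax(scores):
    """Softmax of equation (2), shifted by max(scores) for numerical stability."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for the two classes (benign, malignant).
probs = softmax([2.0, 0.0])
print(probs)
```

The outputs are positive and sum to one, so they can be read as class probabilities.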

The architecture is trained for 500 epochs to reveal convergence. The model is evaluated by accuracy, and the loss function is used to optimize the model. Accuracy is the ratio of the number of correct predictions to the total number of inputs, as shown in equation (3).

$$\text{Accuracy} = \frac{\#\,\text{correct predictions}}{\#\,\text{total input data}} \tag{3}$$

Meanwhile, loss values are obtained from the cross-entropy function in equation (4).



$$L = -\sum_{i=1}^{C} t_i \log(s_i) \tag{4}$$

where t_i and s_i are the ground truth and the CNN score for each class i in C.

The Residual Network architecture comprises several residual blocks; in this study, we use twelve. In general, the residual blocks are homogeneous. In each residual block, we apply batch normalization to normalize the input and ReLU to avoid vanishing gradients. The residual blocks are followed by three fully connected layers, each with dropout. At the end of the network, we use the softmax function to classify the type of cancer.
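Equations (3) and (4) can be sketched as two small functions. The labels and scores below are hypothetical values chosen for illustration, and the `eps` clamp is an added safeguard against log(0) that is not part of equation (4).

```python
import math


def accuracy(predicted, actual):
    """Equation (3): correct predictions divided by the total number of inputs."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def cross_entropy(t, s, eps=1e-12):
    """Equation (4): L = -sum_i t_i * log(s_i) over the C classes."""
    return -sum(ti * math.log(max(si, eps)) for ti, si in zip(t, s))


# Hypothetical example: four predictions, three of them correct.
acc = accuracy([0, 1, 1, 0], [0, 1, 0, 0])
# Hypothetical one-hot ground truth and softmax scores for one image.
loss = cross_entropy([0.0, 1.0], [0.2, 0.8])
print(acc, loss)
```

With a one-hot ground truth, only the score of the true class contributes to the loss, so the loss falls as that score approaches one.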


C. Architecture Parallelization

Parallelization is used to make full use of the three installed GPUs. The input batch is divided into several sub-batches, and the model is copied once per GPU so that each copy processes one sub-batch. The outputs of the copies are then combined to form the final model. The scheme is illustrated in Fig. 3.

Fig.3. Model Parallelization Scheme

To implement the parallelization on three GPUs, we use the “multi_gpu_model” module provided by the Keras framework, version 1.14.0. Parallelization during training proceeds in the following three steps.

• Divide model input. When training starts, each input batch is split and distributed to the dedicated GPUs. The script for this step is shown below.

divide_model_input function

    from keras import backend as K

    def get_slice(data, i, parts):
        # Split the batch dimension into `parts` slices and return slice i.
        shape = K.shape(data)
        batch_size = shape[:1]
        input_shape = shape[1:]
        step = batch_size // parts
        if i == parts - 1:
            # The last slice also takes the remainder of the batch.
            size = batch_size - step * i
        else:
            size = step
        size = K.concatenate([size, input_shape], axis=0)
        stride = K.concatenate([step, input_shape * 0], axis=0)
        start = stride * i
        return K.slice(data, start, size)

• Copy and execute the model on dedicated GPU. After the images are distributed, a copy of the model is executed on each dedicated GPU.

model_copy_function

    for i, gpu_id in enumerate(target_gpu_ids):
        with tf.device('/gpu:%d' % gpu_id):
            with tf.name_scope('replica_%d' % gpu_id):
                inputs = []
                # Retrieve a slice of the input.
                for x in model.inputs:
                    # In-place input splitting which is not only
                    # 5% ~ 12% faster but also less GPU memory
                    # duplication.
                    with tf.device(x.device):
                        input_shape = K.int_shape(x)[1:]
                        slice_i = Lambda(get_slice,
                                         output_shape=input_shape,
                                         arguments={'i': i,
                                                    'parts': num_gpus})(x)
                        inputs.append(slice_i)

                # Apply model on slice
                # (creating a model replica on the target device).
                outputs = model(inputs)
                outputs = to_list(outputs)

• Concatenate (on CPU) into one big batch. After the replicas have run on their dedicated GPUs, their outputs are merged on the CPU.

merge_function

    with tf.device('/cpu:0' if cpu_merge else '/gpu:%d' % target_gpu_ids[0]):
        merged = []
        for name, outputs in zip(output_names, all_outputs):
            merged.append(concatenate(outputs, axis=0, name=name))
        return Model(model.inputs, merged)
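The three steps above can be mimicked without a GPU. The sketch below is a plain-Python analogue, not the Keras implementation: `slice_bounds` follows the arithmetic of get_slice, while `split_batch`, `data_parallel_predict`, and the toy model are hypothetical names introduced for illustration.

```python
def slice_bounds(batch_size, i, parts):
    """Start index and size of the i-th sub-batch, following get_slice."""
    step = batch_size // parts
    size = batch_size - step * i if i == parts - 1 else step
    return step * i, size


def split_batch(batch, parts):
    """Step 1: divide the batch into one sub-batch per device."""
    bounds = [slice_bounds(len(batch), i, parts) for i in range(parts)]
    return [batch[start:start + size] for start, size in bounds]


def data_parallel_predict(model_fn, batch, parts):
    """Steps 2 and 3: run one model copy per sub-batch, then concatenate."""
    outputs = [model_fn(sub) for sub in split_batch(batch, parts)]
    merged = []
    for out in outputs:
        merged.extend(out)
    return merged


def toy_model(sub_batch):
    # Hypothetical stand-in for a model replica.
    return [2 * v for v in sub_batch]


batch = list(range(8))
print([len(s) for s in split_batch(batch, 3)])
print(data_parallel_predict(toy_model, batch, 3))
```

With a batch of 8 split across three parts, the sub-batch sizes are 2, 2, and 4, because the last slice also receives the remainder of the integer division. A batch of 24 splits evenly into 8, 8, and 8.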

GPU parallelization is needed because the residual architecture consumes extra GPU memory during training. Our architecture consists of sequential layers, including the fully connected layers, and 12 residual blocks. Memory consumption during training depends on the memory for parameters, parameter gradients, momentum, convolution, and layer outputs. The calculation is illustrated in Table IV and Table V.

D. Experiment Setup

Our experiment runs on the Ubuntu 16.04 operating system with CUDA 9.0. We use cuDNN as the GPU-accelerated library for deep learning, Keras 2.0 as the high-level framework, and TensorFlow 1.5 as the middle-layer framework. All software is installed on a server with an Intel Core i7-5960X processor (16 cores at 3.0 GHz) and 128 GB of RAM. Training runs on three GPUs with the homogeneous specification listed in Table I.


TABLE I. GPU MACHINE SPECIFICATION

GPU Engine
  GPU Type                  GTX-1080
  Number of GPU             3
  CUDA Cores                2560
  Graphics Clock (MHz)      1607
  Processor Clock (MHz)     1733
  Graphics Performance      high-17362

Memory Specs
  Memory Clock              10 Gbps
  Standard Memory Config    8 GB GDDR5X
  Memory Interface          GDDR5
  Memory Interface Width    256-bit
  Support technologies      CUDA 8.0 and above

To train on three GPUs simultaneously, we use an additional library called “multi_gpu_model” [21]. How this library works is explained in the Architecture Parallelization subsection.

E. Acceleration Evaluation

We evaluate the acceleration obtained when training the model on GPUs by the speedup. The speedup compares sequential training on the CPU with parallel training on GPUs, as in equation (5).

$$\text{speedup} = \frac{T_s}{T_p} \tag{5}$$

Where Ts is sequential training time, and Tp is parallel training time.
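As a worked instance of equation (5), the sketch below uses the per-epoch times reported for batch size 8 in Table III (2584 s on the CPU and 46.56 s on multiple GPUs). The function name is hypothetical.

```python
def speedup(t_sequential, t_parallel):
    """Equation (5): sequential time divided by parallel time."""
    if t_parallel <= 0:
        raise ValueError("parallel time must be positive")
    return t_sequential / t_parallel


# Per-epoch times in seconds for batch size 8 (Table III).
s = speedup(2584.0, 46.56)
print(f"speedup: {s:.2f}")
```

The result rounds to 55.50, the value listed in the Speedup column for that row.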

III. RESULT AND DISCUSSION

A. Experiment Scenarios

We designed the scenarios by varying the batch size while keeping the other parameters fixed. The batch size is the number of samples used for each weight update of the network during training; it affects the processor and memory load as well as the loss and accuracy of the model. In this study, batch sizes of 8, 16, 24, and 32 were used.

B. Performance of GPU

During the training process, we monitor GPU processor and memory performance. Each GPU is identified by its PCI ID, as listed in Table II.

TABLE II. PCI ID FOR GPU MACHINE

PCI ID        GPU Type    CUDA Cores   Memory size
0000:01:00    GTX-1080    2560         8144 MB
0000:02:00    GTX-1080    2560         8144 MB
0000:03:00    GTX-1080    2560         8144 MB

Table II shows that the three GPUs have the same specifications, so the load is expected to be balanced across them. The performance of the three GPUs is assessed from the processor workload and RAM use during training. The average processor utilization of each GPU is shown in Fig. 4.

The three GPUs do not yet make full use of the available resources when the batch size is 8. When the batch size is increased to 16 or more, GPU processor utilization increases.

Fig.4. Average GPU Utilization during training with various batch-size

Adding the input in the Residual Network during training affects not only processor performance but also memory availability. Fig. 5 presents the average GPU memory utilization during training.

Fig.5. Average GPU Memory Utilization during training with various batch-size

The details of GPU utilization for each batch size are shown in Fig. 6–9. To obtain the utilization values, we use the “utilization.gpu” and “utilization.memory” query fields, both of which can be requested as parameters of nvidia-smi.
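A minimal sketch of how such samples could be collected and averaged is given below. It is not the monitoring script used in this study. It assumes the nvidia-smi options `--query-gpu` and `--format=csv,noheader,nounits`; the helper names and the example text are hypothetical, and `sample_utilization` only works on a machine with an NVIDIA driver installed.

```python
import subprocess

QUERY = ["nvidia-smi",
         "--query-gpu=index,utilization.gpu,utilization.memory",
         "--format=csv,noheader,nounits"]


def parse_utilization(csv_text):
    """Parse rows of 'index, gpu %, memory %' into a list of dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, gpu, mem = [field.strip() for field in line.split(",")]
        rows.append({"index": int(index), "gpu": float(gpu), "memory": float(mem)})
    return rows


def sample_utilization():
    """Run nvidia-smi once and return the parsed rows (needs an NVIDIA driver)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_utilization(out)


def average(samples, key):
    """Mean of one field over a list of parsed rows."""
    return sum(s[key] for s in samples) / len(samples)


# Hypothetical nvidia-smi output, used only to illustrate the parsing.
example = "0, 66, 44\n1, 61, 40\n2, 82, 48\n"
rows = parse_utilization(example)
print(average(rows, "gpu"), average(rows, "memory"))
```

Calling `sample_utilization` at a fixed interval during training and averaging per GPU index would give per-device averages of the kind plotted in Fig. 4 and Fig. 5.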


Fig.6. GPU utilization during training residual network with 8 batch-size

Fig.7. GPU utilization during training residual network with 16 batch-size

Fig.8. GPU utilization during training residual network with 24 batch-size

According to Fig. 4, the average utilization of GPU processors 1, 2, and 3 is 66%, 61.5%, and 81.5%, respectively. GPU processor utilization was low when the ResNet architecture was trained with a batch size of 8 and became apparent with larger batch sizes.

Fig. 6–9 show the contribution of each GPU processor to the training computation of the Residual Network. RAM utilization during training was also a focus of our study.

Using three GPUs is an advantage for larger batch sizes. With 8 GB of RAM on each GPU, about 24 GB of RAM is available during training to accommodate a larger batch size.

Fig.9. GPU utilization during training residual network with 32 batch-size

The histopathological images used are 224x224 pixels in size, and the Residual Network requires a large amount of GPU memory. The contribution of the RAM of each GPU is shown in Fig. 10-13. Overall, the contributions of the GPUs differ slightly according to these figures.

From these figures, we can see that GPU 3 (green) contributes more than the others.

Fig.10. GPU memory utilization during training residual network with 8 batch-size

The main limitation of a GPU is memory availability, which is crucial for deep learning. Most of the memory is used to load the data once the batch size is defined. A residual network in particular requires more available memory. Parallelization of our architecture can accommodate this issue.

GPU memory plays the main role in the feed-forward process of the Residual Network and in handling the input in the residual block layers. A GPU has more cores than a CPU. However, using a single GPU causes an out-of-memory error during training.

C. Acceleration Evaluation

The acceleration from GPU use is calculated by comparing the training time on the CPU with that on multiple GPUs. The CPU used for this training is an Intel Core i7-5960X with 16 cores at 3.00 GHz and 128 GB of RAM.

Using GPUs for the Residual Network accelerates training. The training time per epoch (in seconds) is given in Table III. Our model is trained for 500 epochs. This means that training on the CPU (16 cores) takes about two weeks on average, whereas training on multiple GPUs takes only about four hours.
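The totals quoted above can be checked with a short calculation. The sketch averages the per-epoch times of Table III over the four batch sizes and multiplies by the 500 epochs; the variable and function names are hypothetical.

```python
EPOCHS = 500

# Per-epoch times in seconds for batch sizes 8, 16, 24, and 32 (Table III).
CPU_SECONDS = [2584, 2580, 2570, 2550]
MULTI_GPU_SECONDS = [46.56, 25.36, 24.65, 22.20]


def total_hours(seconds_per_epoch, epochs=EPOCHS):
    """Total training time in hours for a given time per epoch."""
    return seconds_per_epoch * epochs / 3600.0


cpu_hours = total_hours(sum(CPU_SECONDS) / len(CPU_SECONDS))
gpu_hours = total_hours(sum(MULTI_GPU_SECONDS) / len(MULTI_GPU_SECONDS))
print(f"CPU: {cpu_hours:.1f} h ({cpu_hours / 24:.1f} days)")
print(f"Multiple GPUs: {gpu_hours:.1f} h")
```

The CPU average corresponds to roughly 357 hours, or close to 15 days, and the multi-GPU average to a little over four hours.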

TABLE III. AVERAGE TRAINING TIME PER EPOCH (IN SECONDS)

Batch size   Single GPU   Multiple GPUs   CPU (16 cores)   Speedup
8            59.71        46.56           2584             55.50
16           53.82        25.36           2580             104.66
24           OM           24.65           2570             101.34
32           OM           22.20           2550             114.84

We also trained our architecture on a single GPU. According to Table III, training with batch sizes of 24 and 32 on a single GPU is not possible because the GPU runs out of memory (OM). Memory consumption during training is dominated by the convolutional layers of the architecture, as summarized in Table IV.

TABLE IV. MEMORY CONSUMPTION OF CONVOLUTIONAL ARCHITECTURE

Layer                  Dimension     Memory consumption
Input                  3x224x224     150,528
Convolutional          32x224x224    1,605,632
Block Residual 1-3     32x224x224    62,619,648
Block Residual 4-6     64x112x112    31,309,824
Block Residual 7-9     128x56x56     15,654,912
Block Residual 10-12   256x28x28     7,827,456
Dropout                28x28x256     200,704
BatchNormalization     28x28x256     200,704
Activation             28x28x256     200,704
AveragePooling [4,4]   7x7x256       12,544
Flatten                12,544        12,544
FC                     1024          1,024
Dropout                1024          1,024
FC                     512           512
Dropout                512           512
FC                     256           256
Output                 2             2
Total                                119,798,530

Our architecture has twelve residual blocks. Each residual block comprises batch normalization and convolution with a different number of feature maps. Each block in residual blocks 1–3 requires 20,873,216, so these three blocks consume 3 x 20,873,216 = 62,619,648 in total. The other residual blocks consume memory in the same way. According to Table IV, we need 119,798,530, or about 120 MB of GPU memory, per image in the forward step. A neural network is trained with a forward and a backward pass, so each image requires about 2 x 120 ≈ 240 MB. The experiment scenarios use batch sizes of 8, 16, 24, and 32 images. The GPU memory that must be available for basic training is shown in Table V.

TABLE V. MEMORY CONSUMPTION BASED ON BATCH SIZE

Batch size   Memory consumption
8            8 x 240 MB ≈ 1.91 GB
16           16 x 240 MB ≈ 3.82 GB
24           24 x 240 MB ≈ 5.73 GB
32           32 x 240 MB ≈ 7.64 GB
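The arithmetic behind Table IV and Table V can be reproduced with a short script. The layer values are copied from Table IV. The 240 MB per image is the rounded forward-plus-backward figure used in the text, and the helper name is hypothetical.

```python
# Memory consumption per layer, copied from Table IV.
LAYER_MEMORY = [
    ("Input", 150_528),
    ("Convolutional", 1_605_632),
    ("Block Residual 1-3", 62_619_648),
    ("Block Residual 4-6", 31_309_824),
    ("Block Residual 7-9", 15_654_912),
    ("Block Residual 10-12", 7_827_456),
    ("Dropout", 200_704),
    ("BatchNormalization", 200_704),
    ("Activation", 200_704),
    ("AveragePooling", 12_544),
    ("Flatten", 12_544),
    ("FC 1024", 1_024),
    ("Dropout", 1_024),
    ("FC 512", 512),
    ("Dropout", 512),
    ("FC 256", 256),
    ("Output", 2),
]

forward_total = sum(value for _, value in LAYER_MEMORY)

# Rounded forward + backward estimate per image, as used in the text.
MB_PER_IMAGE = 240


def batch_memory_mb(batch_size, mb_per_image=MB_PER_IMAGE):
    """Estimated memory in MB for one batch (batch size x 240 MB)."""
    return batch_size * mb_per_image


print("forward total:", forward_total)
for batch_size in (8, 16, 24, 32):
    print(batch_size, "images:", batch_memory_mb(batch_size), "MB")
```

The script confirms that the rows of Table IV sum to the stated total and shows the product behind each row of Table V before conversion to GB.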

The GTX-1080 has 8 GB of GPU RAM. During training, GPU RAM is also needed for parameters, parameter gradients, layer errors, augmentation, and layer outputs. It is therefore reasonable that training with batch sizes of 24 and 32 on a single GPU fails with Out of Memory (OM), as shown in Table III.

In deep learning, training a model is generally not a trivial activity. We have to design the network, choose the parameters, tune the hyper-parameters, and then train the model. If training finishes faster, we obtain the information needed to improve the model earlier.


D. Performance of the model

Model performance is shown by graphs of the loss and accuracy functions. Based on Fig. 14, the model trained with a batch size of 8 has the best performance. The model is evaluated by accuracy on the validation dataset; accuracy is calculated by dividing the number of correct predictions by the number of all data tested, as in equation (3). The loss function, which is optimized during training, follows equation (4). During training, the loss values converge for both the training and validation datasets. For accuracy, convergence is achieved on the training data. However, not every batch size yields convergence.

IV. CONCLUSION

Parallelization of the model plays an important role in training the Residual Network (ResNet) architecture. It makes training more effective and can be applied throughout the training process. All three GPUs contributed to the computation and to data allocation. Moreover, using multiple GPUs gives us the opportunity to improve our model. According to the validation accuracy, our model still needs improvement to obtain stable convergence. Understanding GPU utilization is important when weighing the trade-off between model performance and computation time during training of a Residual Network architecture.

ACKNOWLEDGMENT

This research is funded by Penelitian Terapan Unggulan Perguruan Tinggi (PTUPT) 2019 of the Ministry of Research, Technology and Higher Education of the Republic of Indonesia with grant number NKB-1691/UN2.R3.1/HKP.05.00/2019.

The authors express their gratitude to the Directorate of Research and Community Engagement, Universitas Indonesia, and to the Faculty of Computer Science and the Faculty of Medicine, Universitas Indonesia, for the discussions and valuable knowledge contributed to this research.

REFERENCES

[1] R. L. Siegel, K. D. Miller, and A. Jemal, “Cancer Statistics, 2018,” CA. Cancer J. Clin., vol. 68, no. 1, pp. 7–30, 2018, doi: 10.3322/caac.21442.

[2] W. K. Moon, I. Chen, J. M. Chang, S. U. Shin, C. Lo, and R. Chang, “The adaptive computer-aided diagnosis system based on tumor sizes for the classification of breast tumors detected at screening ultrasound,” Ultrasonics, vol. 76, pp. 70–77, 2017, doi: 10.1016/j.ultras.2016.12.017.

[3] J. C. M. Van Zelst et al., “Improved cancer detection in automated breast ultrasound by radiologists using Computer Aided Detection,” Eur. J. Radiol., vol. 89, pp. 54–59, 2017, doi: 10.1016/j.ejrad.2017.01.021.

[4] M. Saha, R. Mukherjee, and C. Chakraborty, “Computer-aided diagnosis of breast cancer using cytological images: A systematic review,” Tissue Cell, vol. 48, no. 5, pp. 461–474, 2016, doi: 10.1016/j.tice.2016.07.006.

Fig. 14. Graphics of loss and accuracy of Residual Network during training


[5] X. Du, J. Tang, Z. Li, and Z. Qin, “Wheel: Accelerating CNNs with Distributed GPUs via Hybrid Parallelism and Alternate Strategy,” in Proceedings of the 25th ACM International Conference on Multimedia, 2017, pp. 393–401.

[6] V. Campos, F. Sastre, M. Yagües, M. Bellver, X. Giró-I-Nieto, and J. Torres, “Distributed training strategies for a computer vision deep learning algorithm on a distributed GPU cluster,” in International Conference on Computational Science, ICCS 2017, 2017, vol. 108C, pp. 315–324, doi: 10.1016/j.procs.2017.05.074.

[7] E. Smistad, T. L. Falch, M. Bozorgi, A. C. Elster, and F. Lindseth, “Medical image segmentation on GPUs - A comprehensive review,” Med. Image Anal., vol. 20, no. 1, pp. 1–18, 2015, doi: 10.1016/j.media.2014.10.012.

[8] A. Eklund, P. Dufort, D. Forsberg, and S. M. Laconte, “Medical image processing on the GPU - Past, present and future,” Med. Image Anal., vol. 17, no. 8, pp. 1073–1094, 2013, doi: 10.1016/j.media.2013.05.008.

[9] T. Haryanto, H. Suhartanto, and X. Lie, “Past, Present, and Future Trend of GPU Computing in Deep Learning on Medical Images,” in International Conference on Advanced Computer Science and Information Systems (ICACSIS), 2017, pp. 21–28, doi: 10.1109/ICACSIS.2017.8355007.

[10] H. Sharma, N. Zerbe, I. Klempert, O. Hellwich, and P. Hufnagl, “Deep convolutional neural networks for automatic classification of gastric carcinoma using whole slide images in digital histopathology,” Comput. Med. Imaging Graph., pp. 1–12, 2017, doi: 10.1016/j.compmedimag.2017.06.001.

[11] H. Chen, X. Qi, L. Yu, Q. Dou, J. Qin, and P. Heng, “DCAN: Deep contour-aware networks for object instance segmentation from histology images,” Med. Image Anal., vol. 36, pp. 135–146, 2017, doi: 10.1016/j.media.2016.11.004.

[12] T. Haryanto, I. Wasito, and T. Suhartanto, “Convolutional Neural Network (CNN) for Gland Images Classification,” in International Conference On Information & Communication Technology And System, 2017, doi: 10.1109/ICTS.2017.8265646.

[13] J. Xu, X. Luo, G. Wang, H. Gilmore, and A. Madabhushi, “A Deep Convolutional Neural Network for segmenting and classifying epithelial and stromal regions in histopathological images,” Neurocomputing, vol. 191, pp. 214–223, 2016, doi: 10.1016/j.neucom.2016.01.034.

[14] Y. Zheng et al., “Feature extraction from histopathological images based on nucleus-guided convolutional neural network for breast lesion classification,” Pattern Recognit., vol. 71, pp. 14–25, 2017, doi: 10.1016/j.patcog.2017.05.010.

[15] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778, doi: 10.1109/CVPR.2016.90.

[16] S. Korolev, A. Safiullin, M. Belyaev, and Y. Dodonova, “Residual and Plain Convolutional Neural Networks for 3D Brain MRI Classification,” in 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017), 2017, pp. 835–838, [Online]. Available: https://arxiv.org/pdf/1701.06643.pdf.

[17] S. Wu, J. Xu, S. Zhu, and H. Guo, “A Deep Residual convolutional neural network for facial keypoint detection with missing labels,” Signal Processing, vol. 144, pp. 384–391, 2018, doi: 10.1016/j.sigpro.2017.11.003.

[18] S. Ren, D. K. Jain, K. Guo, T. Xu, and T. Chi, “Towards efficient medical lesion image super-resolution based on deep residual network,” Signal Process. Image Commun., vol. 75, no. March, pp. 1–10, 2019, doi: 10.1016/j.image.2019.03.008.

[19] K. Sirinukunwattana, D. R. J. Snead, and N. M. Rajpoot, “A Stochastic Polygons Model for Glandular Structures in Colon Histology Images,” IEEE Trans. Med. Imaging, vol. 34, no. 11, pp. 2366–2378, 2015, doi: 10.1109/TMI.2015.2433900.

[20] K. Sirinukunwattana, S. E. A. Raza, Y. W. Tsang, D. R. J. Snead, I. A. Cree, and N. M. Rajpoot, “Locality Sensitive Deep Learning for Detection and Classification of Nuclei in Routine Colon Cancer Histology Images,” IEEE Trans. Med. Imaging, vol. 35, no. 5, pp. 1196–1206, 2016, doi: 10.1109/TMI.2016.2525803.

[21] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado et al., “TensorFlow: Large-scale machine learning on heterogeneous systems,” 2015, doi: 10.1038/nn.3331.