Accelerating Convolutional Neural Network Training for Colon Histopathology Images by Customizing Deep Learning Framework *
Conventional vs Automatic Detection

Histopathology image classification (Gurcan et al., 2009) can be approached in two ways:
v Conventional method: images are examined by an expert pathologist. Problems: human dependency, error prone, time consuming.
v Automatic method (Chen et al., 2016): automatic feature extraction, rather than general (hand-crafted) feature extraction, gives good performance for classification.
Data set
v Colon cancer data, annotated
v 941 benign and 910 malignant images
v Slide images: 775 x 522 pixel, 0.68 µm/pixel
v Patches randomly subsampled from the slides at 32x32, 64x64, 128x128, 180x180 and 200x200 pixel, for both the benign and malignant classes
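The random subsampling of patches from the 775 x 522 slides can be sketched as follows. This is a minimal pure-Python sketch; the function and variable names are our own, not from the poster.

```python
import random

def random_patch_coords(img_w, img_h, patch, n, seed=None):
    """Pick n random top-left corners for patch x patch crops that
    fit entirely inside an img_w x img_h image (hypothetical helper)."""
    rng = random.Random(seed)
    return [(rng.randint(0, img_w - patch), rng.randint(0, img_h - patch))
            for _ in range(n)]

# Slide images in this dataset are 775 x 522 pixels; draw 200x200 patches.
coords = random_patch_coords(775, 522, 200, n=5, seed=42)
```

The same routine covers the smaller patch sizes (32x32 up to 180x180) by changing the `patch` argument.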
Data pre-processing & data augmentation
v Image augmentation: normal, random rotation, flip (horizontal/vertical), zoom, shear
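Per-image augmentation of the kinds listed above can be sketched as sampling a set of random transform parameters. The parameter ranges below are illustrative assumptions; the poster does not state them.

```python
import random

def sample_augmentation(rng):
    """Draw one set of augmentation parameters: rotation, H/V flip,
    zoom and shear. The ranges are illustrative, not from the poster."""
    return {
        "rotation_deg": rng.uniform(-30.0, 30.0),
        "flip_horizontal": rng.random() < 0.5,
        "flip_vertical": rng.random() < 0.5,
        "zoom": rng.uniform(0.9, 1.1),
        "shear_deg": rng.uniform(-10.0, 10.0),
    }

params = sample_augmentation(random.Random(0))
```

Drawing fresh parameters for every image each epoch is what enriches the dataset beyond the original 1851 images.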
CNN Architecture Design
v Image sizes chosen to test GPU efficiency
v GTX-980 speed-up analysed for batch-sizes 10, 20, 50, 100 and 150
v CNN parameters: trade-off, as larger settings make training time expensive
Toto Haryanto (a), Heru Suhartanto (a), Aniati Murni (a), Kusmardi (b)
email: [email protected], [email protected], [email protected], [email protected]
(a) Faculty of Computer Science, Universitas Indonesia, Depok, 16424, West Java, Indonesia
(b) Faculty of Medicine, Universitas Indonesia, Depok, 16424, West Java, Indonesia
Abstract
Cancer diagnosis based on histopathology images still faces several challenges. Image variation, high image resolution, and the different cell patterns in the images can all contribute to misclassification. Convolutional neural networks (CNN) are widely used in image processing for their ability to extract features and classify objects. Applying a CNN to high-resolution images makes the training process computationally expensive, so the Graphics Processing Unit (GPU) plays an important role in speeding it up. However, GPUs are limited by their memory size. This work focuses on how to utilize GPU memory when training a CNN architecture. For training, an NVIDIA GTX-980 is accelerated by customizing CUDA memory allocation through the cnmem library. In the experiments, the cnmem parameter is chosen from 0.6, 0.8, 1 and 2, and the best value is used for subsequent training. To enrich the dataset, augmentations such as rotation, zoom, shear and flip are applied. Several optimization techniques are compared experimentally to determine the best model for classifying the two cancer classes, benign and malignant. We use image sizes of 32x32, 64x64, 128x128, 180x180 and 200x200 pixels. During training, the batch-size is selected experimentally from 10, 20, 50, 100 and 150. We find that enabling cnmem with parameter 1 gives the best results, and the 200x200 images show the most significant GPU efficiency gain when training the CNN. Speed-up is measured by comparing the training time of the GTX-980 with a Core i7 CPU machine using 16, 8, 4, 2 cores and a single core. With cnmem enabled, the GTX-980 achieves its highest speed-ups of 4.49, 5.00, 7.58, 11.97 and 16.19 compared to 16, 8, 4, 2 and 1 core processors, respectively.
Objective
Accelerate the NVIDIA GPU GTX-980 for training CNNs and analyse its performance.
v Mini batch-size: 10, 20, 50, 100, 150
v Optimizer: rmsprop, sgd, adam and adamax
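Since batch-size and optimizer are selected experimentally, the runs form a simple grid over the two lists above. A minimal sketch of enumerating that grid (the dict layout is our own; `adamax` is assumed to be the Keras identifier for the AdaMax optimizer):

```python
from itertools import product

# Hyper-parameters screened experimentally in the poster.
batch_sizes = [10, 20, 50, 100, 150]
optimizers = ["rmsprop", "sgd", "adam", "adamax"]  # Keras optimizer names

# 5 batch-sizes x 4 optimizers = 20 candidate training configurations.
experiments = [{"batch_size": b, "optimizer": o}
               for b, o in product(batch_sizes, optimizers)]
```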
CPU Environment:
v Intel® Core™ i7-5960X
v Number of cores: 16 cores @ 3 GHz
v RAM size: 65 GB

GPU GTX-980 Environment:
v Number of cores: 2048
v Clock speed: 1126 MHz
v GPU memory: 4 GB
v Memory interface width: 256 bit
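The 4 GB memory limit is why batch-size and image size trade off against each other. A rough back-of-envelope sketch for the raw input batch alone (float32, 3 channels; it deliberately ignores weights and intermediate feature maps, which dominate in practice):

```python
def input_batch_mb(batch, side, channels=3, bytes_per_value=4):
    """Memory (MB) for one float32 batch of square RGB images
    (hypothetical helper for a back-of-envelope estimate)."""
    return batch * side * side * channels * bytes_per_value / 1e6

# 200x200 RGB images at batch-size 150: 72 MB for the raw inputs alone.
# Intermediate feature maps multiply this several times over, which is
# why batch-size and image size must be balanced against 4 GB of memory.
mb = input_batch_mb(150, 200)
```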
Speed-up of the GTX-980 over the CPU for 200x200 images, by mini batch-size:

batch-size               10      20      50      100     150
vs 16 cores (cnmem=1)    4.49    4.27    3.89    3.76    3.66
vs 16 cores (cnmem=0)    3.62    3.50    2.85    2.61    2.64
vs 8 cores  (cnmem=1)    5.00    4.74    4.66    4.59    4.43
vs 8 cores  (cnmem=0)    4.03    3.89    3.41    3.18    3.20
vs 4 cores  (cnmem=1)    7.34    7.58    7.20    6.92    6.66
vs 4 cores  (cnmem=0)    5.91    6.22    5.27    4.80    4.81
vs 2 cores  (cnmem=1)    11.43   11.97   11.56   11.02   10.22
vs 2 cores  (cnmem=0)    9.21    9.82    8.47    7.64    7.38
vs 1 core   (cnmem=1)    15.49   16.19   16.06   16.05   15.75
vs 1 core   (cnmem=0)    12.48   13.29   11.76   11.14   11.38

[Figure: CNN training time (seconds) vs image size (32x32, 64x64, 128x128, 180x180, 200x200) for cnmem=1 and cnmem=0]
The figure shows training time (in seconds) for each image size at batch-size = 150, comparing runs with cnmem enabled and disabled. Enabling cnmem (cnmem=1) gives a faster training time.
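In Theano, cnmem is enabled through the `lib.cnmem` configuration flag, which must be set before the library is imported. A minimal sketch, assuming a Theano backend as in the poster's era (the exact flag string beyond `lib.cnmem` is a common default, not stated in the poster):

```python
import os

# CNMeM is enabled through Theano's `lib.cnmem` flag; it must be set
# before `import theano`. A value of 1 pre-allocates the whole GPU
# memory pool, values in (0, 1) pre-allocate that fraction, 0 disables it.
os.environ["THEANO_FLAGS"] = "device=gpu,floatX=float32,lib.cnmem=1"

# import theano  # would now initialize the GPU with CNMeM enabled
```

Pre-allocating the pool once avoids repeated cudaMalloc/cudaFree calls during training, which is where the speed-up comes from.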
The figures show the speed-up analysis of training the CNN on 200x200 pixel images. Speed-up is measured by comparing the training time on the CPU against the GPU GTX-980, based on the formula:

speedup = t(s) / t(p)

where t(s) is the sequential (CPU) training time and t(p) is the parallel (GPU) training time. Speed-up is measured between the GTX-980 and several multi-core CPU scenarios.
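As a worked example of the formula, using illustrative timings (not the poster's raw measurements, which are not reported):

```python
def speedup(t_sequential, t_parallel):
    """speedup = t(s) / t(p): CPU training time over GPU training time."""
    return t_sequential / t_parallel

# Illustrative timings chosen so the ratio reproduces the reported best
# vs-1-core speed-up of 16.19; the actual measured times are not given.
s = speedup(8095.0, 500.0)
```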
Conclusion & Future Directions
v The performance of the GPU GTX-980 for histopathology can be exploited by customizing cnmem and selecting its optimal value
v Multiple GPUs can be applied in future research
*This poster has been published in the Proceedings of the Sriwijaya International Conference on Information Technology and Its Applications (SICONIAN 2019)