Accelerating Convolutional Neural Network Training for Colon Histopathology Images by Customizing Deep Learning Framework *
Conventional vs Automatic Detection

Histopathology image classification (Gurcan et al., 2009) can be approached in two ways:
v Conventional method: images are examined by an expert pathologist. Problems: human dependency, error prone, time consuming.
v Automatic method (Chen et al., 2016): automatic feature extraction, rather than general (hand-crafted) feature extraction, gives good performance for classification.
Data set
v Colon cancer data, annotated
v 941 benign and 910 malignant images
v Slide images: 775 x 522 pixel, 0.68 µm/pixel
v Patches randomly subsampled from the slides at 32x32, 64x64, 128x128, 180x180 and 200x200 pixel, for both the benign and malignant classes
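The random subsampling of patches from the 775 x 522 slides can be sketched as follows. This is a minimal pure-Python sketch; the function and variable names are our own, not from the poster.

```python
import random

def random_patch_coords(img_w, img_h, patch, n, seed=None):
    """Pick n random top-left corners for patch x patch crops that
    fit entirely inside an img_w x img_h image (hypothetical helper)."""
    rng = random.Random(seed)
    return [(rng.randint(0, img_w - patch), rng.randint(0, img_h - patch))
            for _ in range(n)]

# Slide images in this dataset are 775 x 522 pixels; draw 200x200 patches.
coords = random_patch_coords(775, 522, 200, n=5, seed=42)
```

The same routine covers the smaller patch sizes (32x32 up to 180x180) by changing the `patch` argument.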
Data pre-processing & data augmentation
v Image augmentation: normal, random rotation, flip (horizontal/vertical), zoom, shear
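Per-image augmentation of the kinds listed above can be sketched as sampling a set of random transform parameters. The parameter ranges below are illustrative assumptions; the poster does not state them.

```python
import random

def sample_augmentation(rng):
    """Draw one set of augmentation parameters: rotation, H/V flip,
    zoom and shear. The ranges are illustrative, not from the poster."""
    return {
        "rotation_deg": rng.uniform(-30.0, 30.0),
        "flip_horizontal": rng.random() < 0.5,
        "flip_vertical": rng.random() < 0.5,
        "zoom": rng.uniform(0.9, 1.1),
        "shear_deg": rng.uniform(-10.0, 10.0),
    }

params = sample_augmentation(random.Random(0))
```

Drawing fresh parameters for every image each epoch is what enriches the dataset beyond the original 1851 images.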
CNN Architecture Design
v Image sizes chosen to test GPU efficiency
v GTX-980 speed-up analysed for batch-sizes 10, 20, 50, 100 and 150
v CNN parameters: trade-off, as larger settings make training time expensive
Toto Haryanto (a), Heru Suhartanto (a), Aniati Murni (a), Kusmardi (b)
email: [email protected], [email protected], [email protected], [email protected]
(a) Faculty of Computer Science, Universitas Indonesia, Depok, 16424, West Java, Indonesia
(b) Faculty of Medicine, Universitas Indonesia, Depok, 16424, West Java, Indonesia
Abstract
Cancer diagnosis based on histopathology images still faces several challenges. Image variation, high image resolution, and the different cell patterns in the images can all contribute to misclassification. Convolutional neural networks (CNN) are widely used in image processing for their ability to extract features and classify objects. Applying a CNN to high-resolution images makes the training process computationally expensive, so the Graphics Processing Unit (GPU) plays an important role in speeding it up. However, GPUs are limited by their memory size. This work focuses on how to utilize GPU memory when training a CNN architecture. For training, an NVIDIA GTX-980 is accelerated by customizing CUDA memory allocation through the cnmem library. In the experiments, the cnmem parameter is chosen from 0.6, 0.8, 1 and 2, and the best value is used for subsequent training. To enrich the dataset, augmentations such as rotation, zoom, shear and flip are applied. Several optimization techniques are compared experimentally to determine the best model for classifying the two cancer classes, benign and malignant. We use image sizes of 32x32, 64x64, 128x128, 180x180 and 200x200 pixels. During training, the batch-size is selected experimentally from 10, 20, 50, 100 and 150. We find that enabling cnmem with parameter 1 gives the best results, and the 200x200 images show the most significant GPU efficiency gain when training the CNN. Speed-up is measured by comparing the training time of the GTX-980 with a Core i7 CPU machine using 16, 8, 4, 2 cores and a single core. With cnmem enabled, the GTX-980 achieves its highest speed-ups of 4.49, 5.00, 7.58, 11.97 and 16.19 compared to 16, 8, 4, 2 and 1 core processors, respectively.
Objective
Accelerate the NVIDIA GPU GTX-980 for training CNNs and analyse its performance.
v Mini batch-size: 10, 20, 50, 100, 150
v Optimizer: rmsprop, sgd, adam and adamax
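Since batch-size and optimizer are selected experimentally, the runs form a simple grid over the two lists above. A minimal sketch of enumerating that grid (the dict layout is our own; `adamax` is assumed to be the Keras identifier for the AdaMax optimizer):

```python
from itertools import product

# Hyper-parameters screened experimentally in the poster.
batch_sizes = [10, 20, 50, 100, 150]
optimizers = ["rmsprop", "sgd", "adam", "adamax"]  # Keras optimizer names

# 5 batch-sizes x 4 optimizers = 20 candidate training configurations.
experiments = [{"batch_size": b, "optimizer": o}
               for b, o in product(batch_sizes, optimizers)]
```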
CPU Environment:
v Intel® Core™ i7-5960X
v Number of cores: 16 cores @ 3 GHz
v RAM size: 65 GB

GPU GTX-980 Environment:
v Number of cores: 2048
v Clock speed: 1126 MHz
v GPU memory: 4 GB
v Memory interface width: 256 bit
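The 4 GB memory limit is why batch-size and image size trade off against each other. A rough back-of-envelope sketch for the raw input batch alone (float32, 3 channels; it deliberately ignores weights and intermediate feature maps, which dominate in practice):

```python
def input_batch_mb(batch, side, channels=3, bytes_per_value=4):
    """Memory (MB) for one float32 batch of square RGB images
    (hypothetical helper for a back-of-envelope estimate)."""
    return batch * side * side * channels * bytes_per_value / 1e6

# 200x200 RGB images at batch-size 150: 72 MB for the raw inputs alone.
# Intermediate feature maps multiply this several times over, which is
# why batch-size and image size must be balanced against 4 GB of memory.
mb = input_batch_mb(150, 200)
```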
Speed-up of the GTX-980 over the CPU for 200x200 images, by mini batch-size:

batch-size               10      20      50      100     150
vs 16 cores (cnmem=1)    4.49    4.27    3.89    3.76    3.66
vs 16 cores (cnmem=0)    3.62    3.50    2.85    2.61    2.64
vs 8 cores  (cnmem=1)    5.00    4.74    4.66    4.59    4.43
vs 8 cores  (cnmem=0)    4.03    3.89    3.41    3.18    3.20
vs 4 cores  (cnmem=1)    7.34    7.58    7.20    6.92    6.66
vs 4 cores  (cnmem=0)    5.91    6.22    5.27    4.80    4.81
vs 2 cores  (cnmem=1)    11.43   11.97   11.56   11.02   10.22
vs 2 cores  (cnmem=0)    9.21    9.82    8.47    7.64    7.38
vs 1 core   (cnmem=1)    15.49   16.19   16.06   16.05   15.75
vs 1 core   (cnmem=0)    12.48   13.29   11.76   11.14   11.38

[Figure: CNN training time (seconds) vs image size (32x32, 64x64, 128x128, 180x180, 200x200) for cnmem=1 and cnmem=0]
The figure shows training time (in seconds) for each image size at batch-size = 150, comparing runs with cnmem enabled and disabled. Enabling cnmem (cnmem=1) gives a faster training time.
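In Theano, cnmem is enabled through the `lib.cnmem` configuration flag, which must be set before the library is imported. A minimal sketch, assuming a Theano backend as in the poster's era (the exact flag string beyond `lib.cnmem` is a common default, not stated in the poster):

```python
import os

# CNMeM is enabled through Theano's `lib.cnmem` flag; it must be set
# before `import theano`. A value of 1 pre-allocates the whole GPU
# memory pool, values in (0, 1) pre-allocate that fraction, 0 disables it.
os.environ["THEANO_FLAGS"] = "device=gpu,floatX=float32,lib.cnmem=1"

# import theano  # would now initialize the GPU with CNMeM enabled
```

Pre-allocating the pool once avoids repeated cudaMalloc/cudaFree calls during training, which is where the speed-up comes from.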
The figures show the speed-up analysis of training the CNN on 200x200 pixel images. Speed-up is measured by comparing the training time on the CPU against the GPU GTX-980, based on the formula:

speedup = t(s) / t(p)

where t(s) is the sequential (CPU) training time and t(p) is the parallel (GPU) training time. Speed-up is measured between the GTX-980 and several multi-core CPU scenarios.
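As a worked example of the formula, using illustrative timings (not the poster's raw measurements, which are not reported):

```python
def speedup(t_sequential, t_parallel):
    """speedup = t(s) / t(p): CPU training time over GPU training time."""
    return t_sequential / t_parallel

# Illustrative timings chosen so the ratio reproduces the reported best
# vs-1-core speed-up of 16.19; the actual measured times are not given.
s = speedup(8095.0, 500.0)
```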
Conclusion & Future Directions
v The performance of the GPU GTX-980 for histopathology can be exploited by customizing cnmem and selecting its optimal value
v Multiple GPUs can be applied in future research
*This poster has been published in the Proceedings of the Sriwijaya International Conference on Information Technology and Its Applications (SICONIAN 2019)