Robust Hepatic Vessel Segmentation using Multi Deep Convolution Network

Titinunt Kitrungrotsakul*¹, Xian-Hua Han¹,², Yutaro Iwamoto¹, Amir Hossein Foruzan³, Lanfen Lin⁴ and Yen-Wei Chen¹,⁴

¹Graduate School of Information Science and Engineering, Ritsumeikan University, Japan
²Institute of Advanced Industrial Science and Technology, Japan
³Biomedical Engineering Department, Faculty of Engineering, Shahed University, Iran
⁴College of Computer Science and Technology, Zhejiang University, China

ABSTRACT

Extraction of the blood vessels of an organ is a challenging task in medical image processing. It is difficult to obtain accurate vessel segmentation results even with manual labeling by a human. The difficulty of vessel segmentation lies in the complicated structure of blood vessels and their large variations, which make them hard to recognize. In this paper, we present a deep artificial neural network architecture to automatically segment the hepatic vessels from computed tomography (CT) images. We propose a novel deep neural network (DNN) architecture for vessel segmentation from a medical CT volume, which consists of three deep convolutional neural networks that extract features from different planes of the CT data. The three networks share features at the first convolutional layer but separately learn their own features in the second layer. All three networks join again at the top layer. To validate the effectiveness and efficiency of our proposed method, we conduct experiments on 12 CT volumes, in which the training data are randomly generated from 5 CT volumes and the remaining 7 are used for testing. Our network yields an average Dice coefficient of 0.830, while a 3D deep convolutional neural network yields around 0.7 and a multi-scale approach yields only 0.6.

Keywords: Deep learning, Vessel segmentation, Classification, Automatic segmentation

1. INTRODUCTION

Understanding the vessel structure is important in pre-surgical planning and computer-aided diagnosis [1]. Vessel segmentation methods and techniques vary depending on the image modality, application domain, and other specific factors. Because of these dependencies, many vessel segmentation methods have been proposed.

Region growing is a well-known semi-automatic pixel/voxel-based segmentation method that depends on intensity similarity and spatial proximity. The region-growing approach segments images by incrementally recruiting pixels/voxels into a region based on pre-designed criteria [2,3,4]. This method requires the user to supply seed information before segmentation is performed. Due to variations in noise and intensity and the complicated vessel structure, region growing usually over- or under-segments the target.
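The region-growing idea above can be sketched in a few lines. This is a minimal 2D illustration, not the method used in any of the cited papers: the acceptance criterion (intensity within a tolerance of the seed value) and the 4-connectivity are assumptions chosen for simplicity.

```python
import numpy as np
from collections import deque

def region_grow(img, seed, tol):
    """Grow a region from `seed`, accepting 4-connected neighbours whose
    intensity is within `tol` of the seed intensity."""
    H, W = img.shape
    seed_val = img[seed]
    mask = np.zeros((H, W), dtype=bool)
    mask[seed] = True
    q = deque([seed])
    while q:
        y, x = q.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < H and 0 <= nx < W and not mask[ny, nx]
                    and abs(img[ny, nx] - seed_val) <= tol):
                mask[ny, nx] = True       # recruit the voxel into the region
                q.append((ny, nx))
    return mask
```

With a fixed tolerance, a noisy or inhomogeneous image easily leaks past the vessel boundary or stops short of it, which is exactly the over-/under-segmentation problem noted above.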

A multi-scale filter approach has been widely used for vessel segmentation in various settings. This approach is an efficient, automatic, three-dimensional vessel segmentation method that can be applied to multiple scans and gives better results with thinner slices. The multi-scale approach uses a line enhancement filter with different orientations and scales to emphasize the cylindrical structure of the vessels [5], and then applies other image processing operations such as thresholding and connected component analysis to complete the segmentation [6]. Y. Sato et al. [7] introduced a three-dimensional multi-scale line filter for segmentation and visualization of curvilinear structures in medical images. The method is based on the second derivative of the image using a Gaussian kernel, applied at multiple scales with adaptive orientation selection using the Hessian matrix.
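The Hessian-based line measure behind these filters can be illustrated with a simplified 2D sketch (the cited methods [5,7] are 3D and more elaborate): compute Gaussian second derivatives at a scale σ, take the Hessian eigenvalues at each pixel, and respond where the most negative eigenvalue is large, i.e. where a bright line crosses a dark background. The response formula here is a deliberate simplification, not the Sato or Frangi vesselness measure.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def line_filter_2d(img, sigma):
    """Single-scale Hessian line measure for bright lines on a dark background."""
    # Gaussian second derivatives (order selects the derivative per axis)
    Hxx = gaussian_filter(img, sigma, order=(0, 2))
    Hyy = gaussian_filter(img, sigma, order=(2, 0))
    Hxy = gaussian_filter(img, sigma, order=(1, 1))
    # closed-form eigenvalues of the 2x2 Hessian at every pixel
    half_trace = (Hxx + Hyy) / 2.0
    root = np.sqrt(((Hxx - Hyy) / 2.0) ** 2 + Hxy ** 2)
    l_min = half_trace - root          # most negative eigenvalue
    # bright line: strong negative curvature across the line -> positive response
    return np.maximum(-l_min, 0.0)
```

A multi-scale version would evaluate this at several σ values and keep the maximum response per pixel, so that vessels of different radii are all enhanced.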

Numerous other methods and techniques have been proposed for vessel extraction, such as intensity ridge traversal [8], tubular filtering [9], skeletonization [10], and active contours [11]. A review can be found in [12].

[Figure 1: network diagram. Three 29 × 29 input patches (sagittal, coronal, and transverse) each pass through a first 2D convolution layer with 20 shared kernels (W11) followed by max pooling, then through separate second 2D convolution layers with 40 kernels each (W21, W22, W23), each followed by max pooling.]

Figure 1. The architecture of our liver vessel segmentation network, which separately trains lower features for each type of input vector and then merges them into higher-level features. The first convolutional layer shares the same parameters for all input vectors, while the second layer learns its parameters separately.

Recently, deep learning architectures have demonstrated powerful ability in computer vision tasks by automatically learning hierarchies of relevant features directly from the input data. Deep convolutional neural networks have been successfully applied to image classification and object detection, most notably in the ImageNet classification competition, where they have been the most successful approach since 2012 [13].

Motivated by these deep learning developments, we propose a deep learning approach for the automated segmentation of liver vessels. Vessel segmentation can be considered as a classification problem. Unlike conventional 2D image classification, vessel segmentation from CT volumes is a 3D image classification problem, and we need to extract 3D image features to improve segmentation accuracy. Our contribution is a novel deep neural network (DNN) architecture for vessel segmentation from a medical CT volume, which consists of three deep convolutional neural networks that extract features from the sagittal, coronal, and transverse (axial) planes, respectively.

This paper is organized as follows. In Section 2, we describe the proposed network architecture. The experimental results and the conclusion are described in Sections 3 and 4, respectively.

2. ARCHITECTURE OF THE NETWORK

The proposed network architecture is shown in Figure 1. The network consists of three deep convolutional neural networks that extract features from the sagittal, coronal, and transverse planes, respectively. The outputs (features) of the three convolutional networks are merged into a fully connected network. The lowest layers are specific to each type of input vector and aim to detect representative features. The first convolutional layer shares the same parameters for all input vectors, while the second convolutional layer learns its parameters separately. The representative features are learned by 2D convolution and pooling layers. At the higher layers of the network, these features are merged into higher-level features, which capture complex correlations across the different input vectors and are learned with fully connected layers.
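The data flow just described can be sketched as a minimal NumPy forward pass. The kernel counts (20 shared first-layer kernels, 40 per-plane second-layer kernels, 5 × 5 kernels, 2 × 2 pooling on 29 × 29 patches) follow Figure 1 and Section 3, but the parameter names, the random toy initialization, and the two-class softmax output are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

PLANES = ("sagittal", "coronal", "transverse")

def conv2d_relu(x, W, b):
    """Valid 2D convolution followed by ReLU.
    x: (C, H, W) input, W: (K, C, k, k) kernels, b: (K,) biases."""
    K, C, k, _ = W.shape
    _, H, Wd = x.shape
    out = np.empty((K, H - k + 1, Wd - k + 1))
    for i in range(H - k + 1):
        for j in range(Wd - k + 1):
            out[:, i, j] = np.tensordot(W, x[:, i:i + k, j:j + k],
                                        axes=([1, 2, 3], [0, 1, 2]))
    return np.maximum(out + b[:, None, None], 0.0)

def max_pool(x):
    """2 x 2 max pooling with stride 2 on a (K, H, W) feature stack."""
    K, H, Wd = x.shape
    x = x[:, :H // 2 * 2, :Wd // 2 * 2].reshape(K, H // 2, 2, Wd // 2, 2)
    return x.max(axis=(2, 4))

def forward(patches, params):
    """Shared first conv, per-plane second conv, merged softmax top layer."""
    feats = []
    for p in PLANES:
        h = max_pool(conv2d_relu(patches[p], params["W1"], params["b1"]))
        h = max_pool(conv2d_relu(h, params[f"W2_{p}"], params[f"b2_{p}"]))
        feats.append(h.ravel())
    z = params["Wfc"] @ np.concatenate(feats) + params["bfc"]
    e = np.exp(z - z.max())                    # numerically stable softmax
    return e / e.sum()

# toy random parameters matching Figure 1's kernel counts
rng = np.random.default_rng(0)
params = {"W1": 0.01 * rng.standard_normal((20, 1, 5, 5)), "b1": np.zeros(20),
          "Wfc": 0.01 * rng.standard_normal((2, 3 * 40 * 4 * 4)),
          "bfc": np.zeros(2)}
for p in PLANES:
    params[f"W2_{p}"] = 0.01 * rng.standard_normal((40, 20, 5, 5))
    params[f"b2_{p}"] = np.zeros(40)

patches = {p: rng.standard_normal((1, 29, 29)) for p in PLANES}
probs = forward(patches, params)  # 29 -> conv 25 -> pool 12 -> conv 8 -> pool 4
```

Note how weight sharing appears only in the first layer (`W1` is reused for all three planes), while each plane gets its own `W2_*`, matching the description above.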

2.1 Convolutional layers and activation function

The aim of convolutional layers is to detect local features at different positions in an image. The parameters of a layer consist of a set of learnable kernels with small receptive fields. In other words, a kernel in a given convolutional layer depends only on a spatially contiguous set of the layer's inputs, so each kernel learns a particular local feature specific to its receptive field. The output of kernel k of layer l is given by

O_k^l = φ( Σ_r W_k^{lr} O_r^{l−1} + b_k^l )        (1)

where W_k^{lr} is the weight matrix connecting kernel r of layer l−1 to kernel k of layer l, and b_k^l is the scalar bias of kernel k of layer l.

The activation function is denoted as φ; we use the rectified linear unit (ReLU) function for all neurons of the network except the top layer. The ReLU function is defined by

φ(x) = max(0, x)        (2)

The ReLU function is less prone to the vanishing gradient problem than the sigmoid or tanh functions [14]. While most of the neurons in our network use the ReLU function, the top layer uses a softmax activation function. The softmax function maps the inputs z into [0, 1], and the outputs can be interpreted as probabilities; we select the output with the highest probability as the label of a voxel.

output_j(z) = e^{z_j} / Σ_{k=1}^{K} e^{z_k}        (3)
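The two activation functions, eq. (2) and eq. (3), translate directly into code; this is a generic sketch (the max-shift in the softmax is a standard numerical-stability trick, not something the paper specifies).

```python
import numpy as np

def relu(x):
    """Rectified linear unit, eq. (2): elementwise max(0, x)."""
    return np.maximum(0.0, x)

def softmax(z):
    """Softmax, eq. (3): maps scores to probabilities that sum to 1.
    Subtracting max(z) leaves the result unchanged but avoids overflow."""
    e = np.exp(z - np.max(z))
    return e / e.sum()
```

The voxel label is then simply `softmax(z).argmax()`, the class with the highest probability.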

2.2 Pooling layers

After the convolution layers, max-pooling layers are used to reduce the size of the feature maps by merging groups of neurons. A max-pooling layer slides a 2 × 2 window over the feature map and selects the highest value at each position of the pooling window. Local information is important in vessel segmentation, so we use only small windows to avoid losing necessary vessel information while still reducing the over-fitting problem.
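The 2 × 2, stride-2 max pooling described above can be written as a single reshape in NumPy (a sketch for one 2D feature map; odd trailing rows/columns are cropped, which is one common convention the paper does not specify).

```python
import numpy as np

def max_pool_2x2(fmap):
    """2 x 2 max pooling with stride 2 over a single (H, W) feature map."""
    H, W = fmap.shape
    # crop to even dimensions, then group pixels into 2x2 blocks
    f = fmap[:H // 2 * 2, :W // 2 * 2].reshape(H // 2, 2, W // 2, 2)
    return f.max(axis=(1, 3))   # keep the maximum of each block
```

Each output value summarizes one 2 × 2 neighbourhood, halving both spatial dimensions while keeping the strongest local response.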

2.3 Learning Algorithm

Suppose xi is a data vector in the training dataset Dtrain = {(xi, yi) | i ∈ [1, n]}, where yi is the known desired output for xi. The performance of the designed network is evaluated using the negative log-likelihood error between the network's output and the desired output, defined as follows:

E(θ) = − (1/n) Σ_{i=1}^{n} Σ_{j=1}^{N} y_j^i log x_j^i        (4)

where x and y are the network output and the desired label, respectively, and θ represents all parameters of the network. Training uses mini-batches and is carried out with the stochastic gradient descent (SGD) algorithm [15].
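Eq. (4) is a short computation; the sketch below assumes the network outputs are already probabilities (rows of `outputs`) and the labels are one-hot encoded, which matches the softmax top layer.

```python
import numpy as np

def nll_loss(outputs, labels):
    """Mean negative log-likelihood, eq. (4).
    outputs: (n, N) predicted class probabilities per sample;
    labels:  (n, N) one-hot desired outputs."""
    n = outputs.shape[0]
    # only the log-probability of the true class of each sample survives
    return -np.sum(labels * np.log(outputs)) / n
```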

A momentum term is added to the update rule of our network, which helps accelerate SGD in the relevant direction and dampens oscillations. The update rule with momentum m and learning rate α at iteration I is given by

Δw_i^{lr}(I) = − α ∂E/∂w_i^{lr} + m Δw_i^{lr}(I − 1)        (5)
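One step of the momentum update, eq. (5), can be sketched as follows; the default values reuse the learning rate (0.0005) and momentum (0.5) reported in Section 3, and the scalar-weight interface is an illustrative simplification.

```python
def momentum_step(w, grad, delta_prev, alpha=0.0005, m=0.5):
    """One SGD-with-momentum update per eq. (5):
    delta = -alpha * dE/dw + m * delta_prev; returns (w_new, delta)."""
    delta = -alpha * grad + m * delta_prev
    return w + delta, delta
```

Carrying `delta` between iterations is what accelerates progress along consistent gradient directions while averaging out oscillating ones.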


Figure 2. Performance comparison between two CNN networks: our network with three orthogonal 2D patches and the 3D patch CNN.

Rows show different data samples (datasets 4 and 6). Five columns: a. CT intensity images; b and c. segmentation results from our network and the 3D patch CNN, respectively; d and e. 3D visualization results of both networks. Red voxels denote correctly segmented vessels, while green voxels denote differences between the ground-truth vessels and the segmented vessels.

3. EXPERIMENTAL RESULTS

We evaluate our method using 12 CT volumes, most of which contain tumors, liver abnormalities, and different intensity ranges, as shown in Figure 2. All computations were run on an NVIDIA GeForce GTX TITAN X GPU with 12 GB of memory. The training set was randomly extracted from voxels whose value in the vesselness image exceeds a designed threshold T = mean − variance, excluding the liver boundary, while the test set uses all voxels with a positive vesselness value (T > 0), again excluding the liver boundary. The training set totals 250k voxels from 5 CT volumes. For each voxel, we extracted a 2,523-dimensional input vector consisting of three orthogonal 2D patches of 29 × 29 voxels. The convolution kernels were set to 5 × 5 and the pooling windows to 2 × 2. The learning rate and momentum were set to 0.0005 and 0.5, respectively. We used a batch size of 400 data points. We applied patch connectivity as a post-processing step to remove noise voxels and connect fragmented vessels.
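The per-voxel input described above can be sketched as follows; the (z, y, x) volume indexing and the assumption that the centre voxel lies far enough from the volume border are illustrative choices not stated in the paper.

```python
import numpy as np

def orthogonal_patches(volume, center, size=29):
    """Extract the three orthogonal 2D patches (sagittal, coronal, transverse)
    centred at `center` from a volume indexed (z, y, x). Assumes the centre
    lies at least size // 2 voxels from every border."""
    z, y, x = center
    r = size // 2
    transverse = volume[z, y - r:y + r + 1, x - r:x + r + 1]  # fixed z (axial)
    coronal = volume[z - r:z + r + 1, y, x - r:x + r + 1]     # fixed y
    sagittal = volume[z - r:z + r + 1, y - r:y + r + 1, x]    # fixed x
    return sagittal, coronal, transverse
```

Three 29 × 29 patches give 3 × 841 = 2,523 values, matching the input dimensionality stated above.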

We compare the vessel segmentation results generated by our deep learning network, which takes three orthogonal 2D patches as input, with a 3D patch CNN. We also compare both results with a conventional multi-scale filter. The 2D patch size was set to 29 × 29, while the 3D patch size was set to 15 × 15 × 15. The evaluation measure used in this experiment is the Dice coefficient, given by

DICE = 2 |A ∩ B| / (|A| + |B|)        (6)
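Eq. (6) for two binary masks is a one-liner; this generic sketch treats any nonzero voxel as foreground.

```python
import numpy as np

def dice(a, b):
    """Dice coefficient, eq. (6), between two binary masks A and B."""
    a, b = a.astype(bool), b.astype(bool)
    # twice the overlap, normalized by the total size of both masks
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())
```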

Figure 3 shows that three 2D patches use fewer parameters than a 3D patch but give better results. Our network yields an average Dice coefficient of 0.830, while the 3D deep convolutional neural network yields around 0.7. The main problem of the 3D patch is its lack of global information, while increasing the patch size increases the model size and computation time. Compared in more detail with the conventional multi-scale filter, its best result reaches only 0.68 and its lowest is 0.48, whereas our three-patch 2D CNN achieves better best and lowest results of 0.88 and 0.76, respectively.

[Figure 3: bar chart of the Dice coefficient for datasets 1-7, comparing three methods: three 2D patches CNN, 3D patch CNN, and multi-scale.]

Figure 3. Result comparison between our proposed network, the 3D CNN, and the multi-scale method.

4. CONCLUSION

In this work we proposed a deep convolutional neural network design for automatic vessel segmentation in the liver region. The proposed network takes three input vectors, one patch from each plane (sagittal, coronal, and transverse). The separate deep networks for each patch are combined at the top layer. This design reduces the size of the network and gives more accurate results than a 3D CNN. The effectiveness of the proposed network is illustrated on 12 medical datasets: 5 for the training phase and 7 for the testing phase.

Robust intensity handling and whole-image segmentation are our future research directions. The network is not yet robust to intensity changes: it fails to segment data whose intensity range differs from that of the trained model. The network currently works in a patch-based fashion, which takes much more computation time than whole-image segmentation.

ACKNOWLEDGEMENT

This work is supported in part by Grants-in-Aid for Scientific Research from the Japanese Ministry of Education, Culture, Sports, Science and Technology (MEXT) under Grant Nos. 24300076, 15H01130, 15K00253, 15K16031, 26289069 and 25280044; in part by the MEXT Support Program for the Strategic Research Foundation at Private Universities (2013-2017); and in part by the R-GIRO Research Fund from Ritsumeikan University. This work is also supported by the Japan Society for the Promotion of Science (JSPS) under Grant No. 16J09596.


REFERENCES

[1] Z.-P. Liang and P. C. Lauterbur, "Principles of Magnetic Resonance Imaging: A Signal Processing Perspective," SPIE Optical Engineering Press, 416 (2000).

[2] J. F. O'Brien and N. F. Ezquerra, "Automated Segmentation of Coronary Vessels in Angiographic Image Sequences Utilizing Temporal, Spatial and Structural Constraints," GVU Technical Report GIT-GVU-94-30 (1994).

[3] W. E. Higgins, W. J. T. Spyra, and E. L. Ritman, "Automatic extraction of the arterial tree from 3-D angiograms," IEEE Trans. Engineering in Medicine and Biology Society 2, 563-564 (1989).

[4] Y. Masutani, K. Masamune, and T. Dohi, "Region-growing based feature extraction algorithm for tree-like objects," Proc. of Visualization in Biomedical Computing, 161-167 (1993).

[5] A. F. Frangi, W. J. Niessen, K. L. Vincken, and M. A. Viergever, "Multiscale vessel enhancement filtering," MICCAI 1496, 130-137 (1998).

[6] Y. Shang, R. Deklerck, E. Nyssen, A. Markova, J. de Mey, X. Yang, and K. Sun, "Vascular Active Contour for Vessel Tree Segmentation," IEEE Trans. on Biomedical Engineering 58, 1023-1032 (2011).

[7] Y. Sato, S. Nakajima, H. Atsumi, T. Koller, G. Gerig, and R. Kikinis, "Three-dimensional multi-scale line filter for segmentation and visualization of curvilinear structures in medical images," Medical Image Analysis 2, 143-169 (1998).

[8] S. R. Aylward and E. Bullitt, "Initialization, noise, singularities, and scale in height ridge traversal for tubular object centerline extraction," IEEE Trans. Med. Imag. 21, 61-75 (2002).

[9] J. V. B. Soares, J. J. G. Leandro, R. M. Cesar, H. F. Jelinek, and M. J. Cree, "Retinal vessel segmentation using the 2-D Gabor wavelet and supervised classification," IEEE Trans. Med. Imag. 25, 1214-1222 (2006).

[10] P. J. Yim, P. L. Choyke, and R. M. Summers, "Gray-scale skeletonization of small vessels in magnetic resonance angiography," IEEE Trans. Med. Imag. 19, 568-576 (2000).

[11] L. M. Lorigo, O. D. Faugeras, W. E. L. Grimson, R. Keriven, R. Kikinis, A. Nabavi, and C.-F. Westin, "CURVES: Curve evolution for vessel segmentation," Med. Image Analysis 5, 195-206 (2001).

[12] J. S. Suri, K. C. Liu, L. Reden, and S. Laxminarayan, "A review on MR vascular image processing algorithms: Skeleton versus nonskeleton approaches, Part II," IEEE Trans. Inf. Tech. Biomed. 6, 338-350 (2002).

[13] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Advances in Neural Information Processing Systems, 1097-1105 (2012).

[14] X. Glorot, A. Bordes, and Y. Bengio, "Deep sparse rectifier neural networks," Proc. of the 14th International Conference on Artificial Intelligence and Statistics, JMLR W&CP 15, 315-323 (2011).

[15] O. Bousquet and L. Bottou, "The tradeoffs of large scale learning," Advances in Neural Information Processing Systems, 161-168 (2008).
