Transfer Learning-Based Automatic Detection of Acute Lymphocytic Leukemia
Pradeep Kumar Das Dept. of Electronics and Communication Engineering National Institute of Technology Rourkela
Rourkela, 769008, India Email: [email protected]
Sukadev Meher Dept. of Electronics and Communication Engineering National Institute of Technology Rourkela
Rourkela, 769008, India Email: [email protected]
Abstract—In healthcare, microscopic analysis of blood cells is considered significant in diagnosing acute lymphocytic leukemia (ALL). Manual microscopic analysis is an error-prone and time-consuming process. Hence, there is a need for automatic leukemia diagnosis. Transfer learning is becoming an emerging medical image processing technique because of its superior performance on small databases, unlike traditional deep learning techniques.
In this paper, we suggest a new transfer-learning-based automatic ALL detection method. A lightweight, highly computationally efficient ShuffleNet is applied to classify malignant and benign cells with promising classification performance. Channel shuffling and pointwise group convolution boost its performance and make it faster. The proposed method is validated on the standard ALLIDB1 and ALLIDB2 databases. The experimental results show that, in most cases, the proposed ALL detection model outperforms Xception, NasNetMobile, VGG19, and ResNet50 with promising quantitative performance.
Index Terms—Acute Lymphoblastic Leukemia, Detection, Classification, Blood Cancer, Transfer Learning, Deep Learning
I. INTRODUCTION
ALL is a rapidly growing, life-threatening blood cancer that affects white blood cells (WBCs), resulting in lymphoblasts [1]–[7]. According to the French-American-British (FAB) model, ALL and acute myeloid leukemia (AML) are the two types of acute leukemia [1]–[5], [7], [8]. The exact cause of ALL is unknown. Infants and older people are in the high-risk groups for ALL. Microscopic blood-cell analysis makes an important contribution to the early diagnosis of hematological diseases [3]–[5], [9]. Microscopic images of benign and malignant cells from the standard ALL databases ALLIDB1 [10] and ALLIDB2 [10] are shown in Fig. 1 and Fig. 2, respectively. Here, our goal is to suggest an efficient model for the automatic detection of ALL.
Several researchers have presented various machine learning- and deep learning-based ALL detection techniques [3], [4], [6]. Mishra et al. [6] suggested a machine learning-based ALL classification method in which a Random Forest [4] is used to classify malignant and benign cells. Yu et al. [13] recommended a multi-stage leukemia classification method to enhance the performance; however, it is computationally expensive.
Das et al. [4] recommended an SVM [14]-based ALL detection technique. They extracted color, shape, and texture features, using GLCM [11] and the gray-level run-length matrix [15] to extract effective texture features. Then, they
Fig. 1: Images of ALL-IDB1 database [10]: (a) and (b) Healthy; (c) and (d) ALL.
Fig. 2: Images of ALL-IDB2 database [10]: (a) and (b) Healthy; (c) and (d) ALL.
applied PCA [16] to choose important features. Shafique et al. [17] presented an SVM-based ALL classification model.
Abdeldaim et al. [18] extracted texture, shape, and color features. Gray-scaling-based data normalization is used to minimize the feature-value gap. Finally, KNN is applied to classify ALL efficiently.
Machine learning methods generally need preprocessing, segmentation, feature extraction, feature reduction, and classification steps. Each stage has a key role in achieving promising overall performance; thus, proper segmentation of the cells is required. Deep learning techniques, in contrast, do not require segmentation [3], [5], [19]. However, traditional deep learning techniques need a big database to train the network properly. Hence, transfer learning is becoming an emerging medical image processing technique because of its superior performance on small databases [3].
Banik et al. [5] developed an automatic ALL detection method using Convolutional Neural Networks (CNNs). In this model, they fused the features of the first and last convolution layers. Vogado et al. [3] recommended a framework with transfer-learning-based feature extraction (AlexNet [20], VGG-f [21], or CaffeNet [22]) and SVM-based classification. Shahin et al. [19] presented two ALL detection techniques: (i) a deep-learning-based WBCsNet, and (ii) a transfer-learning-based deep convolutional activation model.
Simonyan and Zisserman introduced the VGG [21] model, in which blocks of 3×3 convolution layers are connected in succession rather than using a higher-order filter. VGG-19 has 19 weighted layers [21]. Chollet recommended the depthwise-separable-convolution-based Xception network [23]. He et al. [24] introduced a deep residual CNN known as ResNet. They gave the idea of the skip connection to mitigate the vanishing-gradient problem. ResNet50 incorporates 50 weighted layers [24]. Zoph et al. [25] presented an efficient CNN model (NasNet) in which ScheduledDropPath regularization is employed to enhance generalization. Moreover, they suggested a computationally efficient NasNetMobile network [25]. Das et al. [2] suggested three models by modifying the ResNet architecture. The best trained model is then used to extract features. Finally, they applied Logistic Regression, SVM, and Random Forest to classify ALL effectively.
The rest of this paper is organized as follows. Section II presents the proposed method. Section III covers the result and discussion, whereas conclusions are discussed in Section IV.
II. PROPOSED METHOD
We have suggested a new transfer-learning-based automatic ALL detection method. In transfer learning, a pre-trained model (trained in a source domain) is only fine-tuned in the target domain; that is, weights or knowledge are transferred from the source to the target domain. All the above-mentioned transfer-learning models are trained using the large ImageNet database. Here, these models are fine-tuned using the small ALLIDB database, and thus they achieve excellent performance. ShuffleNet [26] is used to classify benign and malignant cells successfully. Channel shuffling for group convolutions and the ShuffleNet unit boost its performance and make it faster.
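To make the fine-tuning idea concrete, here is a minimal sketch (not the authors' MATLAB pipeline): a pre-trained backbone is modelled as a frozen random projection, and only a newly attached two-class head is trained. All names, sizes, and data below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a pre-trained backbone: a fixed (frozen) random
# projection followed by ReLU. Only the new 2-class head is trained, which
# is the essence of fine-tuning in transfer learning.
n_samples, n_inputs, n_features = 200, 64, 16
X = rng.normal(size=(n_samples, n_inputs))
y = rng.integers(0, 2, size=n_samples).astype(float)

W_frozen = rng.normal(size=(n_inputs, n_features))  # backbone weights (never updated)
feats = np.maximum(X @ W_frozen, 0.0)               # frozen feature extraction
feats = feats / (feats.std(axis=0) + 1e-9)          # normalize for stable training

def bce_loss(w, b):
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

w, b, lr = np.zeros(n_features), 0.0, 0.1
loss_before = bce_loss(w, b)
for _ in range(300):                                # gradient descent on the head only
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))
    w -= lr * feats.T @ (p - y) / n_samples
    b -= lr * np.mean(p - y)
loss_after = bce_loss(w, b)
print(f"head loss: {loss_before:.4f} -> {loss_after:.4f}")
```

In a real pipeline the frozen projection would be the convolutional layers of an ImageNet-pretrained network, and the trainable head would replace its final fully connected layer.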
A. Channel Shuffle for Group Convolution
Modern CNNs [21], [27] generally repeat a similar building block in their architecture. In ResNeXt [27] and Xception [23], group convolution and depthwise separable convolutions, respectively, are implemented in these building blocks. This achieves a promising trade-off between computational cost and performance. However, neither model fully takes the 1×1 pointwise convolutions into account, even though they require considerable complexity. In ResNeXt, only the 3×3
Fig. 3: Channel shuffle with two stack group convolutions (GConv) [26]. (a) Two stack convolution layers within a group; (b) The input channels are entirely related to the output channels (GConv2 collects data from various groups); (c) Channel shuffling among groups.
Fig. 4: ShuffleNet Blocks. (a) Bottleneck-block [24] with depthwise-convolution [23], [28]; (b) ShuffleNet-block with pointwise-group-convolution and channel-shuffle [26]; (c) ShuffleNet block with stride 2 [26].
convolutions are equipped with group convolution. Thus, the pointwise convolutions account for 93.4% of the multiply-adds in every residual unit of ResNeXt. This issue can be solved by using a channel-sparse connection (group convolution on the 1×1 layers). Group convolutions undoubtedly decrease the computational cost, since every input channel operates only within its corresponding channel group. However, when many group convolutions are stacked together, each output channel is obtained from only a few input channels. Fig. 3 (a) presents the situation of two stacked group-convolution layers, where each output is determined by some inputs within the group. Thus, every output channel corresponds only to input channels within its group, restricting information flow among the groups. The input channels are entirely related to the output channels if the group convolution collects data from various groups (feature groups), as displayed in Fig. 3 (b). This issue can be solved efficiently by introducing channel shuffling, which allows information sharing among groups, as displayed in Fig. 3 (c). The ShuffleNet model efficiently generalizes depthwise separable convolution and group convolution [26].
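The channel-shuffle operation itself is a simple reshape-transpose-reshape on the channel axis. A minimal NumPy sketch (shapes and data are illustrative, not from the paper):

```python
import numpy as np

# Channel shuffle as used in ShuffleNet [26]: for g groups, view the channel
# axis as (g, c // g), transpose the two group axes, and flatten back.
def channel_shuffle(x, g):
    n, c, h, w = x.shape
    assert c % g == 0
    return x.reshape(n, g, c // g, h, w).transpose(0, 2, 1, 3, 4).reshape(n, c, h, w)

x = np.arange(6).reshape(1, 6, 1, 1)       # channels 0..5; g = 2 -> groups {0,1,2},{3,4,5}
print(channel_shuffle(x, 2).ravel())       # [0 3 1 4 2 5]: channels interleaved across groups
```

After the shuffle, the next group convolution sees channels from every previous group, which is exactly the cross-group information flow of Fig. 3 (c).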
B. ShuffleNet Unit
An efficient ShuffleNet unit is introduced by Zhang et al. [26] that exploits channel shuffling to make a lightweight, faster CNN model. The architecture of the ShuffleNet units is shown in Fig. 4.
Fig. 5: Variation of training performance w.r.t. epochs on the ALL-IDB1 database: (a) VGG-19, (b) NasNetMobile, (c) Xception, (d) ResNet50, and (e) ShuffleNet.
Fig. 6: Variation of training performance w.r.t. epochs on the ALL-IDB2 database: (a) VGG-19, (b) NasNetMobile, (c) Xception, (d) ResNet50, and (e) ShuffleNet.
TABLE I: ShuffleNet [26] architecture for ALL classification (output channels shown for g = 1 to 5 groups).

Layer        Output size   Kernel  Stride  Repeat   g=1   g=2   g=3   g=4   g=5
Image        224×224       –       –       –          3     3     3     3     3
Conv1        112×112       3×3     2       1         24    24    24    24    24
MaxPool      56×56         3×3     2       –          –     –     –     –     –
Stage 2      28×28         –       2       1        144   200   240   272   384
             28×28         –       1       3        144   200   240   272   384
Stage 3      14×14         –       2       1        288   400   480   544   768
             14×14         –       1       7        288   400   480   544   768
Stage 4      7×7           –       2       1        576   800   960  1088  1536
             7×7           –       1       3        576   800   960  1088  1536
GlobalPool   1×1           7×7     –       –          –     –     –     –     –
FC           –             –       –       –          2     2     2     2     2
Fig. 4 (a) represents a residual bottleneck block that uses 3×3 depthwise convolutions on a bottleneck feature map. Fig. 4 (b) illustrates the ShuffleNet block, which uses pointwise group convolution and channel shuffle instead of the 1×1 convolution. The channel dimension is modified to match the shortcut path using the second pointwise group convolution. Fig. 4 (c) presents the ShuffleNet block with stride 2. In this block, 3×3 average pooling is introduced on the shortcut path, and channel concatenation is used instead of element-wise addition to enlarge the channel dimension with only a small increase in computational cost. For the same settings, it is relatively faster than ResNet and ResNeXt. The numbers of FLOPs required in ResNet, ResNeXt, and ShuffleNet for an input of size l × b × w with m bottleneck channels are given in Eq. 1 to Eq. 3, respectively.
F_r = bw(2lm + 9m²)  (1)

F_rxt = bw(2lm + 9m²/g)  (2)

F_s = bw(2lm/g + 9m)  (3)
Here, g denotes the number of groups in the convolution. The pointwise group convolution and channel shuffle make ShuffleNet computationally efficient, which can also be observed from these equations.
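Eq. 1 to Eq. 3 can be checked numerically. The sketch below encodes the three FLOP counts as Python functions; the input size and channel counts are illustrative values, not the paper's:

```python
# FLOP counts of a bottleneck unit per Eq. (1)-(3), for an l x b x w input
# with m bottleneck channels and g groups. The values l, b, w, m, g below
# are illustrative only.
def flops_resnet(l, b, w, m):        # Eq. (1)
    return b * w * (2 * l * m + 9 * m * m)

def flops_resnext(l, b, w, m, g):    # Eq. (2): grouped 3x3 convolutions
    return b * w * (2 * l * m + 9 * m * m / g)

def flops_shufflenet(l, b, w, m, g): # Eq. (3): pointwise group convolutions too
    return b * w * (2 * l * m / g + 9 * m)

l, b, w, m, g = 256, 28, 28, 64, 3
print(flops_resnet(l, b, w, m))          # largest of the three
print(flops_resnext(l, b, w, m, g))
print(flops_shufflenet(l, b, w, m, g))   # smallest of the three
```

For any g > 1, the ShuffleNet count is the smallest because the dominant 2lm pointwise term is divided by g and the quadratic 9m² term is reduced to a linear 9m term.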
C. Architecture
Table I represents the ShuffleNet [26] architecture that is used for ALL classification. It basically consists of stacks of ShuffleNet blocks grouped into three stages. The first building block in every stage has stride 2. The other hyperparameters within a stage remain the same, and the output channels are doubled for the next stage. The number of bottleneck channels
Fig. 7: ROC on ALLIDB1 database (70-30)
Fig. 8: ROC on ALLIDB2 database (70-30)
is fixed to one-fourth of the output channels, similar to ResNet. In ShuffleNet, g controls the connection sparsity of the pointwise convolutions [26].
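The channel bookkeeping of Table I can be sketched as follows; the g = 3 column is used as an example:

```python
# Output-channel progression of Table I: each stage doubles the previous
# stage's output channels, and the bottleneck channels are one-fourth of
# the output channels (as in ResNet). Stage-2 base values per g from Table I.
stage2_channels = {1: 144, 2: 200, 3: 240, 4: 272, 5: 384}

g = 3
channels = [stage2_channels[g], 2 * stage2_channels[g], 4 * stage2_channels[g]]  # Stages 2-4
bottleneck = [c // 4 for c in channels]
print(channels)     # [240, 480, 960], matching the g = 3 column of Table I
print(bottleneck)   # [60, 120, 240]
```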
III. RESULTS AND DISCUSSION
The experimental results are obtained using MATLAB R2020b on an Intel(R) Core i5 CPU with 12 GB RAM and a 3.40 GHz clock rate. The proposed method is validated on the standard ALLIDB1 [10] and ALLIDB2 [10] databases.
The ALLIDB1 [10] database has 108 images: 59 healthy and 49 ALL-affected. The ALLIDB2 [10] database has 260 images: 130 healthy and 130 ALL-affected. We split both datasets two ways: (a) 70% training and 30% testing (70-30); (b) 50% training and 50% testing (50-50).
Here, we have done a comparative performance analysis using 10-fold cross-validation. Table II and Table III give a comparative quantitative performance analysis of the proposed ShuffleNet-based ALL classification technique against other existing techniques on the ALLIDB1 and ALLIDB2 databases, respectively. From these tables, we notice that the proposed technique delivers superior performance compared with the other existing techniques. In the preprocessing step, data augmentation is used to minimize overfitting and to enhance the classification performance. In this experiment, for all these transfer-learning models, including ShuffleNet, we select a mini-batch size of 32 and a learning rate of 0.001. All the networks are trained for six epochs, since six epochs are sufficient to deliver very good performance.

TABLE II: Performance comparison with existing techniques on the ALLIDB1 dataset.

Method     Sensitivity (%)  Accuracy (%)
[17]       92.00            93.70
[6]        86.50            96.00
[4]        92.64            96.00
[29]       87.00            96.97
Proposed   98.00            96.97

Fig. 9: ROC on ALLIDB1 database (50-50)
Fig. 10: ROC on ALLIDB2 database (50-50)
Fig. 5 and Fig. 6 demonstrate the performance variation (in terms of accuracy and loss) w.r.t. epochs for the ALLIDB1 dataset (splitting with 70-30) and the ALLIDB2 dataset (splitting

TABLE III: Accuracy comparison of the proposed method with other existing methods on the ALLIDB2 dataset.
Method     Data Normalization  Classifier           Accuracy (%)
[18]       Grey-scaling        Decision Tree        86.81
[18]       Grey-scaling        SVM                  92.80
[18]       Grey-scaling        Naive Bayes          89.97
[18]       Grey-scaling        KNN                  96.01
[2]        –                   Logistic Regression  96.15
[2]        –                   SVM                  96.15
[2]        –                   Random Forest        96.15
Proposed   –                   ShuffleNet           96.67
TABLE IV: Average classification performance on ALLIDB1 database (70-30).
Model Sensitivity (%) Precision (%) Specificity (%) Accuracy (%) F1 Score AUC
VGG19 95.33 86.76 86.11 90.30 0.9084 0.9072
NasNetMobile 76.00 94.62 96.66 85.15 0.8429 0.8406
Xception 62.00 90.00 93.89 80.61 0.7342 0.7561
ResNet50 98.00 91.97 92.78 95.15 0.9489 0.9539
ShuffleNet 98.00 95.63 96.11 96.97 0.9680 0.9706
TABLE V: Average classification performance on ALLIDB2 database (70-30).
Model Sensitivity (%) Precision (%) Specificity (%) Accuracy (%) F1 Score AUC
VGG19 88.45 80.93 78.76 83.59 0.8452 0.8355
NasNetMobile 84.77 94.08 94.31 89.48 0.8918 0.8954
Xception 83.26 91.11 91.72 87.35 0.8701 0.8748
ResNet50 92.90 94.82 94.82 93.85 0.9385 0.9386
ShuffleNet 96.46 96.95 96.90 96.67 0.9670 0.9668
TABLE VI: Average classification performance on ALLIDB1 database (50-50).
Model Sensitivity (%) Precision (%) Specificity (%) Accuracy (%) F1 Score AUC
VGG19 90.00 78.12 75.86 82.26 0.8364 0.8293
NasNetMobile 81.72 79.55 80.69 81.13 0.8062 0.8118
Xception 69.17 86.79 91.72 81.51 0.7698 0.8045
ResNet50 90.83 94.07 95.17 93.21 0.9242 0.9301
ShuffleNet 95.00 94.31 95.17 95.10 0.9465 0.9509
TABLE VII: Average classification performance on ALLIDB2 database (50-50).
Model Sensitivity (%) Precision (%) Specificity (%) Accuracy (%) F1 Score AUC
VGG19 85.27 82.09 81.69 83.46 0.8365 0.8348
NasNetMobile 88.37 78.08 75.58 81.93 0.8291 0.8198
Xception 84.50 74.67 71.76 78.08 0.7928 0.7813
ResNet50 94.57 88.41 87.80 91.16 0.9139 0.9119
ShuffleNet 96.89 88.64 87.80 92.31 0.9258 0.9235
with 70-30), respectively. From these figures, we observe that the performance improves with the epochs. We also observe that, in both cases, ShuffleNet outperforms the others.
Table IV and Table V show the quantitative performance on the ALLIDB1 and ALLIDB2 databases (70% training and 30% testing) with 10-fold cross-validation, respectively.
Table IV indicates that ShuffleNet outperforms Xception, NasNetMobile, VGG19, and ResNet50, while ResNet50 gives the second-best performance. This can also be visualized from the receiver operating characteristic (ROC) curve shown in Fig. 7. Table V shows that ShuffleNet gives better performance on the ALLIDB2 database, with the highest precision, sensitivity, specificity, accuracy, AUC, and F1 score.
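For reference, the reported metrics follow directly from a binary confusion matrix; the sketch below uses made-up counts, not the paper's results:

```python
# How sensitivity, precision, specificity, accuracy, and F1 score follow
# from a binary confusion matrix. The counts are illustrative only.
TP, FN, FP, TN = 49, 1, 2, 56          # malignant taken as the positive class
sensitivity = TP / (TP + FN)           # recall of the malignant class
precision   = TP / (TP + FP)
specificity = TN / (TN + FP)
accuracy    = (TP + TN) / (TP + FN + FP + TN)
f1 = 2 * precision * sensitivity / (precision + sensitivity)
print(f"{sensitivity:.4f} {precision:.4f} {specificity:.4f} {accuracy:.4f} {f1:.4f}")
```

AUC, by contrast, is computed from the full ROC curve (sensitivity versus 1 − specificity over all decision thresholds) rather than from a single confusion matrix.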
Fig. 8 also displays that ShuffleNet gives the best ROC.
Table VI and Table VII show the quantitative performance on the ALLIDB1 and ALLIDB2 databases (50% training and 50% testing), respectively. They indicate that ShuffleNet outperforms the others, with the highest sensitivity, precision, accuracy, specificity, AUC, and F1 score. This can also be visualized from the receiver operating characteristic (ROC) curves shown in Fig. 9 and Fig. 10. The ShuffleNet model demonstrates very good performance; however, it may result in misclassification in a few cases, since the shuffling rules are set manually rather than selected adaptively.
From Table VIII, we also observe that ShuffleNet is more computationally efficient than the others.
IV. CONCLUSION
The goal of this work is to build a new transfer-learning-based automatic ALL detection framework. For this purpose, ShuffleNet is employed to classify malignant and benign cells efficiently. The experimental results illustrate that the ShuffleNet-based ALL classification gives a promising performance with

TABLE VIII: Elapsed Time
Model          ALLIDB1 (70-30)  ALLIDB2 (70-30)  ALLIDB1 (50-50)  ALLIDB2 (50-50)
Xception       8 min 38 sec     19 min 31 sec    4 min 44 sec     14 min 51 sec
NasNetMobile   2 min 51 sec     6 min 25 sec     1 min 30 sec     5 min 12 sec
VGG19          5 min 34 sec     12 min 16 sec    2 min 41 sec     9 min 50 sec
ResNet50       2 min 42 sec     5 min 36 sec     1 min 13 sec     4 min 5 sec
ShuffleNet     52 sec           1 min 45 sec     32 sec           1 min 25 sec
the highest sensitivity, precision, accuracy, specificity, AUC, and F1 score on both databases. It is also more computationally efficient than the others due to pointwise group convolution and channel shuffle. In the future, the performance can be further enhanced by developing a more efficient model or increasing the database size.
REFERENCES
[1] P. K. Das, S. Meher, R. Panda, and A. Abraham, "An efficient blood-cell segmentation for the detection of hematological disorders," IEEE Transactions on Cybernetics, 2021.
[2] P. K. Das, A. Pradhan, and S. Meher, "Detection of acute lymphoblastic leukemia using machine learning techniques," in Machine Learning, Deep Learning and Computational Intelligence for Wireless Communication. Springer, 2021, pp. 425–437.
[3] L. H. Vogado, R. M. Veras, F. H. Araujo, R. R. Silva, and K. R. Aires, "Leukemia diagnosis in blood slides using transfer learning in CNNs and SVM for classification," Engineering Applications of Artificial Intelligence, vol. 72, pp. 415–422, 2018.
[4] P. K. Das, P. Jadoun, and S. Meher, "Detection and classification of acute lymphocytic leukemia," in 2020 IEEE-HYDCON, 2020, pp. 1–5.
[5] P. P. Banik, R. Saha, and K.-D. Kim, "An automatic nucleus segmentation and CNN model based classification method of white blood cell," Expert Systems with Applications, vol. 149, p. 113211, 2020.
[6] S. Mishra, B. Majhi, P. K. Sa, and L. Sharma, "Gray level co-occurrence matrix and random forest based acute lymphoblastic leukemia detection," Biomedical Signal Processing and Control, vol. 33, pp. 272–280, 2017.
[7] J. M. Bennett, D. Catovsky, M.-T. Daniel, G. Flandrin, D. A. Galton, H. R. Gralnick, and C. Sultan, "Proposals for the classification of the acute leukaemias. French-American-British (FAB) co-operative group," British Journal of Haematology, vol. 33, no. 4, pp. 451–458, 1976.
[8] P. K. Das and S. Meher, "An efficient deep convolutional neural network based detection and classification of acute lymphoblastic leukemia," Expert Systems with Applications, vol. 182, p. 115311, 2021.
[9] P. K. Das, S. Meher, R. Panda, and A. Abraham, "A review of automated methods for the detection of sickle cell disease," IEEE Reviews in Biomedical Engineering, vol. 13, pp. 309–324, 2020.
[10] R. D. Labati, V. Piuri, and F. Scotti, "ALL-IDB: The acute lymphoblastic leukemia image database for image processing," in 2011 18th IEEE International Conference on Image Processing. IEEE, 2011, pp. 2045–2048.
[11] R. M. Haralick, K. Shanmugam, and I. H. Dinstein, "Textural features for image classification," IEEE Transactions on Systems, Man, and Cybernetics, no. 6, pp. 610–621, 1973.
[12] M. E. Tipping and C. M. Bishop, "Probabilistic principal component analysis," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 61, no. 3, pp. 611–622, 1999.
[13] W. Yu, J. Chang, C. Yang, L. Zhang, H. Shen, Y. Xia, and J. Sha, "Automatic classification of leukocytes using deep neural network," in 2017 IEEE 12th International Conference on ASIC (ASICON). IEEE, 2017, pp. 1041–1044.
[14] C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, no. 3, pp. 273–297, 1995.
[15] X. Tang, "Texture information in run-length matrices," IEEE Transactions on Image Processing, vol. 7, no. 11, pp. 1602–1609, 1998.
[16] S. Wold, K. Esbensen, and P. Geladi, "Principal component analysis," Chemometrics and Intelligent Laboratory Systems, vol. 2, no. 1-3, pp. 37–52, 1987.
[17] S. Shafique, S. Tehsin, S. Anas, and F. Masud, "Computer-assisted acute lymphoblastic leukemia detection and diagnosis," in 2019 2nd International Conference on Communication, Computing and Digital Systems (C-CODE). IEEE, 2019, pp. 184–189.
[18] A. M. Abdeldaim, A. T. Sahlol, M. Elhoseny, and A. E. Hassanien, "Computer-aided acute lymphoblastic leukemia diagnosis system based on image analysis," in Advances in Soft Computing and Machine Learning in Image Processing. Springer, 2018, pp. 131–147.
[19] A. I. Shahin, Y. Guo, K. M. Amin, and A. A. Sharawi, "White blood cells identification system based on convolutional deep neural learning networks," Computer Methods and Programs in Biomedicine, vol. 168, pp. 69–80, 2019.
[20] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[21] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[22] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, "Caffe: Convolutional architecture for fast feature embedding," in Proceedings of the 22nd ACM International Conference on Multimedia, 2014, pp. 675–678.
[23] F. Chollet, "Xception: Deep learning with depthwise separable convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1251–1258.
[24] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[25] B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le, "Learning transferable architectures for scalable image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 8697–8710.
[26] X. Zhang, X. Zhou, M. Lin, and J. Sun, "ShuffleNet: An extremely efficient convolutional neural network for mobile devices," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 6848–6856.
[27] S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He, "Aggregated residual transformations for deep neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1492–1500.
[28] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, "MobileNets: Efficient convolutional neural networks for mobile vision applications," arXiv preprint arXiv:1704.04861, 2017.
[29] S. Mishra, B. Majhi, and P. K. Sa, "GLRLM-based feature extraction for acute lymphoblastic leukemia (ALL) detection," in Recent Findings in Intelligent Computing Techniques. Springer, 2018, pp. 399–407.