https://doi.org/10.1007/s12652-020-02478-x ORIGINAL RESEARCH
Molecular cancer classification method on microarrays gene expression data using hybrid deep neural network and grey wolf algorithm
AliReza Hajieskandar1 · Javad Mohammadzadeh1 · Majid Khalilian1 · Ali Najafi2
Received: 23 December 2019 / Accepted: 14 August 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020
Abstract
Gene selection methods are critical in cancer classification, which depends on the expression of a small number of biomarker genes and has been the subject of many recent studies. Microarray technology makes it possible to generate gene expression datasets for tumors. Cancer classification based on these datasets typically involves a small sample size relative to the number of genes and often includes multiclass categories. In this paper, the grey wolf algorithm was used to extract notable features in the pre-processing stage, and a deep neural network (DNN) was used to improve the accuracy of cancer detection on three datasets: STAD (stomach adenocarcinoma), LUAD (lung adenocarcinoma), and BRCA (breast invasive carcinoma). The proposed method achieved the highest accuracy on all three datasets, with accuracy close to 100%. Furthermore, the proposed method was compared with linear support vector machine, RBF support vector machine, nearest neighbor, linear regression, one vs. all, Naive Bayes, and decision tree classifiers. It improved on these by 0.57 on the LUAD dataset, 1.11 on the STAD dataset, and 0.78 on the BRCA dataset.
Keywords Cancer classification · DNA Microarray · Deep neural networks · Grey wolf algorithm · Deep learning
1 Introduction
Cancer is a multifactorial and complex disorder, mostly caused by acquired mutations and epigenetic alterations that affect gene expression. Accordingly, most cancer investigations focus on identifying genetic biomarkers that can be used for precise diagnosis and effective treatment (Butterfield et al. 2017; Knudson 2000). Ninety percent of human cancers have an epithelial origin and show aneuploidy, deletions, duplications, and genetic instability. These complexities probably explain the clinical diversity of similar tumor tissues and the need for a comprehensive understanding of the genetic changes in tumors (Chen et al. 2005; Gray and Collins 2000).
The initial human genome sequence has led to the identification of the genetic complexities of common cancers using advanced technologies. High-throughput technologies are now available to identify cancer abnormalities at the DNA, RNA, and protein levels. Gene expression studies in human cancers can identify different genetic biomarkers in malignant transformations. Conventionally, such studies were limited to assessing a few genes at a time. However, high-throughput methods, especially DNA microarray and RNA-Seq, now allow a comprehensive analysis of RNA expression (the expression profile) in human tumor samples, which is a significant breakthrough (Young 2000).
Microarray technology is considered a promising opportunity for analyzing and investigating thousands of cancer gene expression profiles and related issues (Bunz 2016). This technology has made it possible to figure out cell behavior at the molecular level. It can produce gene expression data on a large scale. Thus, large amounts of genetic data are generated, which can be quickly and precisely analyzed and managed. Furthermore, several statistical processes and machine learning algorithms are used, which leads to the emergence of an analytical perspective (Dwivedi 2018).
* Javad Mohammadzadeh [email protected] AliReza Hajieskandar [email protected] Majid Khalilian [email protected] Ali Najafi
1 Department of Computer Engineering, Karaj Branch, Islamic Azad University, Karaj, Iran
2 Molecular Biology Research Center, Systems Biology and Poisonings Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran
Because tumor scanning is harmful to the human body whereas the microarray method is harmless, clinicians have used the microarray method for cancer detection. The experimental process of preparing a microarray dataset produces large amounts of data, which creates a great challenge for analyzing gene expression alteration data.
The data have high dimensions, and the number of samples is small. However, only a few genes play a significant role in cancer differentiation (Tabakhi 2015; Motieghader et al. 2017; Ram et al. 2017). Thus, gene selection methods are critical in this area. In general, gene selection is a crucial pre-processing stage in cancer classification, aimed at eliminating irrelevant and uninformative genes. In this way, it makes classification models easier to understand while maintaining the highest degree of accuracy and precision. Also, in most cases, finding a small subset of biomarker genes is critical (Zhou et al. 2015; Varadharajan 2018; Moslehi and Haeri 2019; Chen et al. 2016).
In this paper, the grey wolf algorithm was used in the pre-processing stage to select useful genes for cancer detection. A 15-layer deep neural network was then used to classify the selected genes and detect cancer (Agrawal and Agrawal 2015; Tavakoli 2019). The proposed hybrid method was applied to three cancer datasets, named STAD, LUAD, and BRCA. The proposed method was also compared with the following classification algorithms: linear support vector machine, RBF support vector machine, nearest neighbor, linear regression, one vs. all, Naive Bayes, and decision tree. The results indicated that the proposed method performs better than the other methods. The significant contributions of this research study are as follows:
• A grey wolf algorithm is used as a pre-processor for selecting significant features out of all features. Using a meta-heuristic algorithm such as the grey wolf can contribute to the selection of useful optimal features. The optimal choice of significant features can facilitate the attainment of an optimal response and improved operation.
• The main algorithm is based on deep learning: a 15-layer DNN with the ReLU non-linear activation function, consisting of drop-out, dense, and fully connected layers.
• Combining deep learning with the grey wolf algorithm improves how well the approach generalizes to other problems.
The remainder of the paper is organized as follows: Sect. 2 describes the deep neural network and the grey wolf meta-heuristic algorithm; Sect. 3 briefly reviews related work on the research problem; Sect. 4 introduces and describes the proposed method; Sect. 5 reports the simulation results and compares the proposed method with other methods; and finally, Sect. 6 concludes the paper.
2 Background
In this section, a brief review of the DNN and the grey wolf algorithm is given.
2.1 Deep neural network
Artificial neural networks (ANNs) are computational methods and systems for machine learning, representing knowledge, and applying the obtained knowledge to predict output responses from complicated systems. The rationale behind these networks is inspired by the way biological neural networks process data and information so that information is learned and knowledge is acquired. The key component of ANNs is a new structure for the information processing system. This system consists of a large number of highly interconnected processing elements, i.e., neurons. For solving a problem, neurons operate in harmony with each other, and they transmit information through synapses. In these networks, if a cell is damaged, the other cells can compensate for its absence; they also work together to repair and recover it. ANNs are able to learn. For example, when the nerve cells of the touch organs are burned, they learn not to move towards hot objects; in the same way, the system is trained to correct its mistakes. Learning in these systems is adaptive; that is, by using samples and instances, the weights of synapses are altered in such a way that the system generates the right response when new inputs are given.
Deep learning, which is also referred to as deep machine learning, deep structured learning, or hierarchical learning, is a sub-branch of machine learning based on a set of algorithms. It aims to model high-level abstractions in the data. It uses a deep graph with several processing layers, which include several layers of linear and non-linear transformations. In other words, deep learning is based on representing knowledge and features in layers.
In general, deep learning is a class of machine learning algorithms that use several layers to extract high-level features from raw input (Wilmott 2019). Figure 1 depicts the structure of ANNs. As shown in this figure, the hidden layers indicate the depth of the analysis in ANNs. Such networks are referred to as DNNs (deep neural networks). The DNN is the most important structure and technique of deep learning (Aggarwal 2018). As mentioned above, it is intended to model high-level abstractions (Aggarwal 2018). For example, in image processing, lower layers can detect edges, whereas higher layers can detect meaningful aspects such as faces and words (Wilmott 2019). By capitalizing on DNNs and deep learning in this study, we intended to identify and classify cancer.
2.2 Grey wolf algorithm
The grey wolf algorithm (GWO) is a nature-inspired meta-heuristic algorithm. The rationale behind it is the hierarchical structure and social behavior of wolves at hunting time. GWO is a population-based algorithm with a simple adjustment and arrangement procedure. Four types of grey wolves, namely alpha, beta, delta, and omega, are used for simulating the leadership hierarchy. The three main steps of hunting, i.e., searching for prey, encircling prey, and attacking prey, are executed (Mirjalili et al. 2014). The hierarchical structure and social behavior of wolves are as follows: grey wolves are at the top of the food chain and live socially; the average number of wolves in each pack ranges from 5 to 12. There are four ranks in each wolf pack:
• Alpha wolves (alpha): these wolves are referred to as leader wolves that may be male or female. They dominate packs and manage issues such as rest and the manner of hunting. Nonetheless, besides the dominant and superior behavior of alpha wolves, a sort of democratic structure is observed within wolf packs.
• Beta wolves (beta): beta wolves help alpha wolves in the decision-making process, and they are likely to replace alpha wolves in the future.
• Delta wolves (delta): delta wolves rank below beta wolves and include old wolves, hunting wolves, and caretaker wolves.
• Omega wolves (omega): these wolves have the lowest rank within the hierarchy of their packs. They have the fewest rights in comparison to the other members of the group. They eat after the other wolves and have no role in the decision-making process.
The fitness of each search agent is then calculated, and the three best agents are designated as follows:
• Xα = the best search agent
• Xβ = the second-best search agent
• Xδ = the third-best search agent
Figure 2 shows the flowchart of this algorithm.
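The ranking of Xα, Xβ and Xδ and one position update can be sketched as follows. This is a minimal illustration that assumes the standard update rule of Mirjalili et al. (2014), which this section does not reproduce; the function names and the `sphere` objective are hypothetical and serve only as an example.

```python
import random

def rank_leaders(pack, fitness):
    """Return (alpha, beta, delta): the three best agents when minimizing fitness."""
    ordered = sorted(pack, key=fitness)
    return ordered[0], ordered[1], ordered[2]

def update_position(x, leaders, a, rng):
    """Move one wolf to the mean of the three leader-guided estimates."""
    new_x = []
    for d in range(len(x)):
        total = 0.0
        for leader in leaders:
            A = 2 * a * rng.random() - a      # coefficient in [-a, a]
            C = 2 * rng.random()              # coefficient in [0, 2]
            D = abs(C * leader[d] - x[d])     # distance to this leader
            total += leader[d] - A * D
        new_x.append(total / 3.0)
    return new_x

def sphere(x):
    # Hypothetical objective used only for this illustration.
    return sum(v * v for v in x)

rng = random.Random(0)
pack = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(6)]
alpha, beta, delta = rank_leaders(pack, sphere)
pack = [update_position(w, (alpha, beta, delta), a=1.0, rng=rng) for w in pack]
```

In a full run, the parameter `a` would typically be decreased from 2 to 0 over the iterations so that the pack shifts from exploration to exploitation.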
3 Related work
In the early 1990s, research findings indicated that if a smart ANN-based structure is used for identifying and classifying anomalies in medical images, not only is the identification rate enhanced, but fewer false-positive (FP) detections are obtained than with image processing on identical problems (Lo and Lou 1995).
Fig. 1 An example of the ANNs structure (Aggarwal, 2018)
The SCAD (simultaneous clustering and distinction) algorithm was proposed by Frigui and Nasraoui. This algorithm simultaneously learns feature weights and determines the weight differences of datasets during the clustering process. SCAD has two merits: first, it divides the dataset into more meaningful clusters; second, it can be used as part of a more complicated learning system for improving learning behavior. SCAD was developed for clustering text documents (depending on the weights of word sets), where it operates better than traditional clustering. This algorithm may also be used for selecting significant genes in gene expression data. However, research studies indicate that SCAD does not operate well in this field, which is due to not considering different weights of each feature within each cluster (local adaptive distance). In some cases, local adaptive distance may not be appropriate because it leads to falling into local optima and indefinite solutions (Frigui and Nasraoui 2000).
Wang et al. applied another algorithm for identifying DNA microarrays by simulating selection from among a thousand genes. To extract optimal information from genes, they used a correlation-based method. Then, they used classification algorithms such as SVM, decision tree, and Naive Bayes for selecting the respective gene class (Wang et al. 2005).
For classifying DNA microarrays and detecting cancer, Sharma and Paliwal used small scale sampling (SSS), standard learning algorithms implemented on a database of different genes, and the LDA algorithm. Using LDA for extracting genomic information, they maintained that this method produces outstanding results for extracting data (Sharma and Paliwal 2008).
Mishra et al. proposed another algorithm in which a meta-heuristic method is applied for classifying DNA microarrays and detecting cancer. They used PSO (particle swarm optimization) and SA (simulated annealing) to achieve this objective. The results obtained from implementing PSO-FLANN indicated that this method has a high capability for evaluating and identifying the genome (Mishra et al. 2012).
A smart evolutionary algorithm, namely IDGA (immune detector generation algorithm), was developed by Dashtban and Balafar; it is based on the genetic algorithm and artificial intelligence (Dashtban and Balafar 2017). It includes two stages: the first stage applies a scoring-based method for reducing dimensions and, more importantly, for providing statistically significant genes for the next stage. The second stage uses the Fisher and Laplace scoring methods (Yang et al. 2016). The performance of the Fisher score and its robustness against noise have been demonstrated (Olyaee et al. 2013).
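As an illustration of score-based gene ranking, the sketch below computes a Fisher score for a single gene. It assumes the common definition (between-class scatter of the class means divided by the within-class variance, both weighted by class size); the cited works may use a variant, and the function name and toy values are hypothetical.

```python
def fisher_score(values, labels):
    """Fisher score of one gene: weighted between-class scatter over within-class variance."""
    overall_mean = sum(values) / len(values)
    between = 0.0
    within = 0.0
    for c in sorted(set(labels)):
        group = [v for v, lab in zip(values, labels) if lab == c]
        mean_c = sum(group) / len(group)
        var_c = sum((v - mean_c) ** 2 for v in group) / len(group)
        between += len(group) * (mean_c - overall_mean) ** 2
        within += len(group) * var_c
    return between / within if within > 0 else float("inf")

# Toy expression values for two classes (illustrative only).
labels = [0, 0, 0, 1, 1, 1]
informative_gene = [1, 2, 3, 7, 8, 9]
noisy_gene = [1, 8, 2, 9, 3, 7]
scores = {
    "informative": fisher_score(informative_gene, labels),
    "noisy": fisher_score(noisy_gene, labels),
}
```

Genes would then be sorted by score and the highest-scoring ones passed to the next stage.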
Fakoor et al. introduced a two-stage method based on feature extraction, dimension reduction via PCA, and classification via an autoencoder. The method was evaluated on a number of datasets. Its accuracy on the AML dataset was 95.15 (Fakoor et al. 2013).
With regard to this research issue, Chen et al. developed a method that assumes the weight of a feature is identical for all clusters (global adaptive distance). In this algorithm, a kernel-based clustering method for gene selection (KBCGS) was presented in which different weights are assigned to different genes, and each class is regarded as a known cluster. Then, the most desirable gene weights are obtained by minimizing the clustering objective function. In the clustering process, the variance is measured through the sum of Euclidean distances between samples and cluster centers. Variance for each gene is computed separately by using kernel functions. Finally, the top genes are ranked and selected for cancer classification (Chen et al. 2016).
Fig. 2 The flowchart of GWO algorithm
Abdel-Zaher and Eldeib used a deep belief network-based method on the Wisconsin Breast Cancer dataset. The accuracy of this method was 99.69 (Abdel-Zaher and Eldeib 2016).
Gao et al. proposed another algorithm. Like other feature selection algorithms (Peng et al. 2005), it is a hybrid algorithm aimed at achieving high performance. In this algorithm, gene selection is carried out in two stages by the IG (information gain) and SVM algorithms. IG is one of the ranking filter methods, which analyzes the correlation between features and classes. SVM is a machine learning algorithm based on structural risk minimization. It has high classification performance because it achieves better global optimization and generalizability than traditional classifiers (Gao et al. 2017).
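The IG ranking step can be illustrated with a short sketch. It assumes the usual definition of information gain, IG = H(Y) − H(Y | X), applied to a gene whose expression has already been discretized into bins; the binning and the function names are assumptions made for illustration and are not taken from Gao et al.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(feature_bins, labels):
    """Reduction in label entropy after splitting samples by the feature's bins."""
    total = len(labels)
    conditional = 0.0
    for b in set(feature_bins):
        subset = [lab for fb, lab in zip(feature_bins, labels) if fb == b]
        conditional += len(subset) / total * entropy(subset)
    return entropy(labels) - conditional

# Toy example: one gene separates the classes perfectly, the other not at all.
labels = [0, 0, 1, 1]
separating_gene = ["low", "low", "high", "high"]
uninformative_gene = ["low", "high", "low", "high"]
ig_separating = information_gain(separating_gene, labels)
ig_uninformative = information_gain(uninformative_gene, labels)
```

A filter of this kind ranks every gene independently of the classifier, which is why it is usually paired with a second, classifier-driven stage such as the SVM step described above.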
In a technical research study (Liu et al. 2018), Liu et al. compared KBCGS with the SCAD method and found the following differences:
• SCAD is an unsupervised learning algorithm, but KBCGS is a supervised learning algorithm.
• SCAD uses local adaptive distance, which leads to falling into local optima; however, KBCGS uses global adaptive distance, and it can avoid this problem.
• SCAD variance is simply measured by using Euclidean distance. It is measured when the set of data is linearly separated. However, variance in the KBCGS method is measured by using the kernel. It can generate non-linear hybrid levels among clusters. The introduced changes can optimize performance in this area. The KBCGS method makes it possible to use adaptive distances, which change in each step of the iteration. This type of mutual measurement is appropriate for learning the weight of features throughout the clustering process and optimizing a section of algorithm performance. Furthermore, the KBCGS algorithm is simple to implement, and there is no need to amend or optimize parameters on each dataset.
The results given in Table 1 indicate that Chen et al.’s method is more efficient and faster than other feature selection methods. Also, it requires no parameter tuning.
Each method has its own specifications, which affect the consistency of the results. Moreover, Laplace score analysis reveals its competitive performance in identifying predictive genes. Although the Laplace score is unsupervised and depends on the structure of a dataset, it may be used as a pre-processing stage because of its competitive performance. However, IDGA is a strong measurement criterion in comparison with the Laplace score and Fisher score. IDGA is applied after reducing dimensions and selecting statistically significant genes.
The evolutionary strategy is a numerical genetic algorithm with dynamic genotyping, smart adaptive parameters, and modified genetic operators. In a similar vein, some of the evaluations are random (populations consist of generated chromosomes of random length). The strategy of this algorithm is based on a genetic algorithm that uses variable-length chromosomes with numerical coding for feature selection. These chromosomes, used with numerical coding by compatible genetic operators, contribute to the efficacy of the algorithm. The convergence speed of this algorithm further motivates its application in very high dimensions. In this algorithm, the crossover rate and compatible mutation rate were used based on the social strategy of reward and punishment. The IDGA algorithm obtains its parameters simply with respect to the quality of the discovered solutions in comparison with the total solutions (Liu et al. 2018).
Shekar and Dagnew used L1-regularized feature selection and deep learning algorithms based on linear SVM for extracting features and classifying DNA microarrays. They found that extracting these features has a significant impact on identifying cancer. They used the Softmax activation function (Shekar and Dagnew 2019).
Guia et al. proposed another deep learning-based algorithm for classifying DNA microarrays and identifying features of microarrays in two-dimensional and three-dimensional forms. Using a DNN on two-dimensional images, it extracts features with 95.65% precision (Guia et al. 2019).
Another hybrid method is based on ensemble learning of features using a genetic algorithm, which first extracts significant features in a nested way (Lee and Leu 2011). It was then evaluated on a lung cancer dataset (Dong and Markovic 2018) using a support vector machine classifier. The results revealed an accuracy of 98.4 (Sayed et al. 2019).
The method introduced by Janse et al. is based on feature selection of genes in microarray data using a two-stage genetic algorithm. The first stage selects informative, relevant genes, and the second stage selects the best candidates from the genes chosen in the first stage. Classification is then done with the support vector machine algorithm (Rani and Devaraj 2019).
Lu et al. proposed a novel pathological brain detection method using AlexNet and transfer learning. In order to apply AlexNet in pathological brain detection, they employed transfer learning (Lu et al. 2018).
Table 1 Function of KBCGS and six other gene selection methods (Chen et al. 2016)
Dataset Method ACC TPR FPR Time Dataset Method ACC TPR FPR ACC_IE
On the KNN 2-class dataset On the SVM 2-class dataset
AMLALL KBCGS 97.54 1 0.072 0.2411 AMLALL KBCGS 97.84 0.9872 0.04 86.36 ± 7.91
χ2-Statistic 97.89 0.9936 0.052 1.9400 χ2-Statistic 98.07 0.9915 0.04 90.48 ± 6.47
GINI 97.77 0.9957 0.056 22.6152 GINI 98.18 0.9936 0.04 91.57 ± 6.49
Info.Gain 97.43 0.9830 0.04 1.8732 Info.Gain 98.04 0.9915 0.04 90.25 ± 6.62
KW 95.82 0.9787 0.08 19.8651 KW 95.43 0.9766 0.088 83.58 ± 8.27
Relief-F 96.11 0.9787 0.072 7.8097 Relief-F 97.64 0.9851 0.04 85.22 ± 7.31
MRMR 97.29 0.9787 0.04 45.4010 MRMR 97.20 0.9830 0.048 90.21 ± 6.78
DLBCL KBCGS 98.45 0.9579 0.0069 0.2148 DLBCL KBCGS 98.25 0.9368 0.0017 83.89 ± 7.54
χ2-Statistic 96.29 0.9632 0.0362 1.7611 χ2-Statistic 96.34 0.9316 0.0259 88.54 ± 6.82
GINI 94.25 0.9421 0.0569 27.6935 GINI 95.91 0.8947 0.0207 87.46 ± 6.63
Info.Gain 97.52 0.9895 0.0293 1.4571 Info.Gain 96.23 0.9158 0.0224 87.02 ± 7.78
KW 90.91 0.9947 0.1190 17.5190 KW 91.68 0.9316 0.0879 84.63 ± 6.98
Relief-F 97.21 0.9421 0.0190 2.9264 Relief-F 97.77 0.9737 0.0207 80.05 ± 7.90
MRMR 97.39 0.9895 0.0310 43.3957 MRMR 97.55 0.9526 0.0172 83.75 ± 7.60
Lung KBCGS 87.00 0.8750 0.14 0.0370 Lung KBCGS 78.67 0.9042 0.4133 73.68 ± 14.57
χ2-Statistic 81.00 0.7417 0.0733 1.2936 χ2-Statistic 73.42 0.8167 0.4 71.78 ± 14.18
GINI 81.33 0.7500 0.0933 5.0624 GINI 72.75 0.8208 0.42 71.54 ± 14.43
Info.Gain 80.58 0.7458 0.1 0.5818 Info.Gain 72.33 0.8250 0.44 72.62 ± 14.76
KW 73.08 0.7875 0.36 5.7797 KW 63.17 0.7542 0.5733 70.63 ± 14.06
Relief-F 78.33 0.8250 0.28 1.0456 Relief-F 71.75 0.8042 0.42 70.26 ± 15.58
MRMR 74.08 0.7375 0.26 44.1737 MRMR 65.25 0.7458 0.4867 71.62 ± 14.20
Prostate KBCGS 95.17 0.9231 0.022 0.4503 Prostate KBCGS 94.71 0.9173 0.022 82.31 ± 10.72
χ2-Statistic 96.80 0.9558 0.02 3.0632 χ2-Statistic 96.67 0.9596 0.026 87.84 ± 7.11
GINI 95.47 0.9462 0.036 28.6469 GINI 96.85 0.9673 0.03 86.17 ± 6.60
Info.Gain 96.49 0.9500 0.02 3.6507 Info.Gain 95.88 0.9519 0.034 87.39 ± 6.20
KW 92.95 0.9481 0.09 42.9346 KW 95.47 0.9346 0.024 80.72 ± 8.01
Relief-F 94.21 0.9192 0.034 9.6568 Relief-F 92.37 0.9096 0.062 83.17 ± 6.96
MRMR 95.88 0.9423 0.024 38.8481 MRMR 95.60 0.9423 0.03 86.24 ± 7.78
Average KBCGS 94.54 0.94 0.06 0.24 Average KBCGS 92.37 0.9364 0.1193
χ2-Statistic 92.99 0.91 0.05 2.01 χ2-Statistic 91.13 0.9248 0.1230
GINI 92.21 0.91 0.06 21.00 GINI 90.92 0.9191 0.1277
Info.Gain 93.01 0.92 0.05 1.89 Info.Gain 90.62 0.9211 0.1341
KW 88.19 0.93 0.16 21.52 KW 86.44 0.8992 0.1933
Relief-F 91.47 0.92 0.10 5.36 Relief-F 89.88 0.9181 0.1357
MRMR 91.16 0.91 0.09 42.95 MRMR 88.90 0.9059 0.1455
Dataset Method ACC Kappa ACC_IE Dataset Method ACC Kappa Time (s)
On the multi-class KNN dataset On the multi-class SVM dataset
Brain_Tumor1 KBCGS 90.67 0.8033 75.84 ± 6.64 Brain_Tumor1 KBCGS 90 0.7963 0.2865
χ2-Statistic 87.67 0.7363 76.11 ± 5.48 χ2-Statistic 86.33 0.7040 1.4253
GINI 89.67 0.7852 76.38 ± 5.61 GINI 89.89 0.7931 38.9763
Info.Gain 90.89 0.8087 77.47 ± 5.51 Info.Gain 89.11 0.7824 1.4857
KW 83.33 0.6279 74.61 ± 6.73 KW 79 0.5329 15.6658
Relief-F 86.89 0.7149 75.20 ± 5.74 Relief-F 88.33 0.7541 3.4569
MRMR 88.11 0.7450 75.79 ± 6.16 MRMR 89 0.7714 35.1308
4 The proposed method
Using a deep learning-based method, namely a deep neural network, to identify and extract information from microarray datasets for classifying cancer cells is considered a novel approach. The deep neural network, one of the most important learning approaches, is the focus of this article. In our proposed method, GW-HDNN (grey wolf hybrid deep neural network), we used the grey wolf algorithm for extracting features and the DNN for classifying microarray cancer datasets. The applied DNN is a 15-layer deep neural network consisting of drop-out and dense layers. It includes 14 hidden layers and a final output layer. In this layering, fully connected layers were used, which connect the available units in each layer. These connected layers are intended to avoid adding further layers and to accelerate network training; as a result, the inputs of the next layer are removed. Also, ReLU is used as the non-linear activation function. Figure 3 shows the architecture of the proposed method. A drop-out function (0.01) was used for normalizing and batch resetting. The last layer defined in the DNN model is a fully connected layer that uses the Sigmoid activation function for binary classification and the Softmax activation function for some classification cases. Table 2 gives the layout of the respective layers.
Figure 3 shows the flowchart and a general outline of the proposed method. The input dataset enters the proposed method. First, initialization is done based on the grey wolf algorithm. Then, the fitness function is computed for each input row, and the best features are kept. The number of iterations is compared with the maximum number of iterations. If the number of iterations is less than the maximum, the status of the features is updated, and the best current status is used as the input of the first step. After this stage, the best extracted features are used as the input of the DNN.
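The loop described in this paragraph (initialize, score, keep the best features, update until the maximum number of iterations) can be sketched as follows. The paper does not state the fitness function or how wolf positions are mapped to feature subsets, so both are assumptions here: positions in [0, 1] are thresholded at 0.5 to form a feature mask, and the fitness is nearest-centroid accuracy minus a small penalty per kept feature. The toy data and all names are hypothetical.

```python
import random

def make_toy_data(rng, n_samples=40, n_features=10, informative=(0, 1)):
    """Hypothetical stand-in for an expression matrix with two informative columns."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 3.0 * label            # shift informative features for class 1
        X.append(row)
        y.append(label)
    return X, y

def fitness(mask, X, y):
    """Assumed fitness: nearest-centroid accuracy minus a penalty per kept feature."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    centroids = {}
    for c in set(y):
        rows = [r for r, lab in zip(X, y) if lab == c]
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    hits = 0
    for r, lab in zip(X, y):
        dist = {c: sum((r[j] - m) ** 2 for j, m in zip(cols, cen))
                for c, cen in centroids.items()}
        hits += int(min(dist, key=dist.get) == lab)
    return hits / len(y) - 0.01 * len(cols)

def to_mask(position):
    """Threshold a continuous position into a keep/drop decision per feature."""
    return [p > 0.5 for p in position]

def gwo_select(X, y, n_wolves=8, max_iter=30, seed=0):
    rng = random.Random(seed)
    n = len(X[0])
    pack = [[rng.random() for _ in range(n)] for _ in range(n_wolves)]  # initialization
    best_mask, best_fit = None, float("-inf")
    for t in range(max_iter):                    # stop at the maximum iteration count
        ranked = sorted(pack, key=lambda p: fitness(to_mask(p), X, y), reverse=True)
        leaders = ranked[:3]                     # alpha, beta, delta
        top_fit = fitness(to_mask(leaders[0]), X, y)
        if top_fit > best_fit:                   # keep the best features found so far
            best_mask, best_fit = to_mask(leaders[0]), top_fit
        a = 2.0 - 2.0 * t / max_iter             # decreases linearly from 2 towards 0
        new_pack = []
        for pos in pack:                         # update the status of every wolf
            new_pos = []
            for d in range(n):
                est = 0.0
                for lead in leaders:
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    est += lead[d] - A * abs(C * lead[d] - pos[d])
                new_pos.append(min(1.0, max(0.0, est / 3.0)))
            new_pack.append(new_pos)
        pack = new_pack
    return best_mask, best_fit

X, y = make_toy_data(random.Random(1))
mask, score = gwo_select(X, y)
selected = [j for j, keep in enumerate(mask) if keep]   # columns passed on to the DNN
```

The columns listed in `selected` play the role of the "best extracted features" that are handed to the classifier.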
Table 1 (continued)
Dataset Method ACC Kappa ACC_IE Dataset Method ACC Kappa Time (s)
Lymphoma KBCGS 100 1 87.86 ± 6.11 Lymphoma KBCGS 100 1 0.2292
χ2-Statistic 100 1 86.89 ± 6.03 χ2-Statistic 100 1 0.9138
GINI 98.40 0.9670 85.77 ± 7.13 GINI 98.10 0.9603 15.7760
Info.Gain 100 1 86.49 ± 6.20 Info.Gain 100 1 0.9328
KW 98.38 0.9670 86.57 ± 6.31 KW 100 1 9.8692
Relief-F 100 1 85.90 ± 6.78 Relief-F 100 1 1.4333
MRMR 100 1 87.49 ± 5.81 MRMR 100 1 34.1350
NCI60 KBCGS 78.24 0.7503 43.63 ± 11.49 NCI60 KBCGS 80.52 0.7761 0.2004
χ2-Statistic 72.98 0.6883 39.99 ± 12.01 χ2-Statistic 75.29 0.7149 1.2021
GINI 33.67 0.2314 28.82 ± 9.12 GINI 38.50 0.2901 38.7968
Info.Gain 76.81 0.7326 39.99 ± 11.83 Info.Gain 78.05 0.7470 1.3379
KW 55.40 0.4861 38.11 ± 10.87 KW 66.76 0.6188 14.2269
Relief-F 75.24 0.7158 38.56 ± 12.91 Relief-F 73.02 0.6887 2.4940
MRMR 71.76 0.6733 40.12 ± 10.96 MRMR 75.69 0.7192 50.8011
SRBCT KBCGS 100 1 87.61 ± 8.78 SRBCT KBCGS 100 1 0.0768
χ2-Statistic 98.81 0.9833 86.27 ± 7.43 χ2-Statistic 99.75 0.9967 0.5811
GINI 97.60 0.9668 84.72 ± 6.62 GINI 98.65 0.9817 14.0525
Info.Gain 100 1 88.41 ± 6.34 Info.Gain 100 1 0.5623
KW 93.22 0.9061 76.7 ± 9.54 KW 99.28 0.9899 5.7861
Relief-F 100 1 90.36 ± 5.58 Relief-F 100 1 1.1407
MRMR 100 1 91.29 ± 5.64 MRMR 100 1 37.5048
Average KBCGS 92.23 0.89 Average KBCGS 92.63 0.89 0.20
χ2-Statistic 89.86 0.85 χ2-Statistic 90.34 0.85 1.03
GINI 79.83 0.74 GINI 81.28 0.76 26.90
Info.Gain 91.92 0.89 Info.Gain 91.79 0.88 1.08
KW 82.59 0.75 KW 86.26 0.79 11.39
Relief-F 90.53 0.86 Relief-F 90.34 0.86 2.13
MRMR 89.97 0.85 MRMR 91.17 0.87 39.39
The proposed DNN consists of 15 layers. The first and second layers are dense layers. Adding a fully connected layer is a cheap way of learning high-level non-linear combinations of the features. The flattened output is fed to the feed-forward neural network, and backpropagation is applied in each iteration of the training. The next layer is the drop-out layer, which is responsible for controlling over-fitting. Over-fitting occurs when the neural network learns the training data well but does not generalize to the test set. In this method, the inputs of the next layer are removed with 0.2 probability; then, they are re-trained. Basically, the drop-out layer has no impact on the input or output of the next layer. It is only used for controlling proper training. The next layer is a dense layer, which applies a non-linear function, namely ReLU, to its inputs so that the learning of the neural network is improved. The next layer is another drop-out layer with 0.2 probability. This procedure continues until the 14th layer. The final layer is a fully connected layer with a Sigmoid activation function. This layer maps the values produced by the previous 14 layers to the probability of the respective class. The microarray data selected by the grey wolf algorithm is thus assigned to its class after 15 layers.
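The layer sequence can be illustrated with a dependency-free forward pass. This is only a sketch: the layer widths are those listed in Table 2, the drop-out probability of 0.2 is taken from the text, and the input width of 22, the random weights, the use of ReLU on every hidden dense layer, and all names are assumptions. The authors' implementation used TensorFlow, which is not reproduced here.

```python
import math
import random

UNITS = [160, 164, 128, 256, 132, 160, 80, 6, 1]   # dense-layer widths from Table 2
DROPOUT_AFTER = {1, 2, 3, 4, 5, 6}                  # dense layers followed by drop-out
RATE = 0.2                                          # drop-out probability from the text

def init_layers(n_inputs, rng):
    """Create random weights and zero biases (illustrative initialization)."""
    layers, fan_in = [], n_inputs
    for units in UNITS:
        scale = math.sqrt(2.0 / fan_in)
        weights = [[rng.gauss(0.0, scale) for _ in range(fan_in)] for _ in range(units)]
        layers.append((weights, [0.0] * units))
        fan_in = units
    return layers

def forward(x, layers, rng=None):
    """Forward pass; drop-out is active only when a random generator is supplied."""
    h = x
    last = len(layers) - 1
    for i, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(weights, biases)]
        if i == last:
            # Sigmoid output; the pre-activation is clipped to avoid overflow in exp().
            return [1.0 / (1.0 + math.exp(-max(-60.0, min(60.0, v)))) for v in z]
        h = [max(0.0, v) for v in z]                 # ReLU
        if rng is not None and i in DROPOUT_AFTER:
            # Inverted drop-out: zero units with probability RATE, rescale the rest.
            h = [0.0 if rng.random() < RATE else v / (1.0 - RATE) for v in h]
    return h

rng = random.Random(0)
layers = init_layers(22, rng)                        # 22 input features (assumed)
sample = [rng.gauss(0.0, 1.0) for _ in range(22)]
probability = forward(sample, layers)[0]             # inference: drop-out disabled
train_output = forward(sample, layers, rng=rng)[0]   # training mode: drop-out enabled
```

Calling `forward` without a random generator corresponds to inference, where drop-out is switched off and the same input always gives the same output.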
5 Simulation results
With respect to the above-mentioned discussion, the results of the proposed method are examined and evaluated here.
The proposed method was evaluated on three cancer datasets, BRCA, LUAD, and STAD, which are microarray-related datasets for cancer detection. The proposed method was implemented in Python with the TensorFlow package. Figure 4 depicts the error function over approximately 500 iterations of the proposed method and the error reduction for BRCA. Training error was continuously reduced thanks to the proper layout of the drop-out and dense layers. It settled at 0.05, which is regarded as a desirable value. Furthermore, Table 3 gives a synopsis of the comparison of the proposed method with seven other algorithms. The proposed method had the highest recall, accuracy, and F1 values on the BRCA dataset. As shown in Fig. 4, the value of the loss function for this dataset was negligible (0.05). For the LUAD dataset, the error approached 0.1 over 500 iterations. However, at around 300 iterations, the error spiked up to 2.5. The proposed method was then able to control the error, and it settled at 0.1.
Fig. 3 The flowchart of proposed method
Table 2 Arrangement of analysis layers in proposed method
Layer (type) Output shape Param #
dense_253 (Dense) None, 160 3680
dense_254 (Dense) None, 164 26404
dropout_169 (Dropout) None, 164 0
dense_255 (Dense) None, 128 21120
dropout_170 (Dropout) None, 128 0
dense_256 (Dense) None, 256 33024
dropout_171 (Dropout) None, 256 0
dense_257 (Dense) None, 132 33924
dropout_172 (Dropout) None, 132 0
dense_258 (Dense) None, 160 21280
dropout_173 (Dropout) None, 160 0
dense_259 (Dense) None, 80 12880
dropout_174 (Dropout) None, 80 0
dense_260 (Dense) None, 6 486
dense_261 (Dense) None, 1 7
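The Param # column of Table 2 can be recomputed from the layer widths, since a fully connected layer with bias has (inputs + 1) × units trainable parameters. The sketch below assumes an input width of 22, inferred from the 3680 parameters of the first layer ((22 + 1) × 160 = 3680); drop-out layers have no parameters and are omitted. The variable names are illustrative.

```python
def dense_params(n_inputs, units):
    """Trainable parameters of a fully connected layer: weights plus one bias per unit."""
    return (n_inputs + 1) * units

# Output widths of the dense layers dense_253 ... dense_261 in Table 2.
widths = [160, 164, 128, 256, 132, 160, 80, 6, 1]
n_inputs = 22  # assumption inferred from the first layer: (22 + 1) * 160 = 3680

param_counts = []
fan_in = n_inputs
for units in widths:
    param_counts.append(dense_params(fan_in, units))
    fan_in = units          # the next layer receives this layer's outputs

total_params = sum(param_counts)
for units, count in zip(widths, param_counts):
    print(f"Dense({units}): {count} parameters")
print(f"Total: {total_params}")
```

Comparing the printed counts with the table is a quick consistency check on the reported architecture.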
Figure 5 shows the loss function and accuracy of the proposed method for the LUAD dataset. The error function for this dataset fluctuates, which may be attributed to over-fitting. The dropout layers then controlled the error well and reduced it to 0.1. At about 300 iterations, the error value was high, which might be due to the learning rate. The fluctuation was nevertheless controlled well, and the error was reduced to a fixed value of 0.1. The learning rate for the proposed method was set to 0.1.
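As a rough illustration of how a dropout layer acts on a layer's outputs during training, the sketch below implements inverted dropout in plain Python. It is not taken from the authors' implementation. The dropout rate is not stated in this section, so the value 0.2, the function name `inverted_dropout`, and the all-ones activations are hypothetical.

```python
import random


def inverted_dropout(activations, rate, rng):
    """Zero each activation with probability `rate` and rescale the
    survivors by 1 / (1 - rate) so the expected value is unchanged."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)   # fixed seed so the sketch is repeatable
hidden = [1.0] * 10_000  # hypothetical layer output
dropped = inverted_dropout(hidden, rate=0.2, rng=rng)  # 0.2 is an assumed rate

kept_fraction = sum(1 for a in dropped if a != 0.0) / len(dropped)
mean_after = sum(dropped) / len(dropped)
print(f"kept {kept_fraction:.2%} of units, mean activation {mean_after:.3f}")
```

Because a different random subset of units is silenced at every step, the network cannot rely on any single unit, which is the usual explanation for the regularizing effect of dropout.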
As shown in Table 4, the accuracy of the proposed method for the LUAD dataset was close to 100, i.e., 99.89. In other words, the proposed method had the highest values in all evaluation criteria. The value of the error function for the LUAD dataset was almost 0.1. Compared with the seven other algorithms, the precision of the proposed method was almost 8 percentage points higher than that of the best algorithm (Naive Bayes). Its recall was about 2 points higher than that of Naive Bayes, and its F1 value was about 4 points higher. Regarding accuracy, the proposed method achieved an improvement of about 12 points.
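The F1 values reported for the proposed method can be related to its precision and recall. The sketch below assumes the standard definition of F1 as the harmonic mean of precision and recall, and applies it to the GW-HDNN rows of Tables 3, 4, and 5. The function name `f1_score` and the dictionary layout are illustrative choices, not part of the original implementation.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall of GW-HDNN as reported in Tables 3, 4, and 5 (percent).
reported = {
    "BRCA": (95.0, 100.0),
    "LUAD": (96.0, 89.68),
    "STAD": (100.0, 98.73),
}

f1_by_dataset = {
    name: round(f1_score(p, r), 2) for name, (p, r) in reported.items()
}
for name, f1 in f1_by_dataset.items():
    print(f"{name}: F1 = {f1}")
```

Under this assumption, the computed values are close to the tabulated F1 scores of 97, 92.732, and 99.3.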
Figure 6 shows that the error function on the STAD dataset remained small. After ten iterations, the error became fixed at 0.1, as for the LUAD dataset. The error function was reduced very well on this dataset; it showed a brief ascending peak at roughly iterations 4 to 6, after which the rise leveled off and the error decreased.
Table 3 Results for the BRCA dataset
Algorithm             Precision  Recall  F1-score  Accuracy
GW-HDNN               95         100     97        99.19
Naive Bayes           90         95      92        94.94
SVM(rbf)              91         95      93        95.344
SVM(linear)           90         94      93        94.941
Logistic regression   96         94      96        95.546
Decision tree         91         93      92        92.914
One vs. all           89         94      91        93.94
K-nearest neighbor    91         95      93        95.344
XGBoost               47.8       49.82   48.78     95.29
LightGBM              47.72      50      48.83     95.45
CNN                   47.81      50      48.88     95.62
Fig. 4 The loss function associated with the BRCA dataset for the proposed method
Fig. 5 The loss function associated with the LUAD dataset
Table 4 Results for the LUAD dataset
Algorithm             Precision  Recall  F1-score  Accuracy
GW-HDNN               96         89.68   92.732    99.89
Naive Bayes           88         87      88        87.719
SVM(rbf)              74         86      79        85.964
K-nearest neighbor    74         86      79        85.964
SVM(linear)           84         87      83        86.421
Logistic regression   79         87      82        86.842
Decision tree         79         78      78        87.07
One vs. all           89         88      83        87.709
XGBoost               71.69      66.24   76.01     90.35
LightGBM              70.53      54.06   61.2      90.35
CNN                   50.61      30.95   38.41     84.53
As shown in Fig. 6, the proposed method outperformed the other methods remarkably on the STAD dataset with respect to the evaluation criteria. It achieved an accuracy of 99.37 and a precision of 100. The results in Table 5 indicate that the proposed method is about 5 percentage points better than the other algorithms. Similarly, its recall was almost 1.7 points higher than that of the best algorithm (linear regression, listed as logistic regression in Table 5); its F1 value was about 4 points higher than that of the K-nearest neighbor algorithm, and its accuracy was about 2 points higher than that of linear regression. In sum, the results of the proposed method were better than those of the other algorithms on all criteria.
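The comparisons for the STAD dataset are differences between rows of Table 5. The sketch below recomputes them as percentage-point gaps, using only values copied from that table. The names `METRICS`, `table5`, and `gaps` are illustrative, and treating the stated improvements as absolute differences (not relative ones) is an assumption.

```python
# Rows copied from Table 5 (STAD): precision, recall, F1-score, accuracy (percent).
METRICS = ("precision", "recall", "f1", "accuracy")
table5 = {
    "GW-HDNN": (100.0, 98.73, 99.3, 99.37),
    "Logistic regression": (98.0, 97.0, 98.0, 97.5),
    "K-nearest neighbor": (93.0, 89.0, 95.9, 92.5),
}


def gaps(proposed, baseline):
    """Percentage-point difference (proposed minus baseline) per metric."""
    return {m: round(p - b, 2) for m, p, b in zip(METRICS, proposed, baseline)}


vs_logreg = gaps(table5["GW-HDNN"], table5["Logistic regression"])
vs_knn = gaps(table5["GW-HDNN"], table5["K-nearest neighbor"])
print("vs logistic regression:", vs_logreg)
print("vs K-nearest neighbor:", vs_knn)
```

Read this way, the recall gap to logistic regression is about 1.7 points, the accuracy gap is about 1.9 points, and the F1 gap to K-nearest neighbor is about 3.4 points.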
6 Conclusion
The selection of relevant and informative genes for cancer classification is a common task in most high-throughput gene expression studies. DNA microarrays can detect the expression levels of thousands of genes under various experimental conditions, and microarray technology helps researchers learn about different kinds of diseases, especially cancer. Biomarker gene selection is useful for the early detection and effective treatment of cancer. Early diagnosis, based on the gene expression profiles of patients, generally increases the chances of successful treatment. In this paper, we used a hybrid method that combines the grey wolf algorithm for feature extraction with a 15-layer deep neural network (DNN) for detecting cancer from microarrays. The proposed method obtained an accuracy of 99.37 on the STAD dataset, 99.19 on the BRCA dataset, and 99.89 on the LUAD dataset, which were the highest values among all the compared methods. It was compared with the linear support vector machine, RBF, nearest neighbor, linear regression, one versus all, Naive Bayes, and decision tree algorithms with regard to the evaluation criteria. In most cases, the proposed method was remarkably better than the other algorithms. The comparison with these seven algorithms is given in three tables and depicted in three figures.
The proposed method performed better than the other seven algorithms with regard to precision, accuracy, recall, and F1. The error on the LUAD and STAD datasets was 0.1; the error on the BRCA dataset was lower still (0.05). The investigations indicated that the proposed method had a 0.57 improvement over the method of Xiao et al. (2018) on the LUAD dataset, a 1.11 improvement on the STAD dataset, and a 0.78 improvement on the BRCA dataset. Thanks to these results and the closeness of the predicted values to the real values, the proposed hybrid method is considered to be harmless, costless, reliable, and fast in detecting cancer with minimum error, which can help reduce deaths caused by cancer.
As a direction for further research, multi-task learning based on deep learning can be used to handle the separate tasks. In addition, combinations of the proposed method with meta-heuristic algorithms, such as the grey wolf algorithm, the genetic algorithm, and other methods, can be used together with deep learning for detecting cancer.
Fig. 6 The loss function associated with the STAD dataset
Table 5 Results for the STAD dataset
Algorithm             Precision  Recall  F1-score  Accuracy
GW-HDNN               100        98.73   99.3      99.37
Naive Bayes           78         72      72        71
SVM(rbf)              77         60      51        60
K-nearest neighbor    93         89      95.9      92.5
SVM(linear)           95         97      95.982    86.421
Logistic regression   98         97      98        97.5
Decision tree         95         97      97        92.5
One vs. all           95         94      94.49     95
XGBoost               98.71      97.72   98.21     98.33
LightGBM              97.5       95.45   96.46     96.67
CNN                   96.88      96.15   96.51     96.39
References
Abdel-Zaher AM, Eldeib AM (2016) Breast cancer classification using deep belief networks. Expert Syst Appl 46:139–144
Aggarwal CC (2018) Neural networks and deep learning: a textbook. Springer, 497 p
Agrawal S, Agrawal J (2015) Neural network techniques for cancer prediction: a survey. Proced Comput Sci 60:769–774
Bunz F (2016) Principles of cancer genetics. Springer, p 343
Butterfield LH, Kaufman HL, Marincola FM (2017) Cancer immunotherapy principles and practice. Demos Medical, p 920
Chen D, Liu Z, Ma X, Hua D (2005) Selecting genes by test statistics. J Biomed Biotechnol 2005(2):132–138
Chen H, Zhang Y, Gutman I (2016) A kernel-based clustering method for gene selection with gene expression data. J Biomed Inform 62:12–20
Dashtban M, Balafar M (2017) Gene selection for microarray cancer classification using a new evolutionary method employing artificial intelligence concepts. Genomics 109(2):91–107
Dong H, Markovic SN (2018) The basics of cancer immunotherapy. Springer, New York, p 172
Dwivedi AK (2018) Artificial neural network model for effective cancer classification using microarray gene expression data. Neural Comput Appl 29(12):1545–1554
Fakoor R, Ladhak F, Nazi A, Huber M (2013) Using deep learning to enhance cancer diagnosis and classification. In: Proceedings of the international conference on machine learning. New York, USA: ACM
Frigui H, Nasraoui O (2000) Simultaneous clustering and attribute discrimination. Ninth IEEE Int Conf Fuzzy Syst. https://doi.org/10.1109/FUZZY.2000.838651
Gao L, Ye M, Lu X, Huang D (2017) Hybrid method based on information gain and support vector machine for gene selection in cancer classification. Genom Proteom Bioinform 15(6):389–395
Gray JW, Collins C (2000) Genome changes and gene expression in human solid tumors. Carcinogenesis 21:443–452
Guia JM, Devaraj M, Leung CK (2019) DeepGx: deep learning using gene expression for cancer classification. In: ACM 2019 IEEE/ACM international conference on advances in social networks analysis and mining. https://doi.org/10.1145/3341161.3343516
Knudson AG (2000) Chasing the cancer demon. Ann Rev Genet 34:1–19
Lee CP, Leu Y (2011) A novel hybrid feature selection method for microarray data analysis. Appl Soft Comput 11(1):208–213
Liu S, Xu C, Zhang Y, Liu J, Yu B, Liu X, Dehmer M (2018) Feature selection of gene expression data for cancer classification using double RBF-kernels. BMC Bioinformatics 19:396
Lo SB, Lou SA (1995) Artificial convolution neural network techniques and applications for lung nodule detection. IEEE Trans Med Imaging 14(4):711–718
Lu S, Lu Z, Zhang Y-D (2018) Pathological brain detection based on AlexNet and transfer learning. J Comput Sci. https://doi.org/10.1016/j.jocs.2018.11.008
Mirjalili S, Mirjalili SM, Lewis A (2014) Grey wolf optimizer. Adv Eng Softw 69:46–61
Mishra S, Shaw K, Mishra D (2012) A new meta-heuristic bat inspired classification approach for microarray data. Proc Technol 4:802–806
Moslehi F, Haeri A (2019) A novel hybrid wrapper–filter approach based on genetic algorithm, particle swarm optimization for feature subset selection. J Ambient Intell Human Comput 11:1105–1127
Motieghader H, Ali Najafi A, Sadeghi B, Masoudi-Nejad A (2017) A hybrid gene selection algorithm for microarray cancer classification using genetic algorithm and learning automata. Inform Med Unlock 9:246–254
Olyaee S, Dashtban Z, Dashtban MH (2013) Design and implementation of super-heterodyne nano-metrology circuits. Front Optoelectron 6(3):318–326
Peng H, Long F, Ding C (2005) Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27(8):1226–1238
Ram M, Najafi A, Shakeri MT (2017) Classification and biomarker genes selection for cancer gene expression data using random forest. Iran J Pathol 12(4):339–347
Rani MJ, Devaraj D (2019) Two-stage hybrid gene selection using mutual information and genetic algorithm for cancer data classification. J Med Syst 43(8):235
Sayed S, Nassef M, Badr A, Farag I (2019) A nested genetic algorithm for feature selection in high-dimensional cancer microarray datasets. Expert Syst Appl 121:233–243
Sharma A, Paliwal KK (2008) Cancer classification by gradient LDA technique using microarray gene expression data. Data Knowl Eng 66(2):338–347
Shekar BH, Dagnew G (2019) L1-regulated feature selection and classification of microarray cancer data using deep learning. In: Proceedings of 3rd international conference on computer vision and image processing, pp 227–242
Tabakhi S, Najafi A, Ranjbar R, Moradi P (2015) Gene selection for microarray data classification using a novel ant colony optimization. Neurocomputing 168:1024–1036
Tavakoli N, Maryam Karimi M, Norouzi A, Karimi N, Samavi S, Soroushmehr SMR (2019) Detection of abnormalities in mammograms using deep features. J Ambient Intell Human Comput. https://doi.org/10.1007/s12652-019-01639-x
Varadharajan R, Priyan MK, Panchatcharam P et al (2018) A new approach for prediction of lung carcinoma using backpropagation neural network with decision tree classifiers. J Ambient Intell Human Comput. https://doi.org/10.1007/s12652-018-1066-y
Wang Y, Tetko IV, Hall MA, Frank E, Facius A, Mayer KFX, Mewes HW (2005) Gene selection from microarray data for cancer classification: a machine learning approach. Comput Biol Chem 29(1):37–46
Wilmott P (2019) Machine learning: an applied mathematics introduction. Panda Ohana Publishing, p 242
Yang J, Liu YL, Feng CS, Zhu GQ (2016) Applying the Fisher score to identify Alzheimer’s disease-related genes. Genet Mol Res 15(2):gmr15028798
Young RA (2000) Biomedical discovery with DNA arrays. Cell 102:9–15
Zhou M, Luo Y, Sun G, Mai G, Zhou F (2015) Constraint programming based biomarker optimization. Biomed Res Int 2015:910515

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.